Site reliability engineer

Melbourne

Persistent Systems

Posted: 8 May

Offer description

Key Responsibilities
* Improve the reliability, availability, and recoverability of financial crime and transaction monitoring platforms.
* Define and manage SLIs/SLOs, alerting, and observability to proactively manage service health.
* Provide L1/L2 production support for AWS DHP and Oracle‐based environments.
* Build and maintain monitoring, logging, alerting, and automation to reduce operational toil.
* Support incident response, root cause analysis, and post‐incident reviews.
* Implement and operate CI/CD pipelines and infrastructure automation.
* Collaborate with engineering and platform teams to design reliable, scalable systems.
* Participate in on‐call rotations as per operational requirements.

Skills & Experience (Essential)
* Experience operating production‐grade systems with strong reliability and availability requirements.
* Strong knowledge of SRE and observability practices (monitoring, logging, alerting, SLOs).
* Hands‐on experience with AWS (EC2, S3, RDS, VPC, IAM) and Linux environments.
* Working knowledge of Oracle platforms/databases in enterprise environments.
* Experience with CI/CD tools (e.g. Jenkins, GitLab CI, AWS CodePipeline).
* Working knowledge of Python and Shell scripting for automation.
* Familiarity with observability tools such as Grafana, Prometheus, ELK/EFK, CloudWatch, PagerDuty.
* Experience with OFSAA / Oracle Rules Engine platforms.
* Exposure to Oracle performance tuning, RAC, or RMAN.
* Experience supporting ML‐enabled platforms or model execution pipelines (e.g. TRACE).

Persistent is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind.
