Posted: 28 August
Offer description
Highly skilled Service Reliability Engineers are sought to join a dynamic team in Adelaide.
The ideal candidate will play a pivotal role in delivering a top-notch cloud gaming experience to customers worldwide.
As a key member of the team, you will influence the design and operational decisions that determine the overall stability of the gaming service. Your focus will be on three main areas: production ownership, code quality, and deployments.
We expect our SREs to have opinions on the state of our service and provide critical feedback during different phases of the operational lifecycle. You will be engaged throughout the software development lifecycle, ensuring the operational readiness and stability of our services.
* Lead technical discussions around ongoing improvements in reliability and scalability.
* Create high-level designs for new products and platforms.
* Mentor junior SRE staff and set them up for success.
* Lead incident response and post-mortem activities within your assigned service team.
* Collaborate with other engineers in a cross-functional team to prioritize reliability improvements that address technical debt and toil.
* Contribute code that improves reliability.
* Implement automation to reduce ongoing toil.
* Minimum 5 years of experience in software development and/or Linux systems administration.
* Strong interpersonal, written, and verbal communication skills.
* Availability to participate in an on-call rotation.
* Proficient as a Linux production systems engineer with experience managing large-scale web services infrastructure.
* Development experience in one or more programming languages: Python, Bash, Go, Java, C++, or Rust.
* Experience with at least 3 of the following topics: distributed data storage at scale, NoSQL at scale, data aggregation technologies, scaling and running traditional RDBMS with high availability, monitoring & alerting, incident management toolsets, Kubernetes and/or AWS deployment and management, software distribution, configuration management, software performance analysis, and load testing.