We are looking for a highly skilled Senior Service Reliability Engineer to join the team behind our cloud gaming service.
This critical role involves designing and implementing scalable systems to ensure the stability of our service, focusing on three main areas: production ownership, code quality, and deployments.
The ideal candidate is self-directed, comfortable taking part in decision-making at every level, and holds well-informed opinions on the state of our service.
They will provide candid feedback throughout the operational lifecycle, ensuring operational readiness and stability.
Key Responsibilities
* Lead technical discussions around ongoing improvements in reliability and scalability.
* Contribute to high-level designs for new products and platforms.
* Mentor junior engineers and help them grow.
* Lead incident response and post-mortem activities within your assigned service team.
* Work with other engineers to prioritize reliability improvements and address technical debt.
* Develop automation scripts to reduce ongoing toil.
Requirements
* 5+ years of professional experience in a Software Development and/or Linux Systems Administration role.
* Strong interpersonal, written, and verbal communication skills.
* Availability to participate in an on-call rotation.
Skills & Knowledge
* Proficient as a Linux Production Systems Engineer, with experience managing large-scale Web Services infrastructure.
* Development experience in one or more of the following programming languages: Python, Bash, Go, Java, C++, or Rust.
* In addition, experience with at least three of the following:
  * Distributed data storage at scale (Hadoop, Ceph)
  * Data aggregation technologies (Elasticsearch, Kafka)
  * Scaling and running traditional RDBMSs (PostgreSQL, MySQL) with high availability
  * Monitoring and alerting (Prometheus, Grafana)
  * Incident management toolsets
  * Kubernetes and/or AWS (deployment and management)
  * Software distribution (package management and distribution at scale)
  * Configuration management (Ansible, SaltStack, Puppet, Chef)
  * Software performance analysis and load testing (QA or SDET experience)
Why Work With Us?
* We are an equal opportunity employer and welcome applications from diverse candidates.
* We offer a dynamic work environment and opportunities for growth and development.
* We value our employees' well-being and offer a range of benefits to support their physical and mental health.