Cloud Reliability Expert
We're seeking an experienced Cloud Reliability Expert to lead the development of a cloud gaming service. The selected candidate will be responsible for ensuring the stability and scalability of our gaming platform.
In this role, you'll focus on influencing design and operational decisions, collaborating with cross-functional teams, and mentoring junior colleagues.
We value self-directed individuals who can participate in decision-making at different levels. Our ideal candidate has a strong background in software development and Linux systems administration, with experience managing large-scale web services infrastructure.
Key Responsibilities:
* Lead technical discussions around reliability and scalability improvements
* Create High-Level Designs (HLDs) for new products and platforms
* Mentor junior colleagues
* Lead incident response and post-mortem activities
* Collaborate with engineers to prioritize reliability improvements
Requirements:
* 5+ years of experience in software development and/or Linux systems administration
* Strong interpersonal, written, and verbal communication skills
* Willingness to participate in an on-call rotation
Skills & Knowledge:
* Proficiency in Linux production systems engineering
* Development experience in Python, Bash, Go, Java, C++, or Rust
* Experience with distributed data storage, NoSQL, data aggregation technologies, monitoring and alerting, incident management, Kubernetes, AWS, software distribution, configuration management, and software performance analysis
We strive to create an inclusive environment that empowers employees and embraces diversity. We welcome applicants from all backgrounds and encourage everyone to apply.