Cloud Gaming Revolution
We're at the forefront of a cloud gaming revolution, putting console-quality video games on any device, from TVs to mobile devices and beyond.
Our team leads the way in delivering an exceptional cloud gaming experience to our customers by steering design and operational decisions towards overall stability.
As part of Sony Interactive Entertainment, we're striving to create an inclusive environment that empowers employees and embraces diversity.
Key Responsibilities:
* Lead technical discussions around ongoing improvements in reliability and scalability.
* Create High-Level Designs (HLDs) for new products and platforms.
* Mentor junior SRE staff and set them up for success.
* Lead incident response and post-mortem activities within your assigned service team.
* Work with other engineers in a cross-functional team to prioritise reliability improvements that address technical debt and toil.
* Contribute code to improve reliability and implement automation to reduce ongoing toil.
-----------------------------------
Requirements
To succeed in this role, you'll need:
* Minimum of 5 years' working experience in Software Development and/or Linux Systems Administration.
* Strong interpersonal, written and verbal communication skills.
* Available to be scheduled in an on-call rotation.
-----------------------------------
Skills & Knowledge
You'll bring:
* Proficient as a Linux Production Systems Engineer, with experience managing large-scale Web Services infrastructure.
* Development experience in one or more of the following programming languages:
  * Python (preferred)
  * Bash, Go, Java, C++, or Rust
* In addition, experience with at least 3 of the following topics:
  * Distributed data storage at scale (Hadoop, Ceph)
  * NoSQL at scale (MongoDB, Redis, Cassandra)
  * Data Aggregation technologies (Elasticsearch, Kafka)
  * Scaling and running traditional RDBMS (PostgreSQL, MySQL) with High Availability
  * Monitoring & Alerting (Prometheus, Grafana), and Incident Management toolsets
  * Kubernetes and/or AWS (deployment and management)
  * Software Distribution (package management and distribution at scale)
  * Configuration Management (Ansible, SaltStack, Puppet, Chef)
  * Software performance analysis and load testing (QA or SDET experience is a plus)