Job Description
Site reliability engineers combine software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. This role ensures that services have the reliability and uptime appropriate to customers' needs, as well as a fast rate of improvement.
Much of the development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. As a site reliability engineer, you'll have the opportunity to manage complex challenges of scale that are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.
Responsibilities
* Write product or system development code
* Review code developed by other engineers and provide feedback to ensure best practices
* Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback
* Triage product or system issues, then debug, track, and resolve them by analyzing their sources and their impact on hardware, network, or service operations and quality
Required Skills and Qualifications
* Bachelor's degree in Computer Science, a related field, or equivalent practical experience
* 2 years of experience with software development in one or more programming languages
* 2 years of experience with data structures or algorithms
Benefits
Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We welcome Indigenous applicants and commit to building reconciliation through our technology, platforms and people.
Other Information
This role combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. It involves ensuring that services have the reliability and uptime appropriate to customers' needs, as well as a fast rate of improvement.