We are seeking a highly skilled professional to lead our team in driving enduring reliability across all products and services.
About the Role
This role involves driving robust, consistent, and fast responses to high-severity incidents. The successful candidate will lead technical discussions to identify and track the actions that arise during incidents.
As a Lead Engineer, you will drive best practice across the business and contribute to the ongoing transformation of our organization's culture.
Key Responsibilities
* Own the incident management process to ensure it drives enduring reliability across all products and services.
* Provide expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution.
* Lead and advocate for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department.
* Promote a customer-focused approach by addressing and mitigating issues in customer environments worldwide, and foster a culture of continuous learning and technical excellence within the team.
Required Skills and Qualifications
The ideal candidate will have previous experience as a Site Reliability Engineer, in an Operations or Engineering environment. Strong hands-on coding experience (preferably Python) and knowledge of software engineering best practices are also required. Additionally, the candidate should have hands-on experience troubleshooting AWS-hosted services, networking knowledge, and the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues.
Why Us?
We offer a range of benefits, including generous paid leave, health insurance, life insurance, and income protection. Our human values are also reflected in our wellbeing and sports programs, employee resource groups, 26 weeks of paid parental leave for primary caregivers, Employee Share Plan, beautiful offices, flexible working, career development, and many other benefits.