Production Support Lead - Data and Analytics
This is a senior role in our team for a production support expert who will lead the development of runbooks and documentation, identify monitoring gaps, and design automation solutions that improve system availability and efficiency.
This is a full-time position with flexible work arrangements and can be based in any of our offices. We are looking for someone who demonstrates a passion for a career in data analytics and has knowledge of establishing and optimizing processes for production support. Key responsibilities include improving operational efficiencies, leading a skilled onshore support team, defining support strategies for systems in the cloud, and collaborating with other tech leads and support teams.
The ideal candidate will have a background in consulting, experience working with application architectures, DevOps, and Site Reliability Engineering concepts, and familiarity with monitoring solutions like Splunk. They will also possess strong problem-solving skills, the ability to automate routine maintenance tasks, and knowledge of networking principles.
We believe that whoever you are, your uniqueness helps us stand apart. We bring together extraordinary people like you to build a better working world. If you're passionate about data and analytics, we want to hear from you.
About this role
* Job Type: Full-Time
* Location: Canberra (Australian Capital Territory), Sydney, Melbourne, or Brisbane offices
* Company: EY
Key Skills and Qualifications
* Knowledge of establishing and optimizing processes for production support (Level 3).
* Improve operational efficiencies to prevent and mitigate production incidents.
* Lead the team in creating runbooks and support documentation.
* Form a skilled onshore support team focused on system availability, performance, and capacity, and collaborate with other tech leads and support teams for end-to-end availability and reliability.
* Define support strategies for cloud systems on Amazon Web Services (AWS), and influence resiliency and scalability in AWS production environments.
* Identify and address monitoring and alerting gaps, and design and deliver technical and process automation to improve system availability and efficiency.
* Solve critical service issues and develop automated responses to non-exceptional conditions.
* Contribute to service capacity planning, software performance analysis, and system tuning.
* Utilize monitoring solutions (e.g., Splunk) to reduce outage detection time.
* Familiarity with application architectures, networking, DevOps, and Site Reliability Engineering concepts.
* Automate routine maintenance tasks, address common issues, and identify and mitigate risks to critical and non-critical system KPIs.
* Demonstrate knowledge of networking principles, including protocols, MAC addresses, IP packets, DNS, OSI layers, and load balancing.
What We Can Offer You
* Flexible work arrangements
* Opportunities for professional growth and development
* A dynamic and inclusive work environment