Job Overview
As a seasoned expert in site reliability engineering, you will play a pivotal role in moving our compute infrastructure to a containerized model. Your expertise will be instrumental in automating infrastructure, improving system reliability, and implementing best practices for container orchestration.
About the Role
* Automate infrastructure to improve system efficiency and reliability.
* Implement best practices for container orchestration using Kubernetes.
* Collaborate with cross-functional teams to elevate compute capabilities.
* Drive the adoption of modern, scalable solutions.
* Evaluate monitoring, incident response, and continuous improvement processes, and ensure they are robust.
Key Qualifications
* Strong background in Unix.
* Proficiency in scripting and automation tools such as Python, Bash, and Ansible.
* Experience in supporting and managing Kubernetes clusters.
* Experience in managing declarative GitOps continuous delivery (CD) using ArgoCD.
* Experience with observability and logging tools such as Grafana, AppDynamics, Splunk, and CloudWatch.
* Experience with Windows Server is advantageous.
About Technology at Our Organization
We leverage technology to drive every aspect of our business. We're a global team that accelerates digital transformation, connects people and data, builds platforms and applications, and designs innovative technology solutions.
Our Commitment to Diversity, Equity, and Inclusion
We foster a diverse, equitable, and inclusive workplace. We encourage individuals from all backgrounds to apply and welcome diverse perspectives. We aim to provide reasonable adjustments for individuals who need support, both during the recruitment process and in their working arrangements.