Job Title
A site reliability engineer lead oversees a team of engineers, ensuring the scalability, reliability, and security of our digitization program.
Mandatory Skills
* Java, Spring Boot, Node.js, microservices, RDBMS, NoSQL
* AWS services (EC2, S3, Lambda, IAM, ECS, EKS, SQS, Kinesis), Splunk, New Relic, Terraform, APIs, event-driven approaches, security patterns, Unix/Linux, Docker
* Strong experience in analyzing and troubleshooting large-scale distributed systems
* Ability to debug and optimize code and to automate routine tasks
* Containerization and orchestration technologies (Docker, Kubernetes)
* Modern software engineering practices (Agile, DevOps)
* Strong communication skills
* Domain knowledge of telecom billing, charging, and rating systems
Duties and Responsibilities
* Collaborate with development teams and partner teams to ensure that application reliability, efficiency, and performance meet customer needs.
* Develop automation to streamline operations and improve system reliability, efficiency, and performance.
* Partner with development teams on feature launches to ensure reliable, scalable functionality for customers.
* Build deep knowledge of the production infrastructure, debug distributed systems problems, and identify system improvements.
* Manage operational SLOs and SLAs.
* Handle metrics reporting and progress tracking.
* Manage infrastructure costs and resource utilization.
* Work with security teams to ensure compliance with security policies and procedures.
* Participate in on-call rotations to provide 24/7 support for systems.
* Maintain observability: alarms, monitoring, and synthetics.
* Own error management.
Qualifications and Certifications
* Bachelor's degree in computer science or a related engineering field
* 20+ years of IT industry experience