Overview
Position Summary:
As a Site Reliability Engineer Lead, you will lead a team of site reliability engineers and manage the challenges of scaling our client's digitization program. Your expertise in coding, algorithms, complexity analysis, and large-scale system design will be crucial in building scalable, reliable, durable, and secure applications for our customers and internal users. You will develop highly reliable applications with a customer-first approach while innovating technically and understanding our customers' needs.
Mandatory Skills
* Strong experience in Java, Spring Boot, Node.js, microservices, RDBMS, NoSQL
* Proficiency with AWS services such as EC2, S3, Lambda, IAM, ECS, EKS, SQS, Kinesis
* Observability using Splunk and New Relic
* Infrastructure as Code using Terraform
* APIs and event-driven approaches
* Security patterns
* Unix/Linux systems administration, with familiarity with Docker
* Strong experience in analyzing and troubleshooting large-scale distributed systems
* Ability to debug and optimize code and automate routine tasks
* Familiarity with containerization and orchestration technologies such as Docker and Kubernetes
* Knowledge of modern software engineering practices and tools, including Agile and DevOps
* Strong communication skills and the ability to explain complex technical matters in an easy-to-understand way
* Strong domain knowledge of telecom billing, charging, and rating systems
Duties and Responsibilities
* Within the Site Reliability Engineering team, collaborate with various development teams and other partner teams to ensure applications’ reliability, efficiency, and performance meet customer needs, while keeping the service operational, scalable, and automated.
* Develop tools and automation to streamline operations and improve system reliability, efficiency, and performance.
* Partner with development teams on feature launches to ensure reliable and scalable functionality for customers.
* Build deep knowledge of production infrastructure to debug distributed systems problems and identify system improvements.
* Manage operations, SLOs, and SLAs
* Metrics reporting and progress tracking
* Manage infrastructure costs and optimize resource utilization
* Work with security teams to ensure compliance with security policies and procedures
* Participate in on-call rotations to provide 24/7 support for our systems
* Observability (alarms, monitoring, synthetics)
* Error management
Qualifications & Certifications (Optional)
* Bachelor’s degree in computer science or a related engineering degree
* 20+ years of IT industry experience
Salary Range
>100,000
Date of Posting
25 September 2025
Next Steps
If you feel this opportunity suits you, or Cognizant is the type of organization you would like to join, we want to have a conversation with you. Please apply directly with us.
For a complete list of open opportunities with Cognizant, visit http://www.cognizant.com/careers. Cognizant is committed to providing Equal Employment Opportunities. Successful candidates will be required to undergo a background check.