Overview

Position Summary:
As a Site Reliability Engineer Lead, you will lead a team of site reliability engineers and manage the challenges of scaling our client's digitization program.
Your expertise in coding, algorithms, complexity analysis, and large-scale system design will be crucial in building scalable, reliable, durable, and secure applications for our customers and internal users.
You will develop highly reliable applications with a customer-first approach, innovating technically while understanding our customers' needs.

Mandatory Skills
· Strong experience in Java, Spring Boot, Node.js, microservices, RDBMS, and NoSQL
· Proficiency with AWS services such as EC2, S3, Lambda, IAM, ECS, EKS, SQS, and Kinesis
· Observability using Splunk and New Relic
· Infrastructure as Code using Terraform
· APIs and event-driven approaches
· Security patterns
· Unix/Linux systems administration and familiarity with Docker
· Strong experience in analyzing and troubleshooting large-scale distributed systems
· Ability to debug and optimize code and automate routine tasks
· Familiarity with containerization and orchestration technologies such as Docker and Kubernetes
· Knowledge of contemporary software engineering practices and tools, such as Agile and DevOps
· Strong communication skills and the ability to explain complex technical matters in an easy-to-understand way
· Strong domain knowledge of telecom billing, charging, and rating systems

Duties and Responsibilities
· Within the Site Reliability Engineering team, collaborate with development and partner teams to ensure that applications' reliability, efficiency, and performance meet customer needs, while keeping the service operational, scalable, and automated
· Develop tools and automation to streamline operations and improve system reliability, efficiency, and performance
· Partner with development teams on feature launches to ensure reliable and scalable functionality for customers
· Build deep knowledge of production infrastructure to debug distributed systems problems and identify system improvements
· Operations, SLO, and SLA management
· Metrics reporting and progress tracking
· Manage infrastructure costs and optimize resource utilization
· Work with security teams to ensure compliance with security policies and procedures
· Participate in on-call rotations to provide 24/7 support for our systems
· Observability (alarms, monitoring, synthetics)
· Error management

Qualifications & Certifications (Optional)
· Bachelor's degree in computer science or a related engineering field
· 20+ years of IT industry experience

Salary Range
>100,000

Date of Posting
25 September 2025

Next Steps
If you feel this opportunity suits you, or Cognizant is the type of organization you would like to join, we want to have a conversation with you.
Please apply directly with us.
For a complete list of open opportunities with Cognizant, visit.
Cognizant is committed to providing Equal Employment Opportunities.
Successful candidates will be required to undergo a background check.