Job Summary: Site Reliability Engineer
A Site Reliability Engineer ensures the reliability, scalability, and performance of systems and services by applying software engineering principles to infrastructure and operations problems.
* System Reliability & Performance – Design, build, and maintain scalable and highly available systems.
* Monitoring & Observability – Monitor system health and performance using observability tools.
* Incident Management – Respond to production incidents, perform root cause analysis, and implement preventive measures.
* Automation – Develop scripts and tools that automate repetitive operational tasks, reducing manual toil and improving efficiency.
* Capacity Planning – Forecast demand on systems and plan infrastructure scaling to meet it.
* Collaboration – Work closely with development teams to ensure reliability is built into applications.
* Security & Compliance – Implement security best practices and ensure systems meet compliance requirements.