Job Title: Software Reliability Engineer
We are seeking a highly skilled Software Reliability Engineer to drive the development of our software products and ensure their operational performance and efficiency across code, infrastructure, and services.
Key Responsibilities:
* Monitor system performance, availability, and capacity across production environments.
* Design and implement automated monitoring, alerting, and incident response procedures.
* Lead post-incident reviews and drive implementation of preventive measures.
* Collaborate with engineering teams to improve system scalability and reliability.
* Implement infrastructure as code and automated deployment strategies.
* Establish SLIs, SLOs, and error budgets to balance feature velocity with system stability.
* Write, test, and debug high-quality code while leading feature design and implementation that align with business goals.
* Participate in code reviews, enforce coding conventions, and provide actionable feedback to improve codebase quality.
* Mentor less experienced engineers, manage project delivery independently, and foster a collaborative, growth-oriented environment.
Requirements:
* Experience with cloud service providers (Azure preferred).
* Experience with containerization and orchestration using Docker and Kubernetes (or similar).
* Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar).
* Knowledge of infrastructure as code tools (Terraform, Ansible, or CloudFormation).
* Experience with service mesh technologies (Istio, Linkerd).
* Understanding of incident management processes and on-call responsibilities.
* Experience with performance optimization and capacity planning.
* Technical proficiency in backend programming languages such as TypeScript (Node.js), C#, Python, Ruby, or Java.
* Practical knowledge of basic security practices in web development.
* Experience with database systems such as MySQL, PostgreSQL, or MongoDB.
* Experience with observability principles in software engineering (metrics, logging, monitoring, tracing, alerting).
* Experience with version control systems like Git.
* Experience implementing RESTful APIs.
Preferred Skills:
* Experience with the design and application of distributed systems.
* Knowledge of chaos engineering principles and tools.
* Experience with load testing and performance benchmarking.
* Familiarity with disaster recovery and business continuity planning.
* Familiarity with microservices architecture.
* Familiarity with unit testing frameworks and test-driven development.
* Exposure to frontend development technologies, including HTML, CSS, JavaScript, and React for web application development.
* Experience with continuous integration/continuous deployment (CI/CD) pipelines.
* Familiarity with Agile development practices.
* Experience working in a product-oriented environment.
Why Join Us?
Work with impact and purpose. We're helping industries thrive, and you'll be at the forefront of that work.
Work with great people. Join a supportive, diverse, and inclusive team that gives you the trust and freedom to experiment and learn from failure.
Work that challenges you. We're growing quickly, and you will too, with opportunities to learn as we expand and scale globally.
Work that works for you. We're a flexible, remote-friendly workplace with inclusive leave options and day-to-day working hours that suit your routine.