We are seeking a highly skilled DevOps Engineer / Site Reliability Engineer (SRE) to design, implement, and maintain scalable automation tools and Infrastructure as Code solutions. The ideal candidate will have hands-on experience with cloud platforms, CI/CD pipelines, and disaster recovery, and a track record of working effectively with cross-functional teams. This role is critical to driving efficiency, ensuring system reliability, and enabling smooth software delivery across the organization.
Key Responsibilities
* Design, implement, and maintain Infrastructure as Code (IaC) solutions using Terraform, Ansible, and AWS CloudFormation to automate and streamline operational processes.
* Develop and optimize CI/CD pipelines to reduce release times and improve the reliability of software delivery.
* Create and execute comprehensive disaster recovery and business continuity plans, including automated backup and failover strategies, to maintain 99.9% data availability.
* Administer multi-cloud environments (AWS, Azure, GCP) to improve the scalability, resilience, and flexibility of IT infrastructure.
* Collaborate with development, QA, and operations teams to coordinate project activities, improve project success rates, and streamline communication.
* Provide mentorship, training, and technical guidance to junior engineers to strengthen team expertise and knowledge-sharing practices.
Required Skills & Qualifications
* Proven experience in DevOps, SRE, or Cloud Infrastructure Engineering.
* Strong expertise in Terraform, Ansible, and AWS CloudFormation.
* Hands-on experience managing multi-cloud environments (AWS, Azure, GCP).
* Solid understanding of CI/CD pipelines and related tools (Jenkins, GitLab CI, GitHub Actions, etc.).
* Knowledge of disaster recovery strategies, high availability, and system resilience.
* Excellent problem-solving, troubleshooting, and communication skills.
* Ability to collaborate with cross-functional teams and mentor junior engineers.
Preferred Qualifications
* Experience with containers and container orchestration (Docker, Kubernetes).
* Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK stack, CloudWatch).
* Knowledge of security best practices in cloud environments.
* AWS, Azure, or GCP certifications.
Job Type: Full-time
Pay: $130,000.00 – $140,000.00 per year
Work Location: In person