Job Description:
We are seeking a skilled Senior Site Reliability Engineer to join our infrastructure team. As a key member of the team, you will design, implement, and maintain our production infrastructure, ensuring its reliability and performance.
**Key Responsibilities:**
• Build high-performance CI/CD pipelines that enable agile development teams to ship fast, high-quality releases, tracked with measurable delivery and quality metrics.
• Design, maintain, and secure cloud infrastructure using Infrastructure-as-Code tools like Terraform and Crossplane.
• Automate operational tasks using Go, Python, serverless functions (AWS Lambda), and batch workloads (Kubernetes Jobs).
• Manage and monitor Kubernetes clusters running multiple production workloads.
• Develop and maintain blockchain infrastructure, managing nodes across Ethereum, Solana, Arbitrum, Base, Avalanche, and other networks.
• Ensure system reliability and security: participate in on-call rotations, troubleshoot incidents, conduct root cause analyses, and collaborate with Security teams on security tooling and frameworks.
• Plan, test, and implement disaster recovery strategies for a highly available microservices architecture.
• Leverage AI-powered solutions for infrastructure management, log analysis, anomaly detection, capacity planning, predictive maintenance, and performance optimization.
**Requirements:**
• 4+ years in DevOps or SRE roles
• 3+ years in CI/CD platform development and microservices support
• Strong observability, problem-solving, and performance optimization skills in complex, distributed systems
• Hands-on experience with blue-green, canary, and A/B testing release strategies for both services and databases