Job Title: Senior Site Reliability Engineer
Senior Site Reliability Engineers keep all of the organization's user-facing services and production systems running smoothly. The position requires a blend of technical skills, with expertise in systems, networking, the Linux kernel, scaling, algorithms, and distributed systems.
Key Responsibilities:
* Automate: Build automation tools to streamline operational tasks, such as package updates, configuration changes, and provisioning of customer-facing services.
* Maintain: Develop reliable maintenance systems for library upgrades, version migrations, and other tasks.
* Plan: Design monitoring and alerting systems that predict capacity needs based on customer usage patterns.
* Respond: Handle user emergencies, platform alerts, and support requests promptly.
* Enhance: Implement new security measures and update existing ones to protect infrastructure.
* Partner: Collaborate with compliance assessors to ensure regulatory requirements are met.
* Collaborate: Work with software development teams to shape the future roadmap and establish operational readiness.
Requirements:
Suitable candidates should have experience with Infrastructure as Code technologies such as Terraform, and with Go or Ruby. Strong problem-solving skills, the ability to collaborate, and a focus on delivering innovative solutions are essential.