Senior Site Reliability Engineers play a vital role in ensuring the smooth operation of all user-facing services and production systems at an organization. This position requires a blend of technical skills, with expertise in systems, networking, the Linux kernel, scaling, algorithms, and distributed systems.
Key Responsibilities:
Automate: Build automation tools to streamline operational tasks, such as package updates, configuration changes, and provisioning of customer-facing services.
Maintain: Develop reliable maintenance systems for library upgrades, version migrations, and other tasks.
Plan: Design monitoring and alerting systems that predict capacity needs based on customer usage patterns.
Respond: Act promptly on user emergencies, platform alerts, and support requests.
Enhance: Implement new security measures and update existing ones to protect infrastructure.
Partner: Collaborate with compliance assessors to ensure regulatory requirements are met.
Collaborate: Work with software development teams to shape the future roadmap and establish operational readiness.
Suitable candidates should have experience with Infrastructure as Code technologies such as Terraform, and with Go or Ruby. Strong problem-solving skills, collaboration abilities, and a focus on delivering innovative solutions are essential.