Job Overview
We are seeking an experienced Staff Site Reliability Engineer to lead the scaling of our infrastructure, observability, and performance across the business. This high-trust, high-impact role requires a technical expert who can architect resilient systems, mentor engineers, and drive org-wide reliability strategy.
Key Responsibilities
* Own and evolve our SRE architecture and roadmap, from CI/CD to incident response
* Design scalable, fault-tolerant systems across cloud infrastructure (GCP/AWS)
* Define, measure, and enforce SLOs, SLAs, and error budgets org-wide
* Partner with engineering and product leads to embed reliability across teams
* Lead major technical initiatives, incident reviews, and tooling investments
* Mentor engineers, contribute to internal education, and elevate on-call culture
Requirements
* 7+ years in SRE, infrastructure, or DevOps roles in high-scale environments
* Deep systems knowledge spanning cloud-native infrastructure, distributed systems, and automation
* Experience leading technical initiatives and cross-functional projects
* Expertise in observability stacks, CI/CD tooling, and deployment strategies
* Strong communicator who thrives on teaching and building alignment
* Bias toward action, clarity, and long-term thinking
Benefits
We offer a competitive salary and growth path, generous paid time off, inclusive parental leave policies, free food, and equity and ownership opportunities. Our team is passionate about shaping the future of human connection, and we're committed to supporting your personal and professional growth.