Senior Site Reliability Engineer (SRE) – Remote
Join the Food Tech Revolution at EatClub!
About Us
EatClub is a fast‑growing tech company with big global ambitions, co‑founded by legendary chef Marco Pierre White and industry leaders. We’re on a mission to revolutionise the hospitality industry, helping restaurants boost profitability through smart, dynamic pricing. We power thousands of venues across Australia and have recently launched in the UK as we expand our international footprint. Our platform connects over 2 million customers with thousands of top restaurants, offering real‑time deals that can save diners up to 50% off their bill.
Read about us in the AFR and Broadsheet.
Why You’ll Love Working Here
- Everyone contributes: We encourage every team member to bring their ideas to the table and help shape what we build next.
- Create big impact: Your code will directly affect how thousands of venues and millions of users connect over food and drink.
- Startup speed, real ownership: You’ll work in a fast‑paced, agile environment where your ideas matter and shipping code is the focus.
- Remote + flexible: We care about outcomes, not clock‑watching. Work from wherever suits you best.
- Inclusive + diverse: We embrace differences and believe the best teams are built on diversity in background, thinking, and experience.
- Surround yourself with exceptional talent: We seek out top talent. Our people are passionate about their craft, and love inspiring those around them to be their best.
- Supportive, fun‑loving team: We work hard together, cheer each other on, and celebrate our wins as a team.
The Role
We’re looking for a proactive, product‑focused Senior SRE who’s excited to take on multiple responsibilities and to build and support the mission‑critical systems that bring joy to users and value to restaurants. You’ll join a team where you can have real influence, wide scope, and room to grow in a fun and rewarding environment.
Key Responsibilities
- Reliability Engineering – Support capacity planning, define and manage service‑level objectives (SLOs) and error budgets, and lead incident response efforts.
- Platform Support – Take responsibility for the availability and performance of the EatClub AWS platform. Build automation to prevent recurrence of issues and ensure all non‑exceptional service conditions are handled automatically.
- Cross‑functional Collaboration – Partner with software engineers, product managers, and other stakeholders to embed reliability, scalability, and resilience into the software delivery lifecycle.
- Incident Management – Coordinate and communicate during critical production events, supporting response efforts and ensuring rapid resolution.
- CI/CD and Testing Support – Work closely with engineering teams to support and improve CI/CD pipelines and automated testing frameworks.
- Disaster Readiness – Build systems that are designed and tested for fault tolerance, redundancy, and recovery.
- Observability – Build monitoring, logging, and alerting systems based on best practices to improve visibility into system health and performance.
- Automation Leadership – Promote and implement automation across operational processes to reduce toil and increase efficiency.
- Security Posture – Contribute to improving the security of infrastructure and processes through proactive hardening and secure practices.
What You’ll Bring
- Proactive, improvement‑focused mindset with a passion for building reliable systems.
- A can‑do mindset, ready to roll up your sleeves.
- Proven experience in DevOps, Site Reliability Engineering, platform operations, or a similar discipline.
- AWS Cloud infrastructure expertise – Experienced in building infrastructure in AWS, including services like EC2, S3, IAM, CloudWatch, etc., ideally across multiple geographies.
- Infrastructure as Code (IaC) – Strong proficiency with tools like Terraform or AWS CDK.
- CI/CD pipelines – Building and maintaining robust deployment pipelines using GitHub Actions.
- Observability – Experience designing and managing logging, monitoring, and alerting stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry).
- Scripting & Automation – Proficient in Python or Bash for automating operational tasks and building internal tools.
- Security & compliance awareness – Familiarity with secure infrastructure practices, secrets management, vulnerability scanning, and audits; understanding of compliance standards (SOC2, ISO 27001, etc.).
- Skilled in IP networking concepts (DNS, load balancing, etc.).
- Linux systems administration – Hands‑on experience with Linux, including system internals and performance tuning.
Bonus Points If You…
- Have experience in Database Administration.
- Are experienced in containerisation and orchestration, with expertise in Docker and Kubernetes (EKS/GKE/AKS), including deployment, monitoring, and troubleshooting.
- Have experience with chaos engineering or fault injection, including building resilient systems and running game days or incident simulations.
Qualifications
- Degree in Computer Science or a related discipline.
- A minimum of 5 years of post‑degree commercial experience in DevOps and AWS, in high‑scale, high‑availability environments.
- Full working rights in Australia.
Hungry Yet?
If you’re looking for a role where you can do your best work, make a visible impact, grow your career, and work with great humans, then we’d love to hear from you. Apply now, and let’s build something extraordinary – one dish, one booking, one feature at a time.
Job Details
- Seniority level: Mid‑Senior level
- Employment type: Full‑time
- Job function: Engineering and Information Technology
- Industries: Hospitality