Site Reliability Engineer
At Globant, we are working to make the world a better place, one step at a time. We enhance business development and enterprise solutions to prepare them for a digital future. With a diverse and talented team present in more than 30 countries, we are strategic partners to leading global companies in their business process transformation.
We seek a Principal Site Reliability Engineer who shares our passion for innovation and change. This role is critical to helping our business partners evolve and adapt to consumers' personalized expectations in this new technological era.
Experience and Background
* Experience designing and implementing software systems, preferably running at scale.
* Experience leading a team is preferred.
* Proven track record in managing complex, large-scale distributed systems across multiple layers: infrastructure, networks, applications, and data platforms.
* Experience working in cloud-native environments (AWS, Azure, GCP) and hybrid cloud/on-prem setups.
* Deep exposure to high-availability architectures, disaster recovery, and failover strategies.
* Hands-on experience with CI/CD pipelines, Infrastructure as Code (IaC) tools, and automation frameworks.
* Background in monitoring, observability, and performance optimization, using tools like Prometheus, Grafana, Datadog, New Relic, or Splunk.
* Experience working with networking protocols (TCP/IP, DNS, BGP, VPN) and a strong understanding of application-layer performance and security.
* Exposure to AI/ML projects or MLOps pipelines is a plus (especially if aligned with data engineering or AI model reliability).
* Prior experience in incident response, postmortem analysis, and continuous improvement cycles.
* Familiarity with Agile, DevOps, and SRE principles, including error budgets and blameless postmortems.
Key Responsibilities
* Take ownership of full-stack reliability, from infrastructure, networks, databases, and middleware to application performance.
* Proactively identify, troubleshoot, and resolve complex system issues across multiple environments.
* Design, implement, and maintain highly reliable and scalable architectures to meet critical business SLOs.
* Develop and maintain robust monitoring and alerting systems to ensure fast detection and resolution of incidents.
* Lead root cause analyses and post-incident reviews to improve system resilience.
* Collaborate closely with development, QA, infrastructure, and product teams to embed reliability into the software lifecycle.
* Automate repetitive tasks and operational workflows to improve system efficiency and team productivity.
* Mentor and share knowledge with team members to foster a culture of continuous learning and curiosity.
* Keep up to date with emerging technologies, propose improvements, and evaluate new tools or approaches that can enhance reliability, scalability, or performance.
Required Skills and Qualifications
* Strong systems engineering background with a deep understanding of Linux/Unix internals and/or Windows systems.
* Advanced knowledge of networking concepts: routing, load balancing, firewall management, DNS, VPN, SSL/TLS.
* Proficiency in at least one programming/scripting language: Python, Go, Java, Bash, or similar.
* Solid experience with cloud platforms (AWS, GCP, Azure), containers (Docker), and orchestration (Kubernetes).
* Hands-on skills with monitoring/observability tools (Prometheus, Grafana, Datadog, New Relic, Splunk).
* Familiarity with databases and storage systems (SQL, NoSQL, distributed storage).
* Experience with Infrastructure as Code (IaC) tools.
* Excellent problem-solving skills and a curious, investigative mindset, with the ability to dig deep into unknown issues across layers.
* Strong communication and collaboration skills, with the ability to work effectively across cross-functional teams.
* Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
Nice-to-Have Skills
* Security and compliance awareness: Understanding of cloud security best practices, penetration testing, vulnerability management, and regulatory frameworks (e.g., PCI DSS, ISO 27001).
* Performance tuning expertise: Ability to fine-tune application and database performance under heavy load, including JVM tuning, query optimization, and caching strategies.
* Serverless architectures: Familiarity with serverless frameworks (AWS Lambda, Google Cloud Functions, Azure Functions) and event-driven design patterns.
* Multi-cloud or hybrid cloud experience: Working across multiple cloud providers or integrating cloud with on-premises environments.
* Certifications: AWS/GCP/Azure certifications, Kubernetes certification (CKA/CKAD), or SRE-focused credentials.
Preferred Soft Skills
* Excellent communication skills and expertise in partnering with stakeholders.
* Highly curious, meticulous, and independent.
* A solid and diverse engineering background.
Job Details
* Seniority level: Mid-Senior level
* Employment type: Full-time
* Job function: Engineering and Information Technology
* Industries: IT Services and IT Consulting