Site Reliability Engineering Manager
We are seeking a Site Reliability Engineering Manager to improve the reliability and performance of our Data & AI platforms by developing monitoring solutions, automating operational tasks, and optimizing platform efficiency.
1. Maintain system health through proactive monitoring and analysis
2. Streamline operational processes to increase platform productivity
Ideal candidates have experience in DevOps and cloud-native environments, along with expertise in scripting languages and incident management. The role offers flexible working options, team discounts, a supportive culture, and opportunities for professional growth.
Key Responsibilities:
* Develop and implement monitoring tools to ensure system stability
* Automate routine tasks to optimize platform efficiency
Requirements:
* Proficiency in scripting languages such as Python or Bash
* Experience with cloud-native environments such as AWS or GCP
* Strong understanding of DevOps principles and practices
What We Offer:
* Flexible working options to suit your needs
* Team discounts on products and services
* A supportive culture that fosters growth and development