Job Description: We are looking for a skilled Site Reliability Engineer to join our team. This role is crucial in ensuring the high availability and performance of our customer-facing systems and products.
Responsibilities:
* Support highly available and high-demand customer-facing systems and products.
* Work in a 24/7 operations environment supporting mission-critical systems for external customers.
Requirements:
* Previous experience as a Site Reliability Engineer or in an environment that has adopted SRE principles and practices.
* Experience with cloud-based infrastructure, such as AWS or Azure, and container technologies like Docker and Kubernetes.
* Experience with automation and configuration management tools such as Ansible, Terraform, or Chef.
* Strong programming skills in one or more scripting languages like Python, Bash, or PowerShell.
* Demonstrated experience in Java development and building Ops monitoring tools.
* Experience designing, debugging, and running fault-tolerant, large-scale distributed systems.
* Experience with version control and repository hosting tools such as Git and Bitbucket.
* Experience with relational databases such as MySQL, Oracle, and PostgreSQL.
* Experience with Linux system administration and with proxies, web servers, and application servers such as HAProxy, Apache, WebLogic, Tomcat, and JBoss.
Good to Have:
* Experience working in the IT operations team of a large, complex organization.
* Strong communication and collaboration skills, with the ability to work effectively across teams.
* Knowledge of networking and network security, including DNS, TCP/IP, firewalls, and load balancers.
* Excellent analytical and problem-solving skills, demonstrated through incident management and troubleshooting of production issues.
* Experience in BSS (Business Support Systems, such as billing and customer care platforms) and an understanding of the fulfillment environment.
* Amdocs and CES product suite experience will be highly regarded.
* Knowledge of security protocols, including SSL handshakes and certificate setup for two-way SSL.
* Experience with monitoring tools such as ELK, Splunk, Dynatrace, and Nagios.
* Involvement in build and release management processes and CI/CD tools.
* Demonstrated experience with automation and container tools such as Puppet Enterprise, Docker, and Ansible.
* Good knowledge of encryption technologies such as SSL, as well as firewalls and networking, in a large, complex enterprise environment.