Linux Systems / Site Reliability Engineer (1-2 Years' Experience)
This is a unique opportunity to join a rapidly growing, world-class organization: the Site Reliability Engineering (SRE) team at Oracle's NetSuite Global Business Unit (NSGBU). This team plays a key role in product availability, stability, performance, and security. You will collaborate with various engineering teams (system engineering, network, infrastructure, security, maintenance) to support, design, implement, and develop tooling and automation platforms. The NSGBU SRE team also supports application operations in Oracle Cloud Infrastructure, a fast-growing cloud service.
What You Will Do:
As a member of our world-class Site Reliability team, you will leverage your expertise in monitoring, backups, infrastructure, and systems architecture, with a focus on reducing mean time to resolution for issues impacting our customer-facing production environment.
The primary responsibility of the SRE is to ensure Oracle NetSuite NSGBU Cloud Operations systems are operational, minimizing customer impact by identifying, resolving, or escalating issues efficiently.
1. Resolve site incidents daily across hardware, network, OS, and application levels.
2. Work proficiently in the Linux terminal.
3. Utilize monitoring and analytics tools like Kibana and Icinga to resolve incidents and identify problems.
4. Collaborate with multiple teams to build systems and services that enhance operational efficiency, reliability, scalability, resilience, security, and performance of Oracle NSGBU products.
5. Participate in NSGBU SRE 24x7 'Follow the Sun' operational coverage.
Qualifications and Education Requirements:
* Strong knowledge and experience in Linux system internals, monitoring, networking, and core cloud concepts.
* Understanding of standard internet services such as DNS, TCP/IP, NFS, and global load balancing.
* Experience in performance troubleshooting and tuning.
* Familiarity with web technologies, including Apache and web sessions.
* Understanding of the software development lifecycle.
* Experience with database environments requiring high availability.
* Solid analytical troubleshooting skills.
* Knowledge of architectural patterns for distributed systems and cloud computing at scale.
* Excellent communication skills in English.
* Bachelor's degree in Computer Science or a related field.
Preferred Skills
* Ability to work quickly and accurately under pressure in time-sensitive situations.
* Self-motivated, with pride in ownership and a drive to innovate to improve efficiency and effectiveness.
* Values simplicity and scalability, is comfortable in collaborative, agile environments, and is eager to learn.
* Minimum 5 years of experience in large-scale production operations providing mission-critical services.
Additional Qualifications and Skills
* Scripting experience in Bash, Perl, Python, or similar.
* Knowledge of databases like Oracle, Cassandra, Redis.
* Experience with network monitoring, protocols, SNMP, syslog, telemetry, REST API.
* Experience with orchestration and configuration tools such as SaltStack, Terraform, Kubernetes, Ansible.
* Exposure to distributed platforms like GlusterFS, Zookeeper, Kafka, Elasticsearch.
* Exposure to Oracle Exadata.
* Basic understanding of ITIL concepts.
Additional Notes
This role requires the ability to work on weekends.