Introduction
At IBM Infrastructure & Technology, we design and operate the systems that keep the world running. From high-resiliency mainframes and hybrid cloud platforms to networking, automation, and site reliability, our teams ensure the performance, security, and scalability that clients and industries depend on every day. Working in Infrastructure & Technology means tackling complex challenges with curiosity and collaboration. You'll work with diverse technologies and colleagues worldwide to deliver resilient, future-ready solutions that power innovation. With continuous learning, career growth, and a supportive culture, IBM provides the opportunities to build expertise and shape the infrastructure that drives progress.
Your Role and Responsibilities
Deep understanding of IBM Spectrum Scale (GPFS) architecture:
NSD servers, quorum nodes, manager roles
Metadata vs data disks
Failure groups & declustering
ESS node roles:
I/O Server (IOS)
Management node
Client nodes (HPC / AI workloads)
ESS building blocks:
Building block layout (BB, racks, drawers)
Disk enclosures, NVMe / SSD / HDD tiers
ESS high-availability design:
Node failover
Disk and enclosure redundancy
Network path redundancy
Spectrum Scale (GPFS) – Advanced Administration
Install, configure, and upgrade Spectrum Scale on ESS
File system lifecycle management:
Create, modify, extend, and delete GPFS file systems
Fileset design (dependent/independent)
Snapshot management
Advanced GPFS tuning:
Pagepool tuning
Token management
NSD performance optimization
Quorum & cluster health management
CES (Cluster Export Services):
NFS, SMB, Object (if applicable)
Authentication (AD / LDAP)
ILM (Information Lifecycle Management):
Tiering policies
Placement rules
Policy execution and troubleshooting
Storage Hardware Expertise (ESS-Specific)
Disk and enclosure management:
Drive replacement procedures
Drawer and enclosure diagnostics
RAID and declustered array behavior
Firmware management:
Disk, enclosure, HBA, NIC firmware
Non-disruptive firmware updates
Hardware error interpretation:
ESS event logs
Call Home data
Predictive failure analysis
Understanding of ESS Call Home & SSR workflow
Performance Engineering & Troubleshooting
Networking (Critical Skill Area)
ESS networking design:
Data vs management networks
High-speed interconnects (25/40/100 GbE)
VLAN, bonding, LACP configuration
RDMA / RoCE concepts (for performance-critical deployments)
Network troubleshooting impacting storage I/O
Linux System Administration (Advanced)
RHEL administration (ESS-certified versions)
Kernel parameter tuning
systemd services for ESS and GPFS
Disk, multipathing, and device management
Log analysis and root cause isolation
Secure access, sudo, SSH hardening
Availability, DR & Maintenance
Planned maintenance with zero or minimal downtime
Rolling upgrades (Spectrum Scale & firmware)
Backup and restore strategies:
GPFS snapshots
Integration with backup tools
Disaster recovery concepts:
Multi-cluster ESS
AFM (Active File Management)
Replication and failover
Operational Excellence & IBM Processes
Deep familiarity with:
IBM ESS maintenance workflows
Firmware compatibility matrices
IBM support engagement (support cases, formerly PMRs)
Root cause analysis documentation
Change management and risk assessment
Client advisory and best-practice guidance
Preferred Education
Master's Degree
Required Technical and Professional Expertise
Role Summary
The IBM Elastic Storage System (ESS) System Administrator is responsible for the design, administration, performance, availability, and lifecycle management of IBM ESS environments powered by IBM Spectrum Scale (GPFS).
This role requires deep technical expertise across storage hardware, Spectrum Scale software, Linux, and high-performance networking to support mission-critical HPC, AI, and enterprise workloads.
Key Responsibilities
ESS & Spectrum Scale Administration
Storage Hardware & Firmware Management
Performance Engineering & Troubleshooting
Networking & Infrastructure
Linux System Administration
Security, DR & Compliance
Automation & Monitoring
Operational Leadership
Preferred Technical and Professional Experience
Required Technical Skills
Expert knowledge of IBM ESS architecture
Advanced administration of IBM Spectrum Scale (GPFS)
Strong Linux (RHEL) system administration skills
Deep understanding of storage hardware and firmware lifecycle
Performance tuning and troubleshooting expertise
Enterprise networking knowledge (Ethernet, RDMA, bonding)