Cloud Infrastructure Leader
We are seeking an experienced Cloud Infrastructure Leader to join our team. As a key member of our infrastructure management group, you will be responsible for overseeing the day-to-day health, stability, and performance of our Azure-based infrastructure platform.
* You will lead and mentor a geographically distributed team of platform engineers responsible for supporting both production and non-production workloads hosted in Microsoft Azure.
* You will establish technical direction, enforce engineering best practices, and provide hands-on guidance to resolve day-to-day operational challenges.
* You will act as the highest-level technical escalation point (L3) for complex infrastructure-related incidents, including degraded performance, VM unavailability, OS-level faults, and network reachability issues.
* You will review, validate, and approve proposed infrastructure changes, automation scripts, and configuration updates submitted by the team.
* You will champion platform-wide consistency by driving adherence to Microsoft's Cloud Adoption Framework (CAF) and client-specific governance models.
* You will collaborate with architects and security teams to assess and implement hardening requirements, architectural improvements, and modernization initiatives, while balancing risk and operational stability.
* You will continuously assess team performance and technical maturity, identify skill gaps, and align upskilling activities with emerging Azure features, operational tools, and evolving client needs.
Key Responsibilities:
* Ensure the platform team's daily tasks are executed, including VM patching (RHEL and Windows), backup validation, performance monitoring, and lifecycle operations.
* Coordinate and oversee the team's integration with monitoring tools such as Azure Monitor and Dynatrace for observability and proactive issue detection.
* Monitor patch compliance, backup status, and performance metrics, and drive remediation activities as needed.
* Perform operational support for production and non-production servers hosted on Microsoft Azure.
* Manage and maintain system availability, performance, and patch compliance using Azure-native tools and automation.
* Ensure adherence to tagging policies, RBAC controls, NSG configurations, and HA/DR practices in line with Microsoft's Cloud Adoption Framework (CAF).
* Troubleshoot and resolve infrastructure-related incidents (L2/L3), including those related to VM performance, boot failures, and network reachability.
* Collaborate with the client's service desk and change management functions to fulfill service requests and planned changes.
* Integrate with the existing monitoring framework (Azure Monitor + Dynatrace) to ensure platform observability and proactive incident response.
* Participate in the review and implementation of hardening recommendations within the agreed service scope.
* Support cloud-native backup and recovery validation activities; raise concerns if protection gaps are identified.
* Contribute to infrastructure documentation and knowledge base articles to support operational continuity and knowledge transfer.
Requirements:
* 2+ years of hands-on experience managing infrastructure in a Microsoft Azure and Red Hat environment.
* Proficiency in supporting both RHEL and Windows Server-based virtual machines.
* Strong understanding of Azure IaaS services, including virtual machines, storage accounts, load balancers, NSGs, and availability sets/zones.
* Experience with infrastructure patching using native Azure tools (e.g., Update Management, Azure Automation).
* Familiarity with monitoring and observability tools, particularly Azure Monitor and Dynatrace.
* Solid troubleshooting skills across compute, networking, and OS-level issues.
* Understanding of ITIL processes (incident, change, problem, configuration management).
Desirable Qualifications:
* Microsoft Certified: Azure Administrator Associate, Azure DevOps Engineer, or equivalent certification.
* Red Hat Certified: RHCSA or equivalent certification.
* Experience working in government or highly regulated environments.
* Exposure to infrastructure-as-code (IaC) using Terraform or ARM templates.
About Us:
We are a leading provider of hybrid IT, private cloud, and connectivity services, backed by decades of industry experience. Our expertise combined with the strength of our people makes us one of the most complete and prominent