We are seeking a highly skilled System Monitoring Expert to join our team.
Key responsibilities:
* Design and implement monitoring solutions using the Elastic Stack (ELK + Beats) across on-prem and cloud platforms, ensuring seamless data ingestion and analysis.
* Develop and optimize data pipelines (Logstash, Beats, custom agents) for telemetry, log, and event feeds, driving business insights and informed decision-making.
* Create dynamic Kibana dashboards to visualize performance, availability, and security metrics, empowering stakeholders with real-time visibility.
* Onboard and integrate new data sources from applications, platforms, networks, containers, and third-party systems, expanding the scope of monitoring and analytics capabilities.
* Define and manage alerting rules to drive proactive incident detection and response, minimizing downtime and ensuring business continuity.
* Contribute to incident response and root cause analysis through forensic log analysis, delivering actionable insights and recommendations.
* Research, evaluate, and implement emerging tools and technologies to keep the organisation's system monitoring and analytics capabilities current.
* Develop and maintain SOPs, build guides, and technical documentation, ensuring knowledge sharing and operational excellence.
The ideal candidate will possess:
* Proven experience in Defence environments, particularly in monitoring and data ingestion.
* Strong background in Linux administration, virtualized systems, and distributed infrastructure.
* Hands-on expertise with the Elastic Stack (Elasticsearch, Logstash, Kibana, Beats).
* Solid knowledge of log aggregation, metrics collection, and event-driven monitoring.
* Experience with hybrid cloud or containerized environments.
* Scripting ability in Python or shell, and familiarity with syslog, SNMP, and REST APIs.
* Strong communication and stakeholder engagement skills.