We are seeking several experienced, highly skilled Staff Observability Operations Engineers with a strong background in Site Reliability Engineering (SRE), modern observability practices, and the management and implementation of observability and event management platforms. Responsibilities include deploying observability solutions, administering platforms, managing releases, performing system upgrades, building integrations, troubleshooting incidents, and planning continuous improvements to platform performance. Successful candidates will play a key role in ensuring that our observability infrastructure meets the current and future needs of CVS Health’s dynamic environment.
Job Responsibilities:
Deploy and implement modern observability solutions
Manage and administer observability and event management platforms
Coordinate and manage release cycles for observability platforms
Troubleshoot and resolve incidents related to observability platforms
Continuously monitor and enhance platform performance
Collaborate with cross-functional stakeholders
Provide training and mentoring to junior engineers
Ensure compliance and security of observability platforms
Maintain documentation of observability platform configurations
Generate and analyze reports on platform performance and capacity
Requirements:
7+ years of experience in IT operations, with significant responsibility for system monitoring, performance tuning, and troubleshooting enterprise applications
5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
Experience developing and administering ServiceNow ITOM event management solutions
Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty)
Deep knowledge of and experience with cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
Proficiency in Python and other scripting and automation tools, such as Ansible, PowerShell, and Bash, for automation and configuration
Hands-on experience deploying, managing, and administering observability platforms
Hands-on experience leading, coordinating, and performing migrations of application, platform, and infrastructure observability solutions
Proven ability to troubleshoot and resolve complex technical issues
Experience monitoring platform performance and implementing enhancements to support scalability
Knowledge of compliance and security standards related to observability platforms
Excellent communication skills, both verbal and written
Experience configuring and using source code management tools and workflows
Proficiency in scripting and automation tools (Ansible, PowerShell, Bash, Python) and in data and configuration formats (YAML, XML, JSON)