LogicMonitor® is the AI-first hybrid observability platform powering the next generation of digital infrastructure. LogicMonitor delivers complete visibility and actionable intelligence across on-premises, cloud, and edge environments. By anticipating issues before they strike, optimizing resources in real time, and enabling faster, smarter decisions, LogicMonitor helps IT and business leaders protect margins, accelerate innovation, and deliver exceptional digital experiences without compromise.

We are seeking a highly skilled Senior DevOps Engineer with 4+ years of experience to join the Edwin AI team at LogicMonitor and drive innovation, reliability, and security across our cloud infrastructure. The ideal candidate has hands-on experience managing multi-cloud environments, automating infrastructure, and implementing modern DevOps practices that improve system performance, scalability, and cost efficiency.
Job Responsibilities:
Multi-Cloud Enablement: Expand and manage application hosting across AWS and Google Cloud, ensuring performance, flexibility, and resilience
Infrastructure as Code (IaC): Develop and maintain Terraform-based (or equivalent) installers for Azure and GCP to fully automate infrastructure deployments
Cost Optimization: Design and implement AWS cost optimization strategies, including reserved instances, right-sizing, and resource efficiency initiatives
Cloud Security: Strengthen infrastructure security with robust access controls, encryption, monitoring, and alerting frameworks
Observability: Build and enhance monitoring platforms with Grafana dashboards and Prometheus alerts for real-time performance insights and proactive issue resolution
Kubernetes Management: Implement Role-Based Access Control (RBAC) and optimize Ingress controllers (Traefik or similar) for enhanced security and delivery resilience
Automation & Scripting: Create Python and Bash scripts to automate repetitive tasks, streamline workflows, and improve operational efficiency
Requirements:
4+ years of experience in DevOps or similar roles
Proven experience with AWS (preferred) and GCP in production environments
Strong expertise in Infrastructure as Code practices
Solid knowledge of Kubernetes (EKS), container orchestration, and cluster security
Hands-on experience with Grafana, Prometheus, and alerting/monitoring systems
Understanding of network connectivity, including private link endpoints, VPCs, and cross-account VPC connectivity, and of how to make services accessible internally and externally
Experience deploying automated canary and integration testing pipelines, as well as CI/CD pipelines
Experience exposing self-hosted services such as LangFuse to internal users through a web UI, using Traefik, another Ingress controller, or a similar tool
Experience deploying LLM-related solutions that depend on components such as MCP, LangFuse, Airflow, GraphDB, VectorDB, and Redis
Experience working with developers to provide on-demand, just-in-time (JIT) access to production clusters for troubleshooting and debugging, using Teleport or a similar tool
Strong background in cloud security, access management, and encryption
Proficiency in Python and Bash scripting for automation
Experience optimizing cloud infrastructure for performance and cost