SingleStore is seeking a Site Reliability Engineer to help optimize and scale our managed service offering across all three major cloud providers. In this role, you will be at the intersection of leading technology trends: a highly performant distributed database, managed by Kubernetes, running in the cloud. This is a great opportunity to push the boundaries in a cloud-focused SRE role. It is a development role that requires an engineering mindset to solve operational challenges. You will be part of a globally distributed team of engineers, helping to drive SRE practices across the company. Through infrastructure automation, you will help us grow our service across multiple cloud platforms, which requires a relentless focus on eliminating manual processes. You will also use our monitoring platform to improve the overall customer experience by systematically identifying and fixing issues that affect our customers. As an SRE, you will help diagnose issues on the platform, drawing on a deep understanding of the SingleStore query engine and the backend infrastructure.
Job Responsibilities:
Develop an automation platform to manage infrastructure rollouts across cloud providers
Optimize the telemetry platform to identify customer-impacting events while providing relevant data to drive debugging
Partner with the engineering team to optimize the performance of services for cloud architecture
Debug live-site events and conduct follow-up postmortems and root cause analysis (RCA)
Participate in an SLA-driven on-call rotation, which will include after-hours, weekend, and rotating holiday participation
Requirements:
5 years of demonstrated experience working as a Site Reliability Engineer
Infrastructure automation experience. Scripting experience (Python, Bash) a plus.
Experience with the Prometheus monitoring stack. Experience with Grafana, Mimir and Loki is a plus.
Knowledge of Kubernetes and the container ecosystem
Strong cross-group collaboration and communication skills
Familiarity with at least one of AWS, Azure, or Google Cloud
Experience debugging, diagnosing, and troubleshooting complex production software
B.S. degree in Computer Science or a related field