Senior Software Engineer, Site Reliability Job at Babylist

Senior Site Reliability Engineer

We are looking for a Senior Site Reliability Engineer who is passionate about sc...

Location

Salary:

Not provided

Atlassian

Expiration Date

Until further notice

Requirements

5+ years experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring, tweaking dashboards, defining alerts, writing runbooks, etc.
5+ years of hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure)
Familiarity with Unix / Linux operating systems
Strong emphasis on debugging, improving code, and automating routine tasks
Strong backend engineering experience in one or more prominent languages such as Java, Go or Python
Excellent communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
An ability and desire to mentor and coach engineers

What we offer

health coverage
paid volunteer days
wellness resources

Fulltime

Senior Site Reliability Engineer

Baxter International is seeking a skilled Senior Principal Site Reliability Engi...

Location

United States, Deerfield

Salary:

96000.00 - 132000.00 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science, IT, or related field (or equivalent experience)
Prior experience in Site Reliability Engineering and cloud-based infrastructure management
Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
Azure administration and operations experience, with certifications a plus
Knowledge of related technologies, including cloud, encryption, and security protocols
Systems administration experience in Windows and Linux environments
Proven problem-solving skills and experience with scripting and automation tools
Ability to create accurate documentation and reports, with excellent communication skills

Job Responsibility

Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
Manage and administer Azure resources, including virtual machines, databases, and networking components
Define and document operating procedures to ensure required security, privacy and other compliance standards are maintained for digital solutions deployed in the cloud
Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
Define and refine Operations SLAs to maintain a high level of customer satisfaction
Establish non-functional requirements to meet SLAs
Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
Define key performance indicators that can be monitored, measured, and used to derive opportunities
Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes

What we offer

Healthcare benefits
Employee Stock Purchase Plan (ESPP)
401(k) Retirement Savings Plan
Flexible Spending Accounts
Educational assistance programs
Paid holidays
Paid time off
Paid parental leave
Commuting benefits
Employee Discount Program

Fulltime

Senior Site Reliability Engineer

This is a role at Baxter where your work impacts saving and sustaining lives thr...

Location

United States, Deerfield

Salary:

96000.00 - 132000.00 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science, IT, or related field (or equivalent experience)
Prior experience in Site Reliability Engineering and cloud-based infrastructure management
Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
Azure administration and operations experience, with certifications a plus
Knowledge of related technologies, including cloud, encryption, and security protocols
Systems administration experience in Windows and Linux environments
Proven problem-solving skills and experience with scripting and automation tools
Ability to create accurate documentation and reports, with excellent communication skills
Applicants must be authorized to work for any employer in the U.S.
Unable to sponsor or take over sponsorship of an employment visa at this time.

Job Responsibility

Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
Manage and administer Azure resources, including virtual machines, databases, and networking components
Define and document operating procedures to ensure required security, privacy and other compliance standards are maintained for digital solutions deployed in the cloud
Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
Define and refine Operations SLAs to maintain a high level of customer satisfaction
Establish non-functional requirements to meet SLAs
Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
Define key performance indicators that can be monitored, measured, and used to derive opportunities
Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes.

What we offer

Support for Parents
Continuing Education/Professional Development
Employee Health & Well-Being Benefits
Paid Time Off
2 Days a Year to Volunteer
Medical and dental coverage starting day one
Insurance coverage for basic life, accident, short-term and long-term disability
Business travel accident insurance
Employee Stock Purchase Plan (ESPP)
401(k) Retirement Savings Plan

Fulltime

Senior Site Reliability Engineer

Architect, develop, and troubleshoot large-scale infrastructure, maintain and im...

Location

United States, San Francisco

Salary:

180960.00 - 230900.00 USD / Year

Atlassian

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Software Engineering, Information Technology or a closely related field
Four years of experience as a Site Reliability Engineer architecting, developing, and troubleshooting large-scale infrastructure utilizing programming languages such as PowerShell, Python, or Bash
Networking technologies such as TCP/IP or security
Four years of experience in automation development and infrastructure-as-code implementation using tools such as Terraform, AWS CloudFormation, Ansible, or Salt
Knowledge of Linux and Windows systems
Cloud technologies within AWS, GCP, or Azure
Continuous integration and continuous delivery/deployment (CI/CD) practices, and monitoring and observability practices
Must pass a technical interview

Job Responsibility

Architect, develop, and troubleshoot large-scale infrastructure utilizing programming languages such as PowerShell, Python, or Bash and networking technologies such as TCP/IP or security
Provide real-time feedback on production systems
Work with product family and platform developers to maintain and improve services and performance with a strong customer focus
Utilize a variety of data collection, enrichment, analytics, and visualizations to support our complex systems
Responsible for automation development and infrastructure-as-code implementation using tools such as Terraform, AWS CloudFormation, Ansible, and/or Salt
Build solutions to enhance availability, performance, and stability for hundreds of Atlassian enterprise customers in the cloud as well as automate repetitive work
Help secure the cloud architecture with penetration testing, vulnerability resolution, and compliance audit responses
Responsible for continuous integration and continuous delivery/deployment (CI/CD) practices, and monitoring and observability practices

What we offer

Health and wellbeing resources
paid volunteer days

Fulltime

Senior Site Reliability Engineer

HiveWatch is seeking a Staff Site Reliability Engineer to join our Platform Team...

Location

United States, El Segundo

Salary:

183000.00 - 235000.00 USD / Year

HiveWatch

Expiration Date

Until further notice

Requirements

7+ years of software engineering experience with strong coding skills in production environments
5+ years of SRE, DevOps, or production operations experience
Expertise with cloud platforms (AWS preferred) and containerized applications (Docker, Kubernetes)
Experience with Infrastructure as Code (Terraform, CloudFormation, or similar)
Proficiency in at least one object-oriented programming language in our tech stack (Java, Kotlin, Python)
Hands-on experience with relational databases and SQL performance optimization
Experience with monitoring and observability tools (Prometheus, Grafana, DataDog, or equivalent)
Strong debugging skills across distributed systems and microservices architectures
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience

Job Responsibility

Own the reliability of mission-critical systems including production monitoring, alerting, and capacity planning
Debug and resolve complex production issues across the full stack, from infrastructure to application code
Participate in a regular on-call rotation to provide 24/7 coverage for critical systems
Perform root cause analysis requiring deep code-level investigation and implement preventive measures
Build automation and tooling to reduce operational toil and improve system reliability
Maintain CI/CD pipelines, observability infrastructure, and database performance optimization
Increase the resiliency, scalability, and maintainability of production environments
Establish on-call procedures and disaster recovery processes
Provide technical leadership and mentorship to foster engineering excellence and reliability culture

What we offer

Comprehensive health coverage: medical, dental, vision, and life insurance
Cutting-edge work in an emerging field with huge growth potential
Competitive compensation packages designed to reward top talent
A modern, newly renovated HQ right on Main Street in El Segundo, CA
401(k) with a 4% company match to help you invest in your future (match launches in 2026)
Flexible paid time off so you can recharge when you need it
Additional benefits include ClassPass credits and a discount on pet insurance
A family-friendly, compassionate culture that values balance and belonging
Eligible to participate in HiveWatch Equity Incentive Plan

Fulltime

Senior Site Reliability Engineer

What will you be doing at Miniclip? Participate in an on-call rotation with the ...

Location

Portugal, Lisbon

Salary:

Not provided

Miniclip

Expiration Date

Until further notice

Requirements

5+ years of hands-on experience with AWS in both development and operations contexts
Strong Linux system administration skills, including performance tuning and debugging
Software development background and strong coding skills in one or more of the following: Go, Python, Ruby
Experience with Infrastructure as Code, particularly Terraform
Familiarity with CI/CD pipelines and artifact management tools
A mindset for resilient systems design, thinking about edge cases, failure modes, and graceful degradation
Excellent communication skills in English, both written and spoken
Comfortable in a fast-paced environment and adaptable to shifting priorities

Job Responsibility

Participate in an on-call rotation with the Cloud Engineering team to respond to production incidents and outages
Operate and evolve infrastructure using Infrastructure as Code (Terraform), configuration management tools, and containerized platforms on AWS
Build and maintain observability tooling to detect symptoms before they lead to outages
Automate repetitive tasks and processes to reduce operational toil
Collaborate with Engineering and Product teams to design resilient systems that meet performance and reliability goals
Troubleshoot production issues across application, network, and infrastructure layers
Document systems, processes, and runbooks to improve team transparency and onboarding

Senior Site Reliability Engineer

You'll join the team primarily responsible for making our self-hosted product of...

Location

United States

Salary:

200000.00 - 220000.00 USD / Year

Tines

Expiration Date

Until further notice

Requirements

5-8 years in an SRE or similar role
Experience architecting, maintaining, and supporting systems with containerized applications, ideally k8s
Experience with troubleshooting deployment issues, creating clear documentation, and designing robust escalation paths
Comfortable learning new technologies
Experience with Ruby, Rails, React, TypeScript, Postgres, Redis and Docker
Customer obsessed and willing to go deep into unfamiliar stacks to find root causes
Authorized to work for any employer in the U.S.

Job Responsibility

Making our self-hosted product offering as easy as possible for customers to install and operate
Owning all of the supporting services and tools that our self-hosted customers rely on
Identifying and fixing availability risks and monitoring gaps
Enabling software engineers to build new product features that work seamlessly across cloud and self-hosted environments
Using our own product extensively to automate infrastructure maintenance and to build DevOps tooling for customer deployments
Identifying areas for improvement in our containerized architecture and deployment strategies
Mentoring other engineers in container orchestration and Kubernetes best practices
Acting as a subject matter expert for critical self-hosted customer issues

What we offer

Competitive salary
Startup equity & extended exercise window
Matching retirement plans
Home office setup
Private healthcare plans
25 days annual leave
Extra company holidays
Generous parental leave programs
Flexibility in how and where you work
Phone and home Internet allowance

Fulltime

Senior Site Reliability Engineer Cloud Platform

Zilliz is a fast-growing startup developing the industry’s leading vector databa...

Location

Salary:

175000.00 - 225000.00 USD / Year

Zilliz

Expiration Date

Until further notice

Requirements

4+ years of experience in site reliability engineering or similar roles with a focus on cloud-native systems
Proficiency in scripting languages such as Python, Go, or Java
Strong knowledge of container orchestration technologies like Kubernetes and Docker
Expertise with cloud platforms such as AWS, GCP, or Azure, and their respective monitoring and management tools
Experience with infrastructure as code tools such as Terraform or Ansible
Familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo
Proven ability to troubleshoot complex distributed systems and resolve issues promptly
Bachelor’s degree or above in computer science, software engineering, or other relevant disciplines
Ability to thrive in a fast-paced, startup environment and handle multiple projects simultaneously

Job Responsibility

Work at the intersection of development and site reliability, creating SRE tools and systems and supporting existing infrastructure and platforms
Ensure the reliability, availability, and performance of Zilliz’s distributed database systems
Develop and implement strategies for monitoring, incident management, and disaster recovery
Automate system operations and maintenance tasks to improve efficiency and reduce manual intervention
Design and build tools to manage and monitor infrastructure, ensuring scalability and robustness
Collaborate with software engineers to enhance system reliability, scalability, and performance
Maintain and improve the CI/CD pipeline to ensure smooth and rapid deployment of changes
Actively contribute to the Milvus Vector Database open-source community, focusing on improving reliability and operational efficiency

Fulltime

Senior Software Engineer, Site Reliability

Babylist

Location:
United States; Canada

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Additional Information:

Job Posted:
December 06, 2025
