Advanced Site Reliability Engineer

Symbotic

Location:
United States, Wilmington

Contract Type:
Not provided

Salary:

117841.19 - 165000.00 USD / Year

Job Description:

We are looking for a passionate, hard-working, and talented Site Reliability Engineer to take the lead on solving some of the toughest operational challenges in some of the most sensitive and mission-critical automated warehouse solutions. The SRE team will drive the stability and sustainability of these next-generation systems and discover innovative ways to scale and operate them reliably as we expand. In this role, you will work with cross-functional teams such as Operations, Mechanical Hardware Engineering, IT Infrastructure Systems, and Software Engineering to identify and address underlying resiliency gaps.

Job Responsibility:

  • Analyze metrics, dashboards, and parsed logs from various sources and turn the findings into a facts-based, actionable Root Cause Analysis (RCA) investigation, leading a group of Subject Matter Expert teams to find the actual cause
  • Host RCA calls as a chair and drive the RCA process to conclusion within tight SLAs with customer-facing deliverables
  • Lead problem tickets and improvements to major software components, systems, and features to improve the availability, scalability, latency, and efficiency of the Symbotic System
  • Engage in and improve the service lifecycle from inception and design to deployment, operation, and refinement based on lessons learned through deep dives
  • Hands-on troubleshooting of VMware, Kubernetes, Custom Software, and infrastructure performance incidents
  • Be a trusted technical advisor who leads complex root cause analysis investigations from beginning to end until maximum improvements are identified
  • Demonstrate sound knowledge of gathering logs and facilitating a facts-based root cause analysis with cross-functional teams
  • Assist internal teams with corrective actions and improvement tickets and influence the completion goals
  • Flexibility to occasionally work outside standard hours, including weekends, depending on criticality and workload demands
  • Ability to travel up to 10%

Requirements:

  • Bachelor’s degree in Software Engineering, Information Systems, Computer Science or a related field
  • Minimum of 5 years of experience working with ITSM tools such as Jira or an equivalent
  • Minimum of 5 years of infrastructure engineering experience with a record demonstrating the delivery of high-quality, large-scale solutions requiring planning and change control
  • Minimum of 5 years of experience in operation of production systems including troubleshooting, testing, and automation
  • Minimum of 5 years of experience leading technical Root Cause Analysis
  • Ability to prioritize parallel RCA investigations and tasks by influencing cross-functional teams to complete actions on time with demanding quality
  • Experience with executive incident communications, RCA report writing, and written communication for non-technical audiences
  • Ability to apply a broad technical background to projects through excellent problem-solving and the competence to work with other technical teams
  • Ability to efficiently read and understand GitLab technical documentation
  • Experience in the advanced use of tools like Prometheus, Grafana, LogicMonitor, Elastic, and VMware, and use of the CLI (Kube or Linux)

Nice to have:

  • ITIL Problem Management experience
  • Experience in the advanced use of tools like Prometheus, Grafana, LogicMonitor, Elastic, and VMware, and use of the CLI (Kube or Linux)
  • Knowledge of Power BI, Tableau, executive report writing, and presentation skills

What we offer:

  • medical
  • dental
  • vision
  • disability
  • 401K
  • PTO

Additional Information:

Job Posted:
February 04, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Advanced Site Reliability Engineer

Site Reliability Engineer

Join our client, a leading financial institution at the forefront of innovation,...
Location:
United States, Austin
Salary:
57.00 - 63.33 USD / Hour
Aquent
Expiration Date:
Until further notice
Requirements:
  • Proven experience leading engineering teams and delivering projects using Scrum and efficient release practices
  • Strong background in converting high-level designs into low-level designs and providing technical oversight
  • Demonstrated experience in designing, architecting, and deploying cloud-native applications, specifically on GCP
  • Proficiency with various database technologies, including MongoDB, Aerospike, SQL Server, and PostgreSQL
  • Expertise in containerization technologies such as Docker and Kubernetes, and building/managing CI/CD pipelines
  • Experience leveraging AI-Driven software development tools to enhance productivity, code comprehension, and documentation
  • Proven track record of integrating and applying AI/Machine Learning models for data analytics, visualization, automation, and problem-solving
  • Ability to maintain high quality standards while delivering within tight schedules
  • Exceptional collaborative mindset with a bias for action, engaging effectively with product management, architects, and other domains
  • Strong ability to work with internal, external, and offshore stakeholders
Job Responsibility:
  • Drive Technical Leadership & Project Delivery: Lead engineering teams through the entire project lifecycle, leveraging agile methodologies like Scrum to ensure efficient delivery and robust release practices
  • Architect & Design Cloud-Native Solutions: Translate high-level architectural visions into detailed low-level designs, providing expert technical oversight for the development and deployment of cutting-edge cloud-native applications
  • Champion Reliability & Scalability: Design, architect, and deploy highly available and scalable cloud-native applications on platforms such as GCP, ensuring optimal performance and resilience
  • Optimize Data Management: Leverage your expertise with diverse database technologies, including MongoDB, Aerospike, SQL Server, and PostgreSQL, to build and maintain robust data solutions
  • Advance DevOps & Automation: Implement and optimize containerization strategies using technologies like Docker and Kubernetes, and establish sophisticated CI/CD pipelines to streamline development and deployment
  • Innovate with AI/ML: Integrate and apply AI/Machine Learning models to enhance data analytics, visualization, automation, and creatively solve complex business and technical challenges
  • Foster Collaboration & Mentorship: Work closely with diverse stakeholders across product management, architecture, and other engineering domains, while actively mentoring and coaching multiple teams to elevate technical capabilities
  • Influence & Present Solutions: Effectively engage subject matter experts, present complex architectural solutions to governance boards and stakeholders, and advocate for data-driven proposals
What we offer:
  • subsidized health, vision, and dental plans
  • paid sick leave
  • retirement plans with a match

Site Reliability Engineer Application Development Technical Lead Analyst

The Applications Development Technology Lead Analyst is a senior level position ...
Location:
Canada, Mississauga
Salary:
120800.00 - 170800.00 USD / Year
Citi
Expiration Date:
Until further notice
Requirements:
  • 6+ years of relevant experience in Apps Development or systems analysis role
  • 5+ years of extensive experience in system analysis and in programming software applications with Python and RHEL
  • 5+ years of experience with site reliability and CI/CD pipelines
  • Previous experience with container orchestration
  • Experience in managing and implementing successful projects
  • Subject Matter Expert (SME) in at least one area of Applications Development
  • Ability to adjust priorities quickly as circumstances dictate
  • Demonstrated leadership and project management skills
  • Consistently demonstrates clear and concise written and verbal communication
  • Bachelor's degree/University degree or equivalent experience
Job Responsibility:
  • Partner with multiple management teams to ensure appropriate integration of functions to meet goals
  • Identify and define necessary system enhancements to deploy new products and process improvements
  • Resolve a variety of high-impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards
  • Provide expertise in area and advanced knowledge of applications programming and ensure application design adheres to the overall architecture blueprint
  • Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
  • Develop comprehensive knowledge of how areas of business integrate to accomplish business goals
  • Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
  • Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
  • Appropriately assess risk when business decisions are made
Employment Type:
Fulltime

Lead Site Reliability Engineer

Groupon is a marketplace where customers discover new experiences and services e...
Location:
India, Bangalore
Salary:
Not provided
Groupon
Expiration Date:
Until further notice
Requirements:
  • 10+ years in systems engineering
  • at least 5 years in SRE or DevOps roles
  • expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker)
  • proficiency in programming and scripting languages like Python, Go, and Bash
  • advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible
  • deep understanding of networking, DNS, load balancing, and security principles
  • proven track record of managing high-availability systems in demanding environments
  • exceptional analytical and problem-solving skills
Job Responsibility:
  • Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher
  • drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools
  • create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery
  • build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack
  • collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs
  • lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues
  • design and execute performance testing, capacity planning, and scalability strategies for evolving workloads
  • proactively identify and resolve bottlenecks, increasing system performance and developer efficiency
  • mentor junior engineers, fostering a collaborative and growth-oriented team environment
  • guide architectural decisions that drive innovation and enhance system reliability
What we offer:
  • The opportunity to work with cutting-edge technologies in a transformative environment
  • a collaborative and innovative work environment that values your expertise and contributions
  • professional growth and leadership development pathways tailored to your aspirations
  • a chance to leave a lasting impact by shaping the future of reliable and scalable systems

Senior Database Engineer

We’re looking for a skilled Data Reliability Engineer to join our team for a cli...
Salary:
Not provided
Zoolatech
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in Data Engineering, Database Reliability, or Infrastructure Operations
  • Strong expertise in PostgreSQL on AWS, including tuning, replication, backups, and HA configurations
  • Experience operating RDBMS databases (PostgreSQL, MySQL, etc.) and Kubernetes technologies is highly desirable
  • Experience provisioning and operating NoSQL databases at scale, such as Elasticsearch, ElastiCache, DynamoDB, Neo4j, Mongo, Cassandra, etc.
  • Advanced SQL scripting and query optimization skills
  • Experience with data systems monitoring, alerting, and performance tuning
  • Strong programming/scripting in Java, Python, or Shell
  • Proven experience in designing or supporting complex data ecosystems
  • Solid understanding of cloud infrastructure (preferably AWS) and Infrastructure as Code tools (Terraform)
  • Familiarity with event streaming platforms (Kafka), and observability stacks (New Relic, ELK, etc.)
Job Responsibility:
  • Own and optimize the reliability, availability, and performance of data infrastructure across production systems
  • Lead the design and implementation of resilient, secure, and observable data systems
  • Collaborate with SRE, Security, and Engineering teams to enforce data infrastructure standards and align on architectural decisions
  • Design and implement automation around provisioning, uptime monitoring, data refresh, integrity, backups, and disaster recovery
  • Support application developers with performance tuning, complex query optimization, and database design reviews
  • Analyze and resolve performance bottlenecks and incidents with a focus on long-term solutions
  • Participate in on-call rotation to support production systems and ensure high availability
  • Actively contribute to improving incident response and observability through metrics, alerting, and runbooks
  • Work with technologies such as Java, Ruby on Rails, PostgreSQL, AWS, Kafka, S3, Elasticsearch
What we offer:
  • Paid Vacation
  • Sick Days
  • Floating Holidays
  • Sport/Insurance Compensation
  • English Classes
  • Charity
  • Training Compensation

Senior Database Engineer

We’re looking for a skilled Data Reliability Engineer to join our team for a cli...
Location:
United States
Salary:
Not provided
Zoolatech
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in Data Engineering, Database Reliability, or Infrastructure Operations
  • Strong expertise in PostgreSQL on AWS, including tuning, replication, backups, and HA configurations
  • Experience operating RDBMS databases (PostgreSQL, MySQL, etc.) and Kubernetes technologies is highly desirable
  • Experience provisioning and operating NoSQL databases at scale, such as Elasticsearch, ElastiCache, DynamoDB, Neo4j, Mongo, Cassandra, etc.
  • Advanced SQL scripting and query optimization skills
  • Experience with data systems monitoring, alerting, and performance tuning
  • Strong programming/scripting in Java, Python, or Shell
  • Proven experience in designing or supporting complex data ecosystems
  • Solid understanding of cloud infrastructure (preferably AWS) and Infrastructure as Code tools (Terraform)
  • Familiarity with event streaming platforms (Kafka), and observability stacks (New Relic, ELK, etc.)
Job Responsibility:
  • Own and optimize the reliability, availability, and performance of data infrastructure across production systems
  • Lead the design and implementation of resilient, secure, and observable data systems
  • Collaborate with SRE, Security, and Engineering teams to enforce data infrastructure standards and align on architectural decisions
  • Design and implement automation around provisioning, uptime monitoring, data refresh, integrity, backups, and disaster recovery
  • Support application developers with performance tuning, complex query optimization, and database design reviews
  • Analyze and resolve performance bottlenecks and incidents with a focus on long-term solutions
  • Participate in on-call rotation to support production systems and ensure high availability
  • Actively contribute to improving incident response and observability through metrics, alerting, and runbooks
  • Work with technologies such as Java, Ruby on Rails, PostgreSQL, AWS, Kafka, S3, Elasticsearch
What we offer:
  • Paid Vacation
  • Sick Days
  • Floating Holidays
  • Sport/Insurance Compensation
  • English Classes
  • Charity
  • Training Compensation
Employment Type:
Fulltime

Civil Engineer

Leading Infrastructure Improvements. Make a positive impact as a Klingner Civil ...
Location:
United States, Galesburg
Salary:
Not provided
Klingner & Associates, PC
Expiration Date:
Until further notice
Requirements:
  • 10+ years of demonstrated experience in site development / municipal engineering
  • Bachelor of Science in Civil Engineering or closely-related engineering field from an accredited university
  • Must hold a Professional Engineer license in Iowa, or can obtain within three months
  • Must have excellent working knowledge of Civil3D, Microsoft Excel, and Microsoft Word
  • Excellent time management skills, organizational skills, the ability to manage multiple complex projects and tasks concurrently, and a commitment to meeting deadlines and keeping projects within budget
  • Clear written and verbal communication
  • Understanding of and experience in applying regulatory agency laws, ordinances, and regulations for Municipal, County, State, and Federal agency permitting submittals
  • Experience writing thorough reports and studies, scopes of work, and proposals
  • Excellent interpersonal skills with the ability to hold strong business relationships with municipal and residential/commercial/industrial clients
  • Ability to conduct all projects by quality standards and Klingner procedures, with a strong focus on accuracy and quality of work
Job Responsibility:
  • Lead site development and municipal infrastructure projects while managing timelines, budgets, and overall quality
  • Develop and review designs, plans, specifications, permitting, and final construction documents for site developments, subdivision infrastructure, municipal infrastructure, and general civil engineering projects
  • Consider various technologies, system efficiency, reliability, operator capabilities, long-term need, and environmental impact to promote sustainable design
  • Provide specialized technical input as well as storm water drainage hydraulic / hydrologic modeling and evaluations to support reporting and design
  • Complete Preliminary Engineering Reports, opinions of probable construction cost, and all other necessary reports and estimates
  • Identify and address potential project risks and independently produce mitigation strategies
  • Ensure compliance with all applicable Local, State, and Federal regulations concerning various site locations
  • Maintain current knowledge of regulatory trends to provide forward-thinking solutions
  • Follow industry design trends, technological advancements, professional organizations, and new certifications
  • Implement Klingner standards and quality control procedures to ensure project safety, reliability, and longevity
What we offer:
  • Mentor / Mentee opportunities
  • Professional development assistance
  • Health insurance (three coverage options available)
  • Vision and dental insurance
  • VPTO, sick leave, and 7.5 paid holidays
  • 401(k) retirement savings plan with employer match
  • Health savings and Flexible Spending
  • Yearly Wellness reimbursement
  • Bonus opportunities
  • Referral program
Employment Type:
Fulltime

Python Developer

We are looking for a Python Developer (with Infrastructure skills) with strong i...
Salary:
Not provided
RemotivateJobs
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience as a Site Reliability Engineer, System Engineer, Infrastructure Engineer, Platform Engineer, Backend Systems Engineer, or similar role, ideally as a Python Developer
  • Experience running and maintaining Python/Flask applications in production
  • Advanced Python development skills, particularly with Python libraries/frameworks
  • In-depth knowledge of Linux server administration (Debian/Ubuntu)
  • Proficiency with network analysis tools: intercepting proxies, packet captures (Wireshark, mitmproxy, tcpdump, etc.)
  • Familiarity with distributed systems, scaling strategies, and performance tuning
  • Strong understanding of monitoring and logging systems (e.g., Prometheus, Grafana, ELK, Datadog)
  • Experience with version control (Git) and CI/CD workflows
  • Comfort with automation tools and scripting for infrastructure management
  • Excellent troubleshooting and analytical skills
Job Responsibility:
  • Maintain and optimize infrastructure: Manage Linux-based (Debian/Ubuntu) servers running Python/Flask applications, ensuring stability and performance
  • Ensure high uptime: Continuously monitor system health and proactively address bottlenecks or weak points to maximize the reliability of SMS send-outs
  • Troubleshoot complex issues: Use intercepting proxies, packet captures, and diagnostic tools to identify, analyze, and resolve traffic or delivery issues
  • Optimize backend workflows: Work with Python/Flask async frameworks to streamline message queuing, delivery, and scaling mechanisms
  • Implement monitoring and alerting: Set up dashboards, logs, and alerts that provide visibility into system health and performance
  • Automate infrastructure tasks: Build tools/scripts to reduce manual work and ensure consistency in deployments and optimizations
  • Own decision-making: Take initiative in addressing infrastructure needs and make competent technical decisions without requiring constant supervision
What we offer:
  • Endless growth opportunities as they’re in a scale-up phase
  • Potential to move into a more elaborate R&D or leadership role
  • Flexible working schedule as long as deadlines and quality are met
  • Work alongside highly skilled developers in a unique and challenging industry
  • Performance bonuses as the company grows
  • Fully remote setup
Employment Type:
Fulltime

FX Applications Support Senior Analyst

As an FX Application Support Analyst, you will play a key role in running and ma...
Location:
Australia, Sydney
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 5-8 years’ experience in an Application Support role
  • experience installing, configuring or supporting business applications
  • experience with some programming languages and willingness/ability to learn
  • advanced execution capabilities and ability to adjust quickly to changes and re-prioritization
  • effective written and verbal communications including ability to explain technical issues in simple terms that non-IT staff can understand
  • demonstrated analytical skills
  • issue tracking and reporting using tools
  • knowledge/experience of problem management tools
  • good all-round technical skills
  • ability to effectively share information with other support team members and with other technology teams
Job Responsibility:
  • provides technical and business support for users of Citi Applications
  • maintains application systems that have completed development stage and are running in daily operations
  • manages, maintains and supports applications and their operating environments, focusing on stability, quality and functionality
  • start of day checks, continuous monitoring, and regional handover
  • perform same day risk reconciliations
  • develop and maintain technical support documentation
  • identifies ways to maximize potential of applications used
  • assess risk and impact of production issues and escalate to business and technology management
  • ensures storage and archiving procedures are in place and functioning correctly
  • formulates and defines scope and objectives for complex application enhancements and problem resolution
What we offer:
  • rewarding work in a supportive environment
  • clear opportunities for progression
  • exciting company benefits
  • diverse team of professionals
  • global network of people, data and relationships
Employment Type:
Fulltime