Site Reliability Engineer II

Genuine Parts Company

Location:
United States, Birmingham

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Under general supervision, the Site Reliability Systems Administrator II is responsible for improving system reliability and resilience. The role focuses on building automation that reduces manual effort and prevents service-impacting incidents. The SRE combines software and systems engineering to build and support large-scale, distributed, fault-tolerant systems, and ensures that critical platforms are available, reliable, and able to support a fast rate of improvement. The role relies on monitoring platforms to keep a continual, holistic view of system health and performance. The SRE enhances and supports cloud-based transformations, with a focus on pushing capabilities forward, staying ahead of customer needs, and innovating for continuous improvement. The SRE also provides operational support and engineering for multiple large-scale distributed software applications.

Job Responsibility:

  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Ensures that the company network stays operational
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned

Requirements:

  • Bachelor's degree
  • Three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job-related databases, software, documentation, programming languages, and monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates

What we offer:

  • Healthcare coverage
  • 401(k)
  • Tuition reimbursement
  • Vacation
  • Sick pay
  • Holiday pay

Additional Information:

Job Posted:
December 25, 2025

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Site Reliability Engineer II

Senior Security Operations Engineer II

As a Senior Security Operations Engineer, you’ll play a key role in ensuring the...
Location:
United States, Scottsdale
Salary:
Not provided
Axon
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in operations, site reliability, or infrastructure engineering roles
  • Strong experience securing and managing cloud environments (e.g., AWS, Azure) and containerized workloads
  • Deep understanding of Linux systems, networking, distributed systems, and their associated security controls
  • Proficiency in automation, scripting, and security tooling integration to streamline operations and enforcement
  • Experience with security monitoring, alerting, SIEM platforms, and observability tools
  • Solid grasp of CI/CD practices with integrated security testing and compliance checks
  • Experience managing Kubernetes clusters and running containerized workloads in production
  • Experience deploying and administering any of the following: scalable cloud-native secrets solutions such as AWS KMS or Azure Key Vault; PKI solutions such as EJBCA, Smallstep, or Venafi; or vaulting solutions such as HashiCorp Vault
Job Responsibility:
  • Implementing and improving automated security checks in CI/CD pipelines to prevent vulnerabilities from reaching production
  • Writing, reviewing, and maintaining security-focused infrastructure-as-code for scalable and compliant deployments
  • Investigating security incidents, performing root cause analysis, and implementing long-term mitigation strategies
  • Collaborating with developers to develop new features, services, and infrastructure requirements
  • Enhancing security observability through improved log collection, metrics, and alerting configurations
  • Maintaining and improving security runbooks, incident response playbooks, and internal security tooling for operational efficiency
  • Resolve security and infrastructure incidents by taking part in high-impact, high-visibility incidents as a participant and, ideally, as incident commander
  • Maintain and secure critical infrastructure components such as PKI (Public Key Infrastructure) and IAM (Identity & Access Management) systems, ensuring reliability, scalability, and compliance with organizational and industry security standards
  • Build and maintain secure, reliable, and scalable infrastructure that protects core services and sensitive data
  • Troubleshoot and resolve complex operational and system-level issues across environments
What we offer:
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
  • Fulltime

Site Reliability Engineer

At Boeing, we innovate and collaborate to make the world a better place. We’re c...
Location:
Canada, Richmond
Salary:
103000.00 - 184000.00 CAD / Year
Boeing
Expiration Date:
February 24, 2026
Requirements:
  • 7+ years in software development or advanced technical support role
  • 5+ years of experience in site reliability engineering, DevOps, or a related role
  • Proven experience in site reliability engineering, DevOps, or a related role, with a track record of successfully implementing and managing infrastructure and deployment pipelines
  • Candidate must be eligible for authorization under the Canadian Government Controlled Goods Program (CGP) assessment
  • Must be able to obtain Canadian Secret Level II Security Clearance
  • Must be legally able to work in Canada
  • Individuals must not pose a risk for safeguarding of controlled goods
  • Must be eligible to handle US export-controlled data
  • Fluency in English language
Job Responsibility:
  • Design, build, and maintain scalable and highly available infrastructure and processes using modern DevOps practices
  • Deploy and support customer installations, ensuring a smooth setup and integration of our hybrid multi-tenant SaaS solutions into their environments
  • Provide both reactive and proactive support to customers, addressing issues as they arise and implementing strategies to prevent future incidents
  • Lead incident response efforts, perform root cause analysis, and implement preventive measures to minimize downtime and service disruptions
  • Develop and enhance automation tools and scripts to streamline operations, reduce manual intervention, and improve efficiency
  • Set up and manage monitoring and alerting systems to proactively identify and resolve performance issues
  • Analyze system capacity and performance metrics to forecast future needs and ensure scalability of services
  • Collaborate with cross-functional teams to identify and implement new tools, technologies, and processes to enhance DevOps practices
  • Implement and advocate for “security best practices” to protect our applications and customer data
  • Pioneer and support special projects
What we offer:
  • Competitive base pay and incentive programs
  • Industry-leading tuition assistance program pays your institution directly
  • Resources and opportunities to grow your career
  • Up to $10,000 match when you support your favorite nonprofit organizations
  • Fulltime

Site Reliability Engineer II

Site Reliability Engineer II - (Microsoft 365 Enterprise + Cloud). We are lookin...
Location:
Ireland, Dublin
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Mid-level software development experience; automation-related experience is most valued
  • Scripting languages such as bash, python, and PowerShell, or compiled languages such as C, C# are most relevant, but others are acceptable
  • Awareness of, and ability to reason about, modern software & systems architectures, including load-balancing, queueing, caching, distributed systems failure modes, microservices, and so on
  • Associated troubleshooting skills, including the ability to follow RPC (Remote Procedure Call) call-chains across arbitrary network steps
  • Consequent understanding of monitoring in distributed systems
  • Deep understanding of operating-system-level concepts such as processes, memory allocation, and the network stack; understanding of how applications are affected by these, and the ability to debug them
  • Experience with working in a team, including coordinating large projects, communicating well, and exercising initiative when presented with problems
  • Practical experience running large scale online systems is always an advantage
Job Responsibility:
  • Researches and maintains deep knowledge of industry trends as well as advances in large-scale distributed systems and cloud technologies; identifies opportunities to create, implement, and/or optimally utilize new tools, technologies, and/or processes to solve ambiguous problems and improve product availability, reliability, efficiency, observability, and/or performance
  • Drives the adoption of innovative solutions across engineering teams working with related products within an organization
  • Apply advanced statistical and machine learning techniques to analyze large datasets and extract meaningful insights
  • Experience working with all service aspects of high throughput and multi-tenant services, ability to understand and design workflows carefully, properly handle errors, write clean and well-factored code with good tests and good maintainability
  • Engages with product engineering teams by partaking in code/design reviews, participating in on-call rotations and incident responses throughout product development and operations cycles; leverages end-to-end technical expertise on underlying systems/platforms and insights from engagements with product engineering teams and telemetry analyses to propose scalable improvements in code and designs with attention to customer/business objectives and incident prevention
  • Develops code, scripts, systems, or platforms that automate moderately complex but repetitive operations processes (e.g., monitoring, alerting, deploying products and updates, debugging) at scale; reviews existing automation code and scripts to evaluate reusability, extendibility, and scalability within an organization
  • Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of systems, platforms, or products operating at scale
  • Fulltime

Site Reliability Engineer II

We are the Data Center Network Services team within Cisco IT, supporting network...
Location:
United States of America, Research Triangle Park, North Carolina
Salary:
109900.00 - 200100.00 USD / Year
Duo Security
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Engineering or Technology, with 0-3 years of experience in building, testing, or deploying scalable network applications
  • Strong programming skills, with expertise in Python and Ansible scripting
  • Hands-on experience with tools such as JIRA, Git, and Jenkins
  • Proficiency with Continuous Integration/Continuous Deployment (CI/CD) and pipeline setup
  • Solid understanding of software engineering concepts: data structures, algorithms, object-oriented programming, distributed systems, and cloud computing
Job Responsibility:
  • Design, develop, test, and deploy new software capabilities for Data Center Networks
  • Collaborate with engineers across multiple disciplines and engage with internal clients
  • Deliver innovative, high-quality solutions that enhance the client experience
What we offer:
  • Medical, dental and vision insurance
  • 401(k) plan with a Cisco matching contribution
  • Paid parental leave
  • Short and long-term disability coverage
  • Basic life insurance
  • 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees
  • 1 paid day off for employee’s birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness
  • Non-exempt employees receive 16 days of paid vacation time per full calendar year
  • Exempt employees participate in Cisco’s flexible vacation time off program
  • 80 hours of sick time off provided on hire date and each January 1st thereafter
  • Fulltime

Software Engineer II - CoreAI

As an AI Engineer on the CoreAI Platform team, you will apply artificial intelli...
Location:
United States, Redmond
Salary:
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Design, build, and scale AI models to detect anomalies and identify regressions across large-scale AI systems
  • Analyze patterns in telemetry, logs, and real‑time signals to uncover root causes, predict failures, and drive proactive mitigations
  • Apply AI to identify emerging usage trends, performance hotspots, and workload irregularities that impact system health and user experience
  • Build lightweight automation that leverages anomaly detection signals and pattern analysis to improve live‑site reliability and engineering velocity
  • Contribute to hotfixes, performance tuning, and reliability improvements in production AI engines (e.g., GPU savings, SLA reliability, customer satisfaction)
  • Build intuitive, responsive UI components for AI dashboards and telemetry tools using React and modern web technologies
  • Communicate technical concepts with clarity and initiative, proactively seeking feedback and driving continuous improvement
  • Stay current with industry trends in applied AI, observability, and performance engineering
  • Fulltime

Site Reliability Engineer II

Fivetran is looking for a high-performance engineer to be a part of a team of Si...
Location:
United States, Denver
Salary:
120507.78 - 144615.12 USD / Year
Fivetran
Expiration Date:
Until further notice
Requirements:
  • Knowledge of Cloud Platforms and related tooling: AWS, GCP, Azure, Terraform, configuration management
  • Experience in a scripting language
  • A strong foundation in Linux operating system internals and administration
  • Knowledge of Kubernetes
  • Familiarity with a relational database
Job Responsibility:
  • Responsible for monitoring the availability, capacity, and throughput of Fivetran's production infrastructure to identify and address potential issues
  • Collaborate with engineering teams to integrate reliability best practices into the product roadmap
  • Support the prioritization and resolution of critical bugs identified by support or sales
  • Contribute to maintaining 100% availability of production infrastructure by collaborating with engineering to implement automation for scalable deployments
  • Proactively monitor infrastructure vulnerabilities and collaborate with the security team to address them in a timely manner
What we offer:
  • 100% employer-paid medical insurance
  • Generous paid time-off policy (PTO), plus paid sick time, inclusive parental leave policy, holidays, and volunteer days off
  • RSU stock grants
  • Professional development and training opportunities
  • Company virtual happy hours, free food, and fun team-building activities
  • Monthly cell phone stipend
  • Access to an innovative mental health support platform that offers personalized care and resources in areas such as: therapy, coaching, and self-guided mindfulness exercises for all covered employees and their covered dependents
  • Fulltime

Site Reliability Engineer II

Fivetran is looking for a high-performance engineer to be a part of a team of Si...
Location:
United States, Oakland
Salary:
133897.53 - 160683.46 USD / Year
Fivetran
Expiration Date:
Until further notice
Requirements:
  • Knowledge of Cloud Platforms and related tooling: AWS, GCP, Azure, Terraform, configuration management
  • Experience in a scripting language
  • A strong foundation in Linux operating system internals and administration
  • Knowledge of Kubernetes
  • Familiarity with a relational database
Job Responsibility:
  • Responsible for monitoring the availability, capacity, and throughput of Fivetran's production infrastructure to identify and address potential issues
  • Collaborate with engineering teams to integrate reliability best practices into the product roadmap
  • Support the prioritization and resolution of critical bugs identified by support or sales
  • Contribute to maintaining 100% availability of production infrastructure by collaborating with engineering to implement automation for scalable deployments
  • Proactively monitor infrastructure vulnerabilities and collaborate with the security team to address them in a timely manner
What we offer:
  • 100% employer-paid medical insurance
  • Generous paid time-off policy (PTO), plus paid sick time, inclusive parental leave policy, holidays, and volunteer days off
  • RSU stock grants
  • Professional development and training opportunities
  • Company virtual happy hours, free food, and fun team-building activities
  • Monthly cell phone stipend
  • Access to an innovative mental health support platform that offers personalized care and resources in areas such as: therapy, coaching, and self-guided mindfulness exercises for all covered employees and their covered dependents
  • Fulltime