DevOps / Site Reliability Engineer Job at Solas IT Recruitment (Dublin)

Senior Site Reliability Engineer

Baxter International is seeking a skilled Senior Principal Site Reliability Engi...

Location

United States, Deerfield

Salary:

96000.00 - 132000.00 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science, IT, or related field (or equivalent experience)
Prior experience in Site Reliability Engineering and cloud-based infrastructure management
Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
Azure administration and operations experience, with certifications a plus
Knowledge of related technologies, including cloud, encryption, and security protocols
Systems administration experience in Windows and Linux environments
Proven problem-solving skills and experience with scripting and automation tools
Ability to create accurate documentation and reports, with excellent communication skills

Job Responsibility

Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
Manage and administer Azure resources, including virtual machines, databases, and networking components
Define and document operating procedures to ensure required security, privacy and other compliance standards are maintained for digital solutions deployed in the cloud
Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
Define and refine Operations SLAs to maintain a high level of customer satisfaction
Establish non-functional requirements to meet SLAs
Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
Define key performance indicators that can be monitored, measured, and used to derive opportunities
Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes

What we offer

Healthcare benefits
Employee Stock Purchase Plan (ESPP)
401(k) Retirement Savings Plan
Flexible Spending Accounts
Educational assistance programs
Paid holidays
Paid time off
Paid parental leave
Commuting benefits
Employee Discount Program

Fulltime

Senior Site Reliability Engineer

This is a role at Baxter where your work impacts saving and sustaining lives thr...

Location

United States, Deerfield

Salary:

96000.00 - 132000.00 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science, IT, or related field (or equivalent experience)
Prior experience in Site Reliability Engineering and cloud-based infrastructure management
Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
Azure administration and operations experience, with certifications a plus
Knowledge of related technologies, including cloud, encryption, and security protocols
Systems administration experience in Windows and Linux environments
Proven problem-solving skills and experience with scripting and automation tools
Ability to create accurate documentation and reports, with excellent communication skills
Applicants must be authorized to work for any employer in the U.S.
Unable to sponsor or take over sponsorship of an employment visa at this time.

Job Responsibility

Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
Manage and administer Azure resources, including virtual machines, databases, and networking components
Define and document operating procedures to ensure required security, privacy and other compliance standards are maintained for digital solutions deployed in the cloud
Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
Define and refine Operations SLAs to maintain a high level of customer satisfaction
Establish non-functional requirements to meet SLAs
Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
Define key performance indicators that can be monitored, measured, and used to derive opportunities
Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes.

What we offer

Support for Parents
Continuing Education/Professional Development
Employee Health & Well-Being Benefits
Paid Time Off
2 Days a Year to Volunteer
Medical and dental coverage starting day one
Insurance coverage for basic life, accident, short-term and long-term disability
Business travel accident insurance
Employee Stock Purchase Plan (ESPP)
401(k) Retirement Savings Plan

Fulltime

Site Reliability Engineer/ Sr DevOps

We are offering a contract to permanent employment opportunity for a Site Reliab...

Location

United States, Woodland Hills

Salary:

Not provided

Robert Half

Expiration Date

Until further notice

Requirements

Minimum of 5 years of experience in a similar role
Proven expertise in Amazon EC2
Experience with Ansible for configuration management
Knowledge of Apache Ant and Apache Tomcat
Familiarity with Atlassian Jira for project management
Experience in A/B testing methodologies
Proficiency in Agile Scrum methodologies
Demonstrated skills in automation processes
Comprehensive understanding of AWS Technologies
Ability to perform Cluster Analysis

Job Responsibility

Architect and design applications for migration, ensuring they align with compliance standards and best practices
Actively participate in building solutions and gain an acute understanding of core infrastructure services and their interaction with applications
Provide technical leadership to offshore teams, aiding in the distribution of leadership tasks
Execute scripting tasks within a C# and .NET environment
Understand and manage the interaction between Service Bus, messaging queues, and other applications, and their subsequent impact on infrastructure
Work extensively with Azure and AWS ecosystems
Ensure the smooth functioning of applications by understanding the intricacies of infrastructure services
Utilize AWS and Azure expertise in scripting within C# and .NET environment
Handle the interaction of Service Bus or messaging queues with other applications and its impact on infrastructure
Engage in hands-on work to build solutions while understanding the interaction of core infrastructure services with applications

What we offer

Medical, vision, dental, and life and disability insurance
Eligibility to enroll in company 401(k) plan

Fulltime

Site Reliability Engineer 2

Join us. At PagerDuty, you'll tackle complex problems, collaborate with kind and...

Location

Portugal, Lisbon

Salary:

Not provided

PagerDuty

Expiration Date

Until further notice

Requirements

3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
Experience with Kubernetes and container orchestration
Experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
Proficiency in at least one programming language (e.g. Python, Ruby, Go, etc.)
Experience with Infrastructure as Code (e.g. Terraform, CloudFormation)

Job Responsibility

Deploy, configure, monitor and optimize highly available Kubernetes clusters on AWS/EKS
Help maintain the overall health of the platform, including triaging and troubleshooting production issues, monitoring system capacity, and working with other technical teams to ensure adherence to compliance and security best practices
Continuously strive to improve the internal developer experience and the software development lifecycle
Stay current on technical trends to suggest innovative tools and approaches to interesting problems
Participate in a 24/7 on-call rotation

What we offer

Competitive salary
Comprehensive benefits package from day one
Flexible work arrangements
Company equity
ESPP (Employee Stock Purchase Program)
Retirement or pension plan
Generous paid vacation time
Paid holidays and sick leave
Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent

Fulltime

Staff Site Reliability Engineer

At Ledger, we are looking for an experienced Reliability Engineer to join our SR...

Location

France, Paris

Salary:

Not provided

Ledger

Expiration Date

Until further notice

Requirements

8+ years of cloud engineering at scale, in organizations operating SaaS solutions
Proficiency in working in Unix/Linux environments, Git, Python, Terraform, Kubernetes, AWS cloud solutions and architectures, CI/CD tools, Argo CD, Ansible, configuration management, etc.
Strong knowledge of observability practices, with experience implementing and managing logging, monitoring and alerting frameworks with solutions such as Datadog or Prometheus/Grafana/Loki
Experience of cross-functional work and a demonstrated collaborative approach to building key relationships across the organization and defining project scope, goals, plans and deliverables
Customer focused, with the ability to identify and understand both internal and external customers' needs
Creative problem-solving and analysis skills with an ability to identify, develop, and implement solutions to meet the needs of the business
Excellent presentation and written communication
Ability to deal with ambiguity, high level of pressure and rapidly changing environments
Engineering degree.

Job Responsibility

Participate in building a DevOps / SRE culture and enable the transition to modern infrastructure management and deployment practices
Participate in building the SRE team roadmap (vision and delivery accountability). Anticipate stakeholder needs and the emergence of game-changing technologies, and challenge scope and deadlines
Perform integration of platform software components
Participate in designing and delivering solutions to improve the availability, scalability, latency, and efficiency of systems
Influence and create standards & best practices in support of service level objectives
Automate key SRE metrics including SLOs/SLAs and error budgets
Provide expert support to our level-2/application support team, to troubleshoot priority incidents, and conduct post-mortems
Apply analytics on past incidents and usage patterns to predict issues and take proactive actions
Ensure control of technical debt and promote quality practices
Apply SRE and chaos engineering approaches across all strategic systems, in coordination with Service Design, to predict and prevent outages and improve solution availability

What we offer

Equity: Employees are the foundation of our success, and we award stock options so you can share in that success as we grow
Flexibility: A hybrid work policy
Social: Annual company outing for Ledgerdary Days, plus frequent social events, snacks and drinks
Medical: Comprehensive health insurance policy offering extensive medical, dental and vision care coverage
Well-being: Personal development, coaching & fitness with our dedicated partners
Vacation: Five weeks of paid leave per year, in addition to national holidays and rest & relaxation (RTT) days
High tech: Access to high performance office equipment and gadgets, including Apple products
Transport: Ledger reimburses part of your preferred means of transportation
Discounts: Employee discount on all our products.

Fulltime

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...

Location

India, Bangalore

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
Minimum 2 years of experience managing or leading cloud operations teams
Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
Familiarity with modern CI/CD automation and tools
Excellent communication, stakeholder management, and team-building skills
Experience scaling SRE practices in high-growth or large-scale environments
Ability to balance long-term reliability initiatives with short-term delivery needs.

Job Responsibility

Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
Guide the team in building automation, improving observability, and improving the operational efficiency of our cloud infrastructure
Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
Define and track key reliability metrics, and report on team performance and system health to leadership
Contribute to hiring, onboarding, and career development for SREs.

What we offer

Health & Wellbeing benefits for physical, financial, and emotional wellbeing
Personal & Professional Development programs
Unconditional inclusion in the workplace.

Fulltime

Site Reliability Engineer

Corporate Tools is looking for a Site Reliability Engineer. You will be a tradit...

Location

United States

Salary:

175000.00 USD / Year

Corporate Tools

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience
5+ years of experience in software engineering
2+ years of experience in site reliability engineering, DevOps, or infrastructure engineering roles
Deep experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code tools such as Terraform, CloudFormation, or Pulumi
Strong proficiency with Kubernetes, Docker, and container orchestration in production environments
Hands-on experience with observability and monitoring tools like Prometheus, Grafana, OpenTelemetry, Sentry, or New Relic
Proven ability to design and implement highly available, fault-tolerant systems and lead proactive incident response efforts
Experience with performance tuning, database optimization, and caching strategies (e.g., PostgreSQL, Redis, Memcached)
Demonstrated ability to drive reliability improvements, reduce operational toil, and foster a culture of resilience and continuous improvement
Experience leading reliability-focused initiatives such as post-incident reviews, capacity planning, and root cause analysis

Job Responsibility

Stop problems before they start
Fix issues quickly and learn from them
Help keep systems steady, secure, and running
Work closely with DevOps engineers to build out tools and automation
Take ownership

What we offer

100% employer-paid medical, dental and vision for employees
Annual review with raise option
22 days Paid Time Off accrued annually, and 4 holidays
After 3 years, PTO increases to 29 days
Employees transition to flexible time off after 5 years with the company—not accrued, not capped, take time off when you want
Paid Parental Leave
Up to 6% company matching 401(k) with no vesting period
Quarterly allowance
Open concept office with friendly coworkers
Creative environment where you can make a difference

Fulltime

Software Engineer, Site Reliability

As a Site Reliability Engineer (SRE) at Fireworks AI, you will play a critical r...

Location

United States, San Mateo

Salary:

Not provided

Fireworks AI

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, related technical field, or equivalent practical experience
5+ years of experience in Site Reliability Engineering, DevOps, or a similar role focused on large-scale production systems
Deep expertise in SRE principles and practices, including SLOs, SLIs, operational automation, incident management, and post-mortems
Extensive hands-on experience with public cloud platforms (AWS, GCP, Azure), including compute, networking, storage, and database services
Strong experience with containerization technologies (Docker) and orchestration platforms (Kubernetes)
Proficiency in designing and implementing robust monitoring, logging, and alerting systems using tools like Prometheus, Grafana, ELK stack, and distributed tracing
Solid programming/scripting skills in at least one language (e.g., Python, Go) for automation and tool development
In-depth knowledge of Linux operating systems, networking fundamentals, and system debugging
Proven ability to troubleshoot complex issues across the entire stack
Excellent communication, collaboration, and problem-solving skills

Job Responsibility

Ensuring System Reliability: Ensure systems are designed and implemented with high availability, scalability, and performance. Focus on fault tolerance, disaster recovery, identifying and removing scaling bottlenecks, and performance optimization across our multi-cloud infrastructure
Incident Management & Response: Lead efforts in incident detection, response, and resolution for critical production issues. Drive post-mortems to identify root causes and implement preventative measures to improve system reliability
Observability & Monitoring: Develop, implement, and maintain comprehensive monitoring, alerting, logging, and tracing solutions to provide deep insights into system health and performance
Automation & Toil Reduction: Identify and automate repetitive operational tasks to reduce toil and improve operational efficiency. Develop tools and scripts to streamline deployments, scaling, and system management
Capacity Planning & Performance Tuning: Work proactively on capacity planning to ensure our infrastructure can gracefully handle growth and peak loads. Optimize system performance and resource utilization
Reliability Best Practices: Collaborate with software engineers to embed reliability principles (e.g., SLOs, SLIs, error budgets) into the development lifecycle, promoting a culture of operational excellence
On-call Rotation: Participate in a periodic on-call rotation to support our production environment and respond to critical alerts

Fulltime

DevOps / Site Reliability Engineer

Solas IT Recruitment

Location:
Ireland, Dublin

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Additional Information:

Job Posted:
February 17, 2026

