Lead SRE

Inetum

Location:
Portugal, Lisbon

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are looking for a Lead SRE to join our Inetum Team and be part of a work culture focused on innovation!

Job Responsibility:

  • Train SREs and their managers on SRE practices
  • Co-construct the transformation strategy and the support plan by participating in workshops, brainstorming with the transformation team and producing training content
  • Coach and support

Requirements:

  • SRE IT production processes
  • Agile / DevOps mindset, problem solving
  • Scripting: Python, YAML, Shell
  • Monitoring: Dynatrace, Nagios
  • Linux
  • Admin Network (DNS, Firewall, Switch)
  • DevOps stack: Git & Git Flow, Artifactory, Jenkins or GitLab CI, Ansible Tower, Digital.ai Release
  • Cloud: Kubernetes, Docker, Argo CD, Vault, Helm
  • End-to-end IT organization and processes (from development to run / operate)
  • Technical Architecture

Additional Information:

Job Posted:
July 14, 2025

Employment Type:
Fulltime

Similar Jobs for Lead SRE

Internal Kubernetes Platform Lead SRE

HSBC is seeking an IKP Support Engineer (SRE) to join the IKP Team within the Hy...
Location:
Poland
Salary:
Not provided
HSBC
Expiration Date:
February 17, 2026
Requirements:
  • Solid technical knowledge and experience with Kubernetes administration
  • 3+ years of hands-on experience with Kubernetes administration
  • Strong knowledge of Kubernetes concepts and operations and troubleshooting tools
  • Understanding of containerization and orchestration
  • Experience with Unix administration skills
  • Experience with Service Meshes is a plus
  • Understanding of ITIL processes and automation skills
  • Familiarity with infrastructure as a code
  • Strong analytical and communication skills
  • Proficiency in English.
Job Responsibility:
  • Ensure the reliability, availability, and performance of the infrastructure platform
  • Collaborate in diagnosing and resolving IKP infrastructure issues
  • Support the deployment, configuration, and maintenance of Kubernetes platform
  • Troubleshoot and resolve incidents, performance issues, and integration failures
  • Perform root cause analysis and implement reliability improvements
  • Provide 24x7 support as part of an on-call rota
  • Plan duties and other administrative tasks for a team in line with the Polish Labor Code.
What we offer:
  • Competitive salary
  • Annual performance-based bonus
  • Additional bonuses for recognition awards
  • Multisport card
  • Private medical care
  • Life insurance
  • One-time reimbursement of home office set-up (up to 800 PLN)
  • Corporate parties & events
  • CSR initiatives
  • Nursery discounts
Employment Type:
Fulltime

Site Reliability Engineering Support Lead

Site Reliability Engineering Support Lead role focused on application support, d...
Location:
Ireland, Dublin
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Solid SRE process experience
  • 5+ years leading a high-performance, 24x7 DevOps or SysOps team
  • Proficiency in Windows administration, Office 365, Exchange, SharePoint, Active Directory, Backup, Networking and Infrastructure
  • Experience with Microsoft OS Windows & Server
  • Experience in ticket tracking and resolving on time
  • Hands-on experience on ticketing tools (ServiceNow)
  • Excellent verbal, written, presentation and interpersonal communication skills
  • Ability to make complex technical matters easy-to-comprehend for non-technical persons.
Job Responsibility:
  • Taking end-to-end ownership of application support for production system issue resolution
  • Implementing, monitoring, and maintaining CI/CD frameworks
  • Developing new capabilities, coordinating implementation across a large number of teams including infrastructure, developer tools and information security
  • Influencing a culture of Site Reliability Engineering. Engaging in training and mentoring to help develop other engineers with an SRE mindset
  • Providing the first line of after-deployment technical support at L1 and L2 level for applications and/or associated production systems, diagnostics, and network health monitoring
  • Coordinating and/or deploying hands-on fixes, patches and software updates at the application level, and as appropriate at the network level
  • Managing a team of technical support engineers who provide technical support to users
  • Escalating complex problems to the L3 level of expertise within the organization, along with observations from investigative and diagnostic assessments
  • Coordinating the investigation of repeated technical issues affecting user systems and seeing them through to resolution
  • Escalating, resolving, guiding team, and tracking production incidents to closure
What we offer:
  • Competitive base salary (which is annually reviewed)
  • Hybrid working model (up to 2 days working at home per week)
  • Additional benefits to support you and your family to be well, live well and save well.
Employment Type:
Fulltime

Lead Site Reliability Engineer

Groupon is a marketplace where customers discover new experiences and services e...
Location:
India, Bangalore
Salary:
Not provided
Groupon
Expiration Date:
Until further notice
Requirements:
  • 10+ years in systems engineering
  • at least 5 years in SRE or DevOps roles
  • expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker)
  • proficiency in programming and scripting languages like Python, Go, and Bash
  • advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible
  • deep understanding of networking, DNS, load balancing, and security principles
  • proven track record of managing high-availability systems in demanding environments
  • exceptional analytical and problem-solving skills
Job Responsibility:
  • Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher
  • drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools
  • create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery
  • build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack
  • collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs
  • lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues
  • design and execute performance testing, capacity planning, and scalability strategies for evolving workloads
  • proactively identify and resolve bottlenecks, increasing system performance and developer efficiency
  • mentor junior engineers, fostering a collaborative and growth-oriented team environment
  • guide architectural decisions that drive innovation and enhance system reliability
What we offer:
  • The opportunity to work with cutting-edge technologies in a transformative environment
  • a collaborative and innovative work environment that values your expertise and contributions
  • professional growth and leadership development pathways tailored to your aspirations
  • a chance to leave a lasting impact by shaping the future of reliable and scalable systems

Engineering Lead Analyst

The Engineering Lead Analyst is a senior level position responsible for leading ...
Location:
India, Pune
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 6-10 years of relevant experience in an Engineering role
  • Experience working in Financial Services or a large complex and/or global environment
  • Project Management experience
  • Consistently demonstrates clear and concise written and verbal communication
  • Comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • Demonstrated analytic/diagnostic skills
  • Ability to work in a matrix environment and partner with virtual teams
  • Ability to work independently, multi-task, and take ownership of various parts of a project or initiative
  • Ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
  • Proven track record of operational process change and improvement
Job Responsibility:
  • Serve as a technology subject matter expert for internal and external stakeholders
  • Provide direction for all firm mandated controls and compliance initiatives
  • Lead projects within the group and create a technology domain roadmap
  • Ensure that all integration of functions meets business goals
  • Define necessary system enhancements to deploy new products and process enhancements
  • Recommend product customization for system integration
  • Identify problem causality, business impact and root causes
  • Exhibit knowledge of how own specialty area contributes to the business
  • Apply knowledge of competitors, products and services
  • Advise or mentor junior team members
Employment Type:
Fulltime

Director, Service Reliability Engineering

As Director of SRE, you will lead the team responsible for accelerating and auto...
Location:
United States, Bethesda
Salary:
125600.00 - 203700.00 USD / Year
Marriott Bonvoy
Expiration Date:
Until further notice
Requirements:
  • Undergraduate degree in computer science, software engineering, or a related field (or equivalent experience)
  • 10+ years of experience in SRE, DevSecOps or IT operations
  • At least 5 years’ experience in a previous leadership role within SRE, DevSecOps or IT operations
  • At least five years of experience in the following technologies: Presentation Management (HTML, CSS, JS, Backbone, Node.js, Android, iOS); Application Platforms (NGINX, Java, Akana, Play Framework, Tomcat, Docker, OpenShift); Application Data (PostgreSQL, Couchbase, Cassandra); Integration Services (Apache Kafka, Apache Spark, Akana); Analytics Platforms (Hadoop, dashDB, Cognos, Tableau); Security (ForgeRock, OpenID, OAuth, Ping Identity); Public Cloud (Azure, Google Cloud, AliCloud, Amazon Web Services); CI/CD (Harness)
  • Experience with test automation
  • Working knowledge and proven track record of implementing disaster indifferent architecture
  • Experience with CDN and Akamai tools
  • Linux/Unix system administration experience
  • Proficient in scripting and programming languages (like Python, Go, Bash, Shell)
  • Hands-on experience with infrastructure as code (like Terraform), container orchestration (like Kubernetes), and reliability automation
Job Responsibility:
  • Define and execute Marriott’s SRE vision, aligning with business objectives and technology roadmaps
  • Build, mentor and lead a high-performing SRE team, fostering a culture of collaboration and innovation
  • Establish reliability, observability and automation goals to improve system uptime, performance and scalability
  • Partner with engineering, operations and security teams to drive best practices and continuous improvement
  • Implement reliability-focused engineering practices, including SLAs, SLOs/SLIs and error budgets
  • Design and maintain resilient, scalable and fault-tolerant architectures across cloud and hybrid environments
  • Develop strategies to proactively identify and mitigate risks to system performance and availability
  • Drive root cause analysis (RCA) and post-mortem processes to prevent recurring incidents
  • Champion automation in monitoring, deployment and incident resolution to reduce toil and enhance efficiency
  • Lead and optimize incident response processes, ensuring rapid detection, diagnosis, and resolution of system failures
What we offer:
  • Bonus program
  • comprehensive health care benefits
  • 401(k) plan with up to 5% company match
  • employee stock purchase plan at 15% discount
  • accrued paid time off (including sick leave where applicable)
  • life insurance
  • group disability insurance
  • travel discounts
  • adoption assistance
  • paid parental leave
Employment Type:
Fulltime

SRE Observability Lead Engineer

The SRE Observability Lead Engineer is a hands-on leader responsible for shaping...
Location:
United Kingdom, London
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Relevant experience in Observability, SRE, Infrastructure Engineering, or Platform Architecture, including several years in senior leadership roles
  • Deep expertise in observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms
  • Strong hands-on experience across hybrid infrastructure, including on-prem, cloud (AWS, GCP, Azure), and container platforms (ECS, Kubernetes)
  • Proven ability to design scalable telemetry and instrumentation strategies, resolve production observability gaps, and integrate them into large-scale systems
  • Experience leading teams and managing people across geographically distributed locations
  • Strong ability to influence platform, cloud, and engineering leaders to ensure observability tooling is built for reuse and scale
  • Deep understanding of SRE fundamentals, including SLIs, SLOs, error budgets, and telemetry-driven operations
  • Strong collaboration skills and experience working across federated teams, building consensus and delivering change
  • Ability to stay up to date with industry trends and apply them to improve internal tooling and design decisions
  • Excellent written and verbal communication skills
Job Responsibility:
  • Define and own the strategic vision and multi-year roadmap for Observability across Services Technology, aligned with enterprise reliability and production goals
  • Translate strategy into an actionable delivery plan in partnership with Services Architecture & Engineering function, delivering incremental, high-value milestones toward a unified, scalable observability architecture
  • Lead and mentor SREs across Services, fostering technical growth and an SRE mindset
  • Build and offer a suite of central observability services across LoBs – including standardized telemetry libraries, onboarding templates, dashboard packs, and alerting standards
  • Drive reusability and efficiency by creating common patterns and golden paths for observability adoption across critical client flows and platforms
  • Partner with infrastructure, CTO and other SMBF tooling teams to ensure observability tooling is scalable, resilient, and avoids duplication (“cottage industries”)
  • Work hands-on to troubleshoot telemetry and instrumentation issues across on-prem, cloud (AWS, GCP, etc.), and ECS/Kubernetes-based environments
  • Collaborate closely with the architecture function to support implementation of observability NFRs in the SDLC, ensuring new apps go live with sufficient coverage and insight
  • Support SRE Communities of Practice (CoP) and foster strong relationships with SREs, developers, and platform leads across Services and beyond to accelerate adoption and promote SRE best practices such as SLO adoption and capacity planning
  • Use Jira/Agile workflows to track and report on observability maturity across Services LoBs – coverage, adoption, and contribution to improved client experience
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Employment Type:
Fulltime

Orion Tech SRE Lead - Senior Vice President

The Orion Tech- SRE Lead is a hands-on leader responsible for shaping and delive...
Location:
India, Chennai; Pune
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 16+ years of experience in Observability, SRE, Infrastructure Engineering, or Platform Architecture, including 5+ years in senior leadership roles
  • Deep expertise in observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms
  • Strong hands-on experience across hybrid infrastructure, including on-prem, cloud (AWS, Google Cloud), and container platforms (ECS, Kubernetes)
  • Proven ability to design scalable telemetry and instrumentation strategies, resolve production observability gaps, and integrate them into large-scale systems
  • Experience leading teams and managing people across geographically distributed locations
  • Strong ability to influence platform, cloud, and engineering leaders to ensure observability tooling is built for reuse and scale
  • Deep understanding of SRE fundamentals, including SLIs, SLOs, error budgets, and telemetry-driven operations
  • Strong collaboration skills and experience working across horizontal infrastructure teams, building consensus and delivering changes
  • Ability to stay up to date with market trends and apply them to improve internal tooling and design decisions
  • Good understanding of the AI tech stack; able to create a business case and solve it using Citibank AI solutions
Job Responsibility:
  • Define and own the roadmap of engineering enablers for the Project Orion team, aligned with enterprise reliability and SRE Services organization goals
  • Translate Organization strategy into an actionable delivery plan in partnership with Services Products, Operations & Engineering function, delivering incremental, high-value milestones
  • Understand Critical Business Services functional scope and translate into End-to-End monitoring solutions
  • Periodically review and analyze application monitoring toil, and collaborate with stakeholders to remediate it in line with organization goals
  • Identify manual operations use cases performed by Level 1 functions and create a strategic plan to automate them
  • Drive reusability and efficiency by tracking problem statements raised by the Orion Level 1 function and providing a milestone delivery plan
  • Design and build strategic observability dashboards that bring golden signals such as SLOs, SLIs, latency, and business metrics into a single pane of glass
  • Lead and mentor SREs, fostering technical growth and an SRE mindset
  • Work hands-on to troubleshoot telemetry and instrumentation issues across on-prem, cloud (AWS, GCP, etc.), and ECS/Kubernetes-based environments
  • Use Jira/Agile workflows to track and report on strategic enablers coverage, adoption, and contribution to improved client experience
Employment Type:
Fulltime

Engineering Manager for Observability/CI/CD and Cloud

Lead the AI-Driven Evolution of Groupon’s Global Engineering Platform. At Groupo...
Location:
Dublin; Madrid; Prague; Valencia; Warsaw
Salary:
Not provided
Groupon
Expiration Date:
Until further notice
Requirements:
  • 5+ years’ experience leading infrastructure, DevOps, or SRE teams (5+ people), ideally in high-change, scale-up environments
  • Deep technical expertise in cloud-native platforms, observability, infrastructure as code, and CI/CD tooling
  • Proven success operationalizing AI tools within engineering workflows
  • Strategic, resilient, and pragmatic approach: ready to own results and thrive under shifting priorities
  • Exceptional communication: able to simplify complexity and effectively partner with C-level and global teams
  • Bachelor’s or Master’s in Computer Science (or similar)—or equivalent industry experience
Job Responsibility:
  • Lead & Inspire: Build and mentor a high-performing, globally distributed team of CI/CD and Observability engineers (5-10 direct reports), coaching them in cutting-edge AI-assisted workflows and best practices
  • Modernize Core Infrastructure: Spearhead the migration from legacy platforms (Jenkins, ELK) to cloud-native solutions (GitHub Actions, Google Cloud Logging, GCP Prometheus/Grafana). Eliminate “straggler” pipelines and drive cost-efficient, reliable operations
  • AI-First Engineering: Operationalize AI tools (Claude Code, Copilot, ChatGPT, etc.) for everything from log analysis and incident summaries to automated infrastructure as code, making AI-augmented engineering a daily norm
  • Architect & Optimize: Oversee a hybrid tech stack (Kubernetes, Envoy, Terraform, GCP, AWS), ensuring platforms are fast, scalable, and “self-healing” via LLM integrations
  • Collaborate Globally: Act as a thought leader and cross-functional partner, advocating for AI-driven developer experience and collaborating with leaders in SRE, Product, and Cloud
  • Drive Transformation: Deliver strategic projects with tight deadlines and direct business impact, such as the Jenkins-to-GHA and ELK-to-GCP migrations, while maintaining a high standard of technical excellence and cost efficiency
What we offer:
  • Drive real, high-visibility change at the heart of a company undergoing major transformation
  • Work on complex technical and operational challenges in a fast-paced, AI-first environment
  • Accelerate your impact—and your team’s—using industry-leading AI and automation tools
  • Influence engineering practices across a global platform impacting millions of users