Site Reliability Engineer - Core

Blockchain

Location:
United Kingdom, London

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are looking for a Site Reliability Engineer to join our Core team and promote infrastructure best practices across our organization, allowing us to securely scale a distributed financial platform that touches millions of people a day. Our platform tackles some of the most interesting problems in crypto for millions of customers and continues to grow rapidly. The SRE team at Blockchain.com combines software and systems engineering to provide a platform that abstracts complexity for increased security, reliability and rapid product delivery. As a member of the Core team, you will be tasked with developing an in-depth understanding of the infrastructure needs of our products. You will establish and maintain creative engineering solutions to improve our customers’ experience by building the necessary tooling. Crucially, you will also guide and educate developer teams so that they can deliver new features in a rapid, secure and scalable manner.

Job Responsibility:

  • Play a critical role in evolving our infrastructure as we develop solutions to complex technical problems involving reliability, latency, bandwidth and most importantly security
  • Be an integral part of improving observability, monitoring and alerting throughout the platform
  • Help co-ordinate work across different areas of the company to ensure the most efficient path of execution
  • Centralize, wherever possible, common streams of work that are currently duplicated across developer teams
  • Focus heavily on writing tooling to replace manual, repetitive work in a scalable way
  • Work in a fast-paced, dynamic environment, complementing our existing high-calibre team

Requirements:

  • Experience with containerization and service orchestration, including best practices and security
  • Strong knowledge of at least one programming language
  • Linux, including an understanding of resource allocation, networking and/or internals
  • Experience working with cloud solutions (GCP or AWS)
  • Deep understanding and demonstrable experience with modern monitoring tools such as Prometheus, Datadog, Grafana, Telegraf
  • Experience with infrastructure as code tools
  • Solid background with configuration management tools
  • Experience using GitOps and CI to make changes, preferably GitHub Actions
  • Experience with messaging systems such as Kafka
  • Experience with database management

Nice to have:

  • Experience with HashiCorp Nomad, Consul and Vault is a plus
  • Experience with Golang, Python, and Bash is a plus
  • Experience with complex Terraform deployments is a plus
  • Experience with SaltStack is a plus
  • Experience working in Data Centers is a plus
  • Knowledge of routing and switching protocols is a plus
What we offer:
  • Full-time salary based on experience and meaningful equity in an industry-leading company
  • Hybrid model working from home & awesome office location in the heart of London
  • Unlimited vacation policy: work hard and take time when you need it
  • Work from Anywhere Policy: You can work remotely from anywhere in the world for up to 20 days per year
  • Apple equipment
  • The opportunity to be a key player and build your career at a rapidly expanding, global technology company in an emerging field
  • Flexible work culture

Additional Information:

Job Posted:
December 06, 2025

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Site Reliability Engineer - Core

Senior Site Reliability Engineer

Digital Business Services (DBS) Our GCIO organisation plays a critical role for ...
Location:
China, Shanghai
Salary:
Not provided
HSBC
Expiration Date
December 31, 2025
Requirements
  • Bachelor's degree in Computer Science, Information Technology, or a related field. Advanced degrees or certifications (e.g., ITIL, AWS Certified Solutions Architect, Google SRE) are a plus
  • Minimum of 5 years of experience in site reliability engineering, software development, or systems engineering, preferably in a financial services environment
  • Proven experience in automating operational processes and managing high-availability systems
  • Experience collaborating with production support, application development, and global teams in a distributed environment
  • Programming: Proficiency in Python, Go, Java, or Ruby for automation and tool development
  • Systems: Deep knowledge of Linux/Unix systems for administration, performance tuning, and debugging
  • Cloud and Infrastructure: Expertise in AWS, Azure, or GCP, and Infrastructure as Code (IaC) tools like Terraform or Ansible
  • Containerization: Experience with Docker and Kubernetes for managing containerized banking applications
  • Monitoring: Proficiency in Prometheus, Grafana, Splunk, or Datadog for observability and performance monitoring
  • CI/CD: Familiarity with Jenkins, GitLab CI, or GitHub Actions for integrating reliability into deployment pipelines
Job Responsibility
  • Design, develop, and implement automation tools and scripts to reduce manual operational tasks ("toil") and enhance system resilience
  • Ensure high availability (e.g., 99.99% uptime) of critical banking applications, including core banking, payment systems, and global platforms and local systems
  • Conduct capacity planning and chaos engineering to test and improve system resilience under failure conditions
  • Participate in on-call rotations to respond to production incidents, troubleshoot issues, and conduct post-mortems to prevent recurrence
  • Collaborate with production support teams for rapid incident resolution and escalate complex issues to application teams or vendors as needed
  • Work closely with production support teams to streamline incident handling and integrate automated solutions into support processes
  • Partner with application development teams to embed reliability practices into the software development lifecycle (SDLC)
  • Engage with the bank's operation resilience project team to align on initiatives for regulatory compliance, disaster recovery, and system robustness
  • Coordinate with global and regional SRE and DevOps teams to ensure consistency in tools, processes, and standards across distributed banking systems
  • Implement and maintain monitoring solutions to track service-level indicators (SLIs) and ensure service-level objectives (SLOs) are met
Employment Type:
Fulltime

Senior Site Reliability Engineer

We are seeking an experienced Senior Site Reliability Engineer (L3) to join our ...
Location:
India, Chennai
Salary:
Not provided
Arcadia
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
  • 8–10+ years of experience in SRE/DevOps/Cloud Engineering, with deep hands-on exposure to AWS and Kubernetes
  • Strong hands-on experience with:
    • Terraform & Infrastructure as Code
    • AWS core services (EKS, IAM, RDS, EC2, VPC, CloudWatch, CloudTrail, GuardDuty)
    • Jenkins + Groovy, GitHub Actions, ArgoCD, FluxCD
    • Kubernetes troubleshooting and operations
    • Prometheus/Grafana/Datadog observability stacks
  • Proven ability to operate in high-scale, high-uptime, multi-environment production systems
  • Experience building automation via Python/Bash and reducing operational toil
  • Strong understanding of incident management, root cause analysis, and reliability engineering principles
Job Responsibility
  • Design, build, and maintain AWS infrastructure (EKS, VPC, RDS, IAM, CloudWatch, CloudTrail, GuardDuty, Load Balancers, S3, CloudFront) using Terraform and CloudFormation
  • Lead all aspects of Kubernetes operations including cluster upgrades, performance tuning, CNI troubleshooting, workload scaling, Helm chart packaging, and GitOps deployments
  • Own and evolve our CI/CD ecosystem across Jenkins (Groovy scripting), GitHub Actions, AWS CodePipeline, ArgoCD, and FluxCD
  • Improve platform reliability by reducing operational toil through automation, scripting (Python/Bash), and proactive system hardening
  • Implement and enhance observability across Prometheus, Grafana, Loki, Tempo, Datadog, and CloudWatch—ensuring actionable alerting, dashboards, and metrics alignment with SLO/SLIs
  • Drive FinOps initiatives, identifying cost inefficiencies and working with engineering teams to implement best practices, tagging standards, budgeting, and resource right-sizing
  • Manage database operations across MySQL and PostgreSQL including backups, performance tuning, replication, and operational runbooks
  • Maintain and improve secret management using Vault, AWS Secrets Manager, and Parameter Store
  • Strengthen cloud security posture with IAM least privilege, CSPM reviews, audit readiness, GuardDuty/CloudTrail monitoring, and environment hardening
  • Troubleshoot complex production issues across networking, Kubernetes, compute, databases, and CI/CD systems
What we offer
  • Competitive compensation and employee stock options
  • Hybrid/remote-first working model (India-based role, with global collaboration)
  • Flexible leave policy
  • Comprehensive medical insurance (self + family members)
  • Annual performance cycle + quarterly recognition awards
  • A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation
Employment Type:
Fulltime

Senior Site Reliability Engineer

As a Senior Site Reliability Engineer on the Platform team, you will identify is...
Location:
United States, Denver; San Francisco
Salary:
138000.00 - 191000.00 USD / Year
Checkr
Expiration Date
Until further notice
Requirements
  • Degree in Computer Science (or related field)
  • 6+ years of experience in building tools with Python (preferred), GoLang, or Ruby
  • 6+ years of experience in maintaining and observing production customer-facing environments in AWS or Azure
  • 6+ years of experience as a member of an incident response team
  • Deep understanding of the fundamental infrastructure and platform concepts behind a micro-service architecture, REST APIs, and asynchronous queueing models
  • Experience with observability platforms and frameworks like Datadog, Splunk, Grafana, Prometheus, or OpenTelemetry
  • Strong collaboration, documentation, communication, and project management skills
  • Experience with container orchestration using Kubernetes/Docker/Terraform
  • Experience driving platform adoption across engineering teams, guided by a self-service and product-first approach
  • A passion for customer-centricity and building relationships with other teams
Job Responsibility
  • Collaborate, drive, and execute architectural discussions with cross-functional teams
  • Lead cross-team projects and SREs' technical roadmap to enable engineering and help Checkr customers
  • Design, build, ship, and maintain the core observability libraries, tools, and patterns used by all of Checkr’s engineering teams
  • Proactively engage across teams to foster service reliability, efficiency, and scalability
  • Troubleshoot complex production issues across the stack, with respect to performance, availability, and data quality
  • Present detailed technical information and benefits of the Checkr platform to a wide array of customers, including operations, developers, technical architects, and executives
What we offer
  • A fast-paced and collaborative environment
  • Learning and development allowance
  • Competitive cash and equity compensation and opportunities for advancement
  • 100% medical, dental, and vision coverage
  • Up to $25K reimbursement for fertility, adoption, and parental planning services
  • Flexible PTO policy
  • Monthly wellness stipend, home office stipend
  • In-office perks such as lunch four times a week, commuter stipend, and an abundance of snacks and beverages
Employment Type:
Fulltime

Site Reliability Engineer/ Sr DevOps

We are offering a contract to permanent employment opportunity for a Site Reliab...
Location:
United States, Woodland Hills
Salary:
Not provided
Robert Half
Expiration Date
Until further notice
Requirements
  • Minimum of 5 years' experience in a similar role
  • Proven expertise in Amazon EC2
  • Experience with Ansible for configuration management
  • Knowledge of Apache ANT+ and Apache Tomcat
  • Familiarity with Atlassian Jira for project management
  • Experience in A/B testing methodologies
  • Proficiency in Agile Scrum methodologies
  • Demonstrated skills in automation processes
  • Comprehensive understanding of AWS Technologies
  • Ability to perform Cluster Analysis
Job Responsibility
  • Architect and design applications for migration, ensuring they align with compliance standards and best practices
  • Actively participate in building solutions and gain an acute understanding of core infrastructure services and their interaction with applications
  • Provide technical leadership to offshore teams, aiding in the distribution of leadership tasks
  • Execute scripting tasks within a C# and .NET environment
  • Understand and manage the interaction between Service Bus, messaging queues, and other applications, and their subsequent impact on infrastructure
  • Work extensively with Azure and AWS ecosystems
  • Ensure the smooth functioning of applications by understanding the intricacies of infrastructure services
  • Utilize AWS and Azure expertise in scripting within C# and .NET environment
  • Handle the interaction of Service Bus or messaging queues with other applications and its impact on infrastructure
  • Engage in hands-on work to build solutions while understanding the interaction of core infrastructure services with applications
What we offer
  • Medical, vision, dental, and life and disability insurance
  • Eligibility to enroll in company 401(k) plan
Employment Type:
Fulltime

Infrastructure Engineer

Descript is on a mission to make audio and video content creation and editing fa...
Location:
United States, San Francisco
Salary:
191000.00 - 250000.00 USD / Year
Descript
Expiration Date
Until further notice
Requirements
  • 5+ years experience in production/site-reliability engineering OR 5+ years of server-side software engineering with an interest in working on core infrastructure
  • A solid understanding of at least two of: public cloud infrastructure, Linux systems administration, and DevOps tooling.
  • Basic coding skills to work on automation and technical guardrails.
  • Strong written and verbal communication skills, and the ability to collaborate with other functions
  • Experience mentoring engineers, including code reviews, architecture discussions, and leadership skills
Job Responsibility
  • Develop technical and business solutions that enable engineers to improve the quality and reliability of product features and systems that they build.
  • Drive improvements to the reliability of our core infrastructure, such as production clusters, networking, databases, and observability systems.
  • Champion best practices during reviews of code, technical designs, and launch plans.
  • Own our incident management and fire drill processes.
  • Work with engineering leadership to set goals and prioritize production reliability.
What we offer
  • generous healthcare package
  • 401k matching program
  • catered lunches
  • flexible vacation time
Employment Type:
Fulltime

Senior Platform Engineer - AWS

We’re currently looking for a skilled and enthusiastic Senior Platform Engineer ...
Location:
Germany, Hamburg or Berlin
Salary:
73000.00 - 90000.00 EUR / Year
About You
Expiration Date
Until further notice
Requirements
  • 5+ years of professional experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE), with a significant focus on cloud infrastructure
  • Fluency in scripting languages (e.g., Python, Go, Bash) for system automation, tooling development, and operational tasks
  • Deep expertise in managing and scaling production workloads within a major public cloud provider (e.g., AWS, Azure, or GCP), including strong familiarity with core services like Compute, Networking, Identity & Access Management (IAM), and Managed Database
  • Proven mastery of Infrastructure-as-Code (IaC) using AWS CloudFormation and/or Terraform in complex, multi-account environments
  • Demonstrated experience designing, implementing, and maintaining robust CI/CD pipelines
  • Solid knowledge of monitoring and logging solutions
  • Excellent communication and documentation skills, with the ability to articulate complex technical issues to technical stakeholders
Job Responsibility
  • Own and evolve the Commerce Cloud’s AWS infrastructure through the application of Infrastructure-as-Code (IaC) principles to ensure scalability, high availability, and cost efficiency
  • Design, implement, and optimize CI/CD pipelines and operational workflows utilizing tools such as GitLab CI, AWS CloudFormation, and Terraform
  • Establish and enforce comprehensive, high-quality documentation for all infrastructure, operational playbooks, and critical architecture decisions
  • Act as a subject matter expert and trusted advisor, partnering with application development teams to architect and provision infrastructure that meets their specific workload requirements
  • Drive collaborative efforts with GCP Platform Engineers on cross-cloud initiatives and work closely with Information Security Engineers to design and implement security controls and governance policies
  • Spearhead the evaluation and adoption of emerging cloud and platform technologies, continuously seeking opportunities to improve platform performance and developer experience
What we offer
  • Hybrid working
  • Sports courses
  • Free access to code.talks
  • Exclusive employee discounts
  • Free drinks
  • Language courses
  • Free Laracasts account
  • Company parties
  • Help in the relocation process
  • Mobility subsidy
Employment Type:
Fulltime

FLEX Senior Solutions Architect

Accountable for the research, analysis, design, creation and implementation of P...
Location:
United States, Bethesda
Salary:
83.17 - 101.11 USD / Hour
Marriott Bonvoy
Expiration Date
Until further notice
Requirements
  • 8+ years in an IT operational role supporting mission critical solutions or applications with 5+ years leading an infrastructure organization
  • Bachelor's Degree in IT-related field with five (5)+ years of equivalent combination of education and experience and training
  • 3+ years of experience providing operations and sustainment support for cloud infrastructure service on Amazon or Azure or Ali cloud
  • 5+ years’ experience in any of the following: Public Clouds/Virtual Deployment using ESXi, Amazon Web Services (AWS) / EC2/EKS, Microsoft Azure, Oracle Cloud, Ali cloud, SaaS
  • Graduate degree in technical discipline
  • Strong diagnostic skills with regards to identification and classification of malicious BOT traffic
  • SAFe agile delivery framework
  • Experience supporting modern operating models (Site Reliability engineering)
  • Experience in System Engineering of servers, storage, network, etc.
  • Familiarity with large scale cloud infrastructure, including network architectures, routing, DNS, TCP/IP protocols, and SSL/TLS ciphers
Job Responsibility
  • Provides leadership, oversight, governance, and strategic direction related to Infrastructure services to enable the delivery of IT services
  • Defines the Marriott infrastructure architecture and governance model
  • Provides technical leadership, oversight, standardization, and validation of the effectiveness for the Enterprise Infrastructure environment
  • Researches, designs, and implements high-performing software components that are standards-based, highly available and secured, delivering the required business functionality
  • Educates internal and external users of the technologies to continually improve the knowledge and skill-base of the organization on how best to operate and support the infrastructure services
  • Develops documents with a focus on how services will be leveraged in the solution architecture
  • Participates in the evaluation and selection of Infrastructure based products
  • Works closely with the EA team to facilitate alignment of plans with what is being delivered
  • Institutes governance based on best practices and ensures proper alignment to projects and major initiatives
  • Leads the analysis of the current environment to detect critical deficiencies and recommends solutions for improvement
What we offer
  • bonus program
  • comprehensive health care benefits
  • 401(k) plan with up to 5% company match
  • employee stock purchase plan at 15% discount
  • accrued paid time off
  • life insurance
  • group disability insurance
  • travel discounts
  • adoption assistance
  • paid parental leave
Employment Type:
Fulltime

Lead Infrastructure and Automation Engineer

The Lead Infrastructure and Automation Engineer is responsible for developing, m...
Location:
United Kingdom, London
Salary:
Not provided
Community Fibre
Expiration Date
Until further notice
Requirements
  • Minimum of 7-10 years of experience working in a server and storage environment
  • Advanced understanding of Linux (Ubuntu / Debian, and CentOS / RHEL)
  • Experience with infrastructure as code (IaC) practices
  • Configuration management: Puppet and Ansible
  • Orchestration: Terraform
  • DCIM/IPAM tools, e.g. NetBox
  • Log ingestion: OpenSearch/Elasticsearch, Logstash, Kibana, Filebeat, Syslog, Graylog
  • Containerisation: Docker and / or Kubernetes
  • Virtualisation: VMware 7.x
  • Cloud: AWS
Job Responsibility
  • Develop, maintain, improve, and support Community Fibre’s infrastructure service environments
  • Manage and maintain existing Linux based servers, both on-prem and cloud hosted
  • Work with the Network Technology Team
  • Build new servers / systems
  • Ensure existing ones are maintained, reliable and resilient
  • Cover backend systems engineering, infrastructure, and site reliability engineering
  • Provide guidance and mentoring to other engineers
  • Create and implement high and low-level designs
  • Act as a senior member for the Network Technology team
  • Provide reports to SLT, Exec and Board members when required
What we offer
  • 25 days holiday, increasing by 1 day for each year of service up to 28 days
  • Birthday leave
  • Cycle to work scheme
  • Flexible WFH policy
  • Private Health Cover
Employment Type:
Fulltime