Site Reliability Engineering Specialist

Plusnet

Location:
Hungary, Budapest

Contract Type:
Not provided

Salary:

Not provided

Job Description:

BTI Professionals provide expert third-line reliability and operational support for BT International’s Global Fabric Network-as-a-Service (NaaS) product, ensuring high availability, performance, and service resilience. We are seeking a Site Reliability Engineering Specialist to support the reliability and operability of the Global Fabric service. This role focuses on service reliability for a live NaaS product, working closely with network and platform teams rather than owning the underlying platform.

Job Responsibility:

  • Provide SRE ownership for the Global Fabric NaaS service, ensuring availability, performance, and resilience
  • Support safe, automated change into production using CI/CD, GitOps, and automated testing
  • Operate and improve monitoring and observability using Dynatrace, Prometheus, and Elasticsearch
  • Troubleshoot incidents across Kubernetes-hosted applications, Linux systems, networking, and service integrations
  • Act as a third-line escalation point, participating in a 24x7 on-call rota
  • Manage incidents via ServiceNow and track defects and improvements in Jira
  • Contribute to Scrum ceremonies and PI planning, supporting Agile delivery
  • Drive automation using Ansible and scripting to reduce operational toil
  • Mentor and support L2 engineers, improving runbooks, troubleshooting practices, and operational readiness

Requirements:

  • Experience supporting large-scale, high-availability services in an ISP / NaaS / network-centric environment
  • Strong Linux troubleshooting and systems knowledge
  • Hands-on Kubernetes experience operating applications in production
  • Experience delivering changes using GitOps and CI/CD pipelines (including release validation and rollback awareness)
  • Working knowledge of incident/problem management in ServiceNow and delivery tracking in Jira (Scrum / PI planning)
  • Experience with observability tooling: Dynatrace, Prometheus, Elasticsearch, plus event/messaging platforms such as Kafka
  • Solid networking fundamentals to support effective troubleshooting
  • Automation experience with Ansible and at least one of Python / Go / Bash
  • Experience integrating or operating services with LDAP (authentication/authorisation, troubleshooting access issues)

Nice to have:

  • Exposure to platform or infrastructure operations (VMs, Kubernetes upgrades, storage troubleshooting)
  • Knowledge of BGP, IS-IS
  • Experience with Cisco, Juniper, or Nokia platforms
  • Experience supporting automated testing and controlled production deployments

What we offer:

  • Cafeteria package: HUF 600,000/year
  • Performance-based bonus
  • Comprehensive private health care package for all employees, which can be extended to family members
  • Nursery support for mothers returning from maternity leave
  • Extended paternity leave: 10+10 fully paid days
  • Commuting allowance
  • Home office allowance
  • Employee discount opportunities
  • Highly affordable mobile packages for the family as well
  • Car allowance
  • New high-class offices both in Budapest and Debrecen
  • Wide range of company and community programmes (including support for different sport activities)
  • Family-friendly culture
  • Smart working approach (hybrid working model, 3 together, 2 wherever)

Additional Information:

Job Posted:
February 13, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Site Reliability Engineering Specialist

Manager, Reliability

Responsible for sustaining and continuously improving various mechanical compone...
Location:
United States, Big Spring
Salary:
Not provided
Delek US
Expiration Date:
Until further notice
Requirements:
  • 4 year / Bachelor's Degree (Required)
  • Four (4) or more years Experience in a related field (Required)
  • No Licensure or Certification Required
  • Manages and leads the activities of the Reliability engineers and specialists
  • Ensures compliance to Engineering Practices/Mechanical Integrity at the site level
  • Champions initiatives, projects, and programs that support the reliability vision
  • Guides Reliability Engineers to grow their technical and leadership skills
  • Develops working relationships with site leaders to guide teams on reliability centered processes and investigations
  • SPOC between Corporate Reliability and site activities
  • Reliability Department budget owner
Job Responsibility:
  • Responsible for sustaining and continuously improving various mechanical components for equipment and tools
  • Ensures the safe, effective operations of the organization's production and supports continuous improvement
  • Manages reliability engineering projects
  • Performs analytical verification
  • Evaluates, tests and tracks results of reliability interventions
  • Initiates reporting for internal or third-party reported incidents
  • Creates, documents, and follows up on corrective actions
  • Prepares routine reports and memos and coordinates communications across all necessary functional groups of the organization
What we offer:
  • up to a 10% 401(k) match from your start date, with a vesting timeline of only one year
  • medical benefits that start on day one with a 30% premium rebate annually
  • free access to the Calm app
  • additional annual incentives through performance management program
Employment Type:
Full-time

Construction Maintenance Specialist

Join Galp and bring your curiosity and passion every day. With a customer-centri...
Location:
Portugal, Madeira
Salary:
Not provided
Galp
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s/Master’s degree in Mechanical or Civil Engineering
  • Minimum of 3 years of prior experience in similar roles
  • Professional Gas Technician certification
  • Membership in the Order of Engineers
  • Experience in Project Management
  • Proficiency in English
  • Strong computer skills, including MS Office 365, Power BI, and SharePoint
  • Excellent written and verbal communication skills
  • Customer and results-oriented mindset
  • Strong leadership skills and experience in managing service providers and teams
Job Responsibility:
  • Lead and coordinate multidisciplinary teams of service providers and collaborate with other departments in executing engineering projects for Galp LPG clients or potential clients
  • Promote and manage LPG construction projects to support Residential and Enterprise business development
  • Ensure the maintenance and requalification of LPG assets in the Madeira archipelago, including networks, parks, and gas cabins
  • Participate and collaborate in the licensing process for LPG assets in Madeira
  • Manage Galp Madeira’s internal installation teams
  • Assume Technical Responsibility for the Operating Entity
  • Participate in the emergency and urgent maintenance response team of Galp Madeira
  • Ensure compliance with Health, Safety, and Environmental (HSE) standards and procedures on-site
  • Understand and follow technological development trends, acting as an agent for change management, business challenges, and requirements
  • Contribute to continuous improvement in the processes of the Construction and Renovation unit within the Technical Operations area
Employment Type:
Full-time

Senior Applications Specialist

Location:
Canada, Mississauga
Salary:
Not provided
Advanced Technology Search Group
Expiration Date:
Until further notice
Requirements:
  • A Degree in Electrical Engineering, Computer Science, or related technical discipline or equivalent experience
  • At least 3 years of experience in advanced systems engineering support, focusing on complex technical problem resolution
  • Proven generalized understanding of computer networking (LAN, WAN, NAT, DNS, Basic Firewalls, etc.)
  • Hands-on experience with Linux and/or Windows (CMD, Bash, PS, regedit, etc.)
  • Demonstrated ability to diagnose sophisticated technical issues and implement effective solutions
  • Ability to work collaboratively with cross-functional teams
  • Willing to adapt to evolving technologies and industry standards
Job Responsibility:
  • Analyze complex technical issues and system integrations to identify root causes and develop effective solutions
  • Conduct systematic analysis to diagnose customer system issues and implement effective technical solutions
  • Travel to customers’ sites in Canada and the US for advanced troubleshooting and customer support
  • Collaborate with designers, developers, and stakeholders, as well as the technical support team, to ensure seamless product integration and customer satisfaction
  • Manage the deployment and configuration of integrated systems, ensuring optimal performance and reliability
  • Develop detailed Product Support Documents, and train internal technical support as appropriate
  • Develop comprehensive technical manuals and field installation guides to support customers during product installation, commissioning, and troubleshooting
  • Investigate and review recurring product issues to drive product improvements
  • Equip and support the team with in-depth product knowledge and configuration strategies
  • Provide post-sales customer support, including consultation on product configuration, installation, and usage
Employment Type:
Full-time

Site Reliability Engineering Specialist

This role will specialise in system administration and server management with a ...
Location:
United Kingdom, Birmingham
Salary:
Not provided
Plusnet
Expiration Date:
Until further notice
Requirements:
  • Experience in an ISP Environment: Proven experience in a fast-paced ISP setting, managing and troubleshooting large-scale networks
  • Sysadmin/Server Management: Strong skills in system administration, server management, and compute resources with experience in deploying and managing containerised applications using orchestration tools such as Kubernetes
  • Technical Proficiency: Strong understanding of network architecture, design, and implementation
  • Monitoring and Logging Solutions: Familiarity with monitoring and logging solutions such as Elasticsearch, Apache Kafka, and Prometheus
  • Programming Proficiency: Proficiency in at least one programming language, such as Python, Ansible or Go
  • Growth Mindset: Self-driven attitude towards learning new skills and aiding the development of others
Job Responsibility:
  • Network Delivery: Support the implementation of flawless change into the live network, utilising automation and CI/CD pipelines
  • Network Monitoring: Configure, maintain, and monitor systems and network infrastructure to ensure optimal health, performance, and reliability
  • Automation Tools: Utilise tools such as Ansible to provision and manage infrastructure resources in a scalable and efficient manner
  • Technical Acumen: Apply your understanding of network principles to troubleshoot network faults within our systems and look at how you can optimise performance and enhance security across our infrastructure
  • Incident Management and Resolution: Be prepared to support a 24/7/365 callout rota, providing third-line technical resolution covering an extensive range of technologies
  • Customer Focus: Be a technical expert who understands the end-to-end journey of our customers
  • Growth and Development: As a technically talented expert you should enhance the brand of the team and support those around you to be accountable and perform at their best
What we offer:
  • Competitive salary
  • 10% on target bonus
  • BT Pension scheme, minimum 5% Employee contribution, BT contribution 10%
  • 25 days annual leave (not including bank holidays), increasing with service
  • Huge range of flexible benefits including cycle to work, healthcare, season ticket loan
  • World-class training and development opportunities
  • Option to join BT Shares Saving schemes
  • Discounted broadband, mobile and TV packages
  • Access to hundreds of retail discounts including the BT shop
  • On call allowances and overtime
Employment Type:
Full-time

AI Platform Site Reliability Engineering Specialist

The AI Platform Site Reliability Engineering Specialist will operate and maintai...
Location:
India, Bengaluru
Salary:
Not provided
NTT DATA
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science or related field, or equivalent job experience
  • 5 years of production experience in SRE / Infrastructure / ops for large-scale systems
  • Strong programming/scripting skills (Python, Go, Java, or equivalent)
  • Deep experience with containerization (Docker), orchestration (Kubernetes, etc.)
  • Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
  • Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
  • Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
  • Networking and systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
  • Solid experience in capacity planning, performance tuning, scaling, and incident response
  • Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements
Job Responsibility:
  • Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)
  • Design and build automation for core platform capabilities, reducing manual toil
  • Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.
  • Establish, monitor and enforce SLOs/SLIs/SLAs, error budgets, alerting, and dashboards
  • Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation
  • Perform capacity planning, scaling strategies, workload scheduling and resource forecasting
  • Optimize cost vs. performance trade-offs in large-scale compute environments
  • Harden systems for security, compliance, auditability, and data governance
  • Collaborate across teams (cloud engineers, data engineers, infrastructure, security) to ensure safe deployment, rollout, rollback, and integration of new systems
  • Define disaster recovery (DR) strategies, backup/restore practices, and fault tolerance mechanisms

Operations Lead, Command Center

As an Operations Lead, Command Center, you’ll serve as the on-shift leader respo...
Location:
United States, Lancaster
Salary:
Not provided
Kodiak Robotics
Expiration Date:
Until further notice
Requirements:
  • 1+ years in logistics, dispatch, fleet operations, or command-center environments
  • Proven experience coordinating on-shift teams or live operational logistics
  • Strong understanding of operational control centers, dispatch coordination, or multi-route fleet management
  • Proven ability to manage live operations, resolve escalations, and maintain composure under pressure
  • Familiarity with transportation, logistics, or autonomous vehicle operations
  • Understanding of FMCSA regulations, safety protocols, and field coordination workflows
  • Skilled at real-time decision-making and communication in high-pressure or customer-facing scenarios
  • Adept at translating complex operational data into clear, actionable updates for both internal and external stakeholders
  • Leads with composure and accountability, ensuring the team operates with urgency and precision
  • Builds a strong culture of accountability, teamwork, and continuous improvement
Job Responsibility:
  • Serve as the on-shift leader responsible for the coordination of live autonomous and human-assisted trucking operations
  • Ensure uninterrupted service delivery and proactive customer communication
  • Work closely with Operations Specialists, remote operators, and On-Site Support teams to ensure operations run safely, efficiently, and with minimal downtime
  • Recognized subject matter expert across Command Center workflows
  • Deep command of monitoring tools, triage logic, playback analysis, and mission dynamics
  • Identifies process gaps and drives workflow improvements
  • Provides high-quality operational insights to Engineering and Operations leadership
  • Serves as the primary escalation point during shifts
  • Guides prioritization, distributes workload, and ensures shift readiness
  • Coordinates responses during high-severity issues
What we offer:
  • Competitive compensation package including equity and biannual bonuses
  • Excellent Medical, Dental, and Vision plans
  • Generous PTO and parental leave policies
  • Long Term Disability, Short Term Disability, Life Insurance
  • Wellbeing Benefits - Headspace, One Medical, Gympass, Spring Health
  • Fidelity 401(k)
  • Commuter, FSA, Dependent Care FSA, HSA
  • Various incentive programs (referral bonuses, patent bonuses, etc.)

Machinery Control Engineering Specialist

Job Description – Machinery Control Engineering Specialist
Location:
Qatar, Doha
Salary:
Not provided
Airswift Sweden
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Engineering, or a Science degree with a major in Engineering, in the area of specialization
  • 8 years’ experience in the area of specialization within the Oil and Gas industry
  • Expert level knowledge in machinery control systems and schemes such as GE, CCC, DCS and DLN tuning
Job Responsibility:
  • Advise and support machinery control systems implementation during the EPC and Startup phases until handover to the permanent Asset, whilst ensuring project deliverables meet Operations expectations in terms of Safety, Operability, Reliability and Maintainability over the facility design life cycle
  • Provide a variety of Machinery Control Engineering support (in the areas of GE, CCC, DCS, DLN tuning, etc.) to contribute to the division strategy and goals, including design reviews, studies, commissioning, start-up, failure analysis and advanced troubleshooting as part of punch list and warranty claim resolutions, and Management of Change
  • Review, validate and support development of advanced machinery control schemes for machines and their drivers (gas turbines, steam turbines and variable frequency drives for motors) on new projects to achieve high standards of Reliability, Availability and Cyber Security
  • Review and develop testing procedures such as surge tests, DLN tuning, Machine Test Runs and performance tuning for turbomachinery systems and DCS controls for machinery
  • Support Factory Acceptance Tests and review performance test results of machines. Validate machinery performance with theoretical calculation and compare to OEM performance maps and actual performance at site
  • Develop plans and work schedules to ensure effective completion of tasks and activities, meeting the section and division KPIs
  • Provide technical inputs related to area of expertise to other disciplines and functions as per requirements to support overall business objectives with the key focus being in the area of process and machine interactions

Site Manager

As a Special Projects Site Manager, you will be overseeing the construction of P...
Location:
United Kingdom, Nationwide
Salary:
50000.00 - 60000.00 GBP / Year
Hedera Hiring Ltd
Expiration Date:
Until further notice
Requirements:
  • Proven track record as a reliable team player
  • Site management experience in handling reasonably complex projects, their direct, indirect, and specialist resources, particularly in electrical utility environments
  • NVQ Level 3 qualification in engineering, construction, and/or management, or equivalent knowledge, training, and experience
  • Experience in managing projects classified as "notifiable" under the Construction, Design & Management Regulations 2015
  • IOSH Managing Safely certification, equivalent, or working towards it
  • Strong experience in delivering civil construction projects in utility environments
  • Experience in delivering construction projects following formal construction practices
  • Accreditation as a Company Project Manager
  • Proficiency in Word, Excel, PowerPoint, with adequate cost and finance soft skills
  • Ability to manage project delivery schedules and GANTT charts using Microsoft Project or equivalent tools
Job Responsibility:
  • Represent the client professionally to all stakeholders
  • Ensure safe and responsible delivery of all Special Projects Sites under your management
  • Maintain daily event records
  • Provide weekly progress reports to Project Planner, Quantity Surveyor, or Senior Project Manager as needed
  • Ensure positive outcomes for all sites under your responsibility
  • Daily management of all directly or indirectly employed resources
  • Manage and control changes in project scope
  • Proactively identify and manage project risks
  • Achieve project delivery goals within agreed budget limits
  • Review and comment on all external Risk Assessment and Method Statements (RAMS)
Employment Type:
Full-time