Engineer, SRE GenAI

T-Mobile

Location:
United States, Bellevue

Contract Type:
Not provided

Salary:
92500.00 - 166800.00 USD / Year

Job offer has expired

Job Description:

As an Engineer in Site Reliability Engineering (SRE) for AI Systems, you will help ensure the reliability, scalability, and performance of AI platforms. This role includes participating in on-call rotations, improving system observability, and supporting operations across cloud-native infrastructure. This is a hands-on role ideal for someone with foundational SRE skills and a growth mindset to expand in GenAI and LLM infrastructure operations.

Job Responsibility:

  • Participate in on-call rotations to support AI platforms and respond to production incidents with urgency and precision
  • Monitor system health and performance using tools like Grafana, Splunk, and PowerBI
  • Support cloud-native infrastructure deployments, with a focus on Azure (primary), and exposure to AWS or GCP
  • Implement runbooks and automate repetitive operational tasks to reduce toil
  • Support CI/CD pipelines and IaC deployments using GitLab pipelines and Databricks
  • Assist in the development and enforcement of Service Level Objectives (SLOs) and real-time alerts for AI APIs and services
  • Collaborate with senior engineers to improve platform reliability and scale LLM-based applications

Requirements:

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 2–4 years of experience in DevOps, SRE, or cloud platform engineering
  • Hands-on experience with monitoring/logging systems such as Prometheus, Grafana, Splunk, or OpenSearch
  • Familiarity with cloud environments (preferably Azure; AWS/GCP a plus)
  • Experience in scripting or automation using Python, Bash, or PowerShell
  • Basic understanding of containerization (Docker, Kubernetes) and CI/CD concepts
  • Willingness to participate in an on-call schedule and incident resolution
  • Strong problem-solving and root cause analysis skills
  • Communication
  • Customer Service
  • Analytics
  • Technical Writing
  • At least 18 years of age
  • Legally authorized to work in the United States

Nice to have:

  • Exposure to AI/ML infrastructure or LLM-based systems (e.g., OpenAI, ChatGPT, Azure OpenAI)
  • Experience with infrastructure-as-code tools like Terraform or ARM templates
  • Familiarity with LLM observability or API token usage metrics
  • Passion for learning AI reliability practices and collaborating with cross-functional teams

What we offer:

  • Competitive base salary and compensation package
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Annual bonus or periodic sales incentive or bonus
  • Medical, dental and vision insurance
  • Flexible spending account
  • Paid time off and up to 12 paid holidays
  • Paid parental and family leave
  • Family building benefits
  • Back-up care
  • Enhanced family support
  • Childcare subsidy
  • Tuition assistance
  • College coaching
  • Short- and long-term disability
  • Voluntary AD&D coverage
  • Voluntary accident coverage
  • Voluntary life insurance
  • Voluntary disability insurance
  • Voluntary long-term care insurance
  • Mobile service & home internet discounts
  • Pet insurance
  • Access to commuter and transit programs

Additional Information:

Job Posted:
December 27, 2025

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Engineer, SRE GenAI

Senior DevOps Engineer (GCP)

Our client is a global UK-based financial services and investment banking organi...
Salary:
Not provided
N-iX
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in DevOps, Cloud Engineering, or SRE roles
  • Strong hands-on experience with Google Cloud Platform, including: GKE / Kubernetes, Cloud Run, Cloud Functions, Pub/Sub, Cloud Storage, VPC, IAM, networking, security
  • Expertise in Terraform, Helm, or other IaC tools
  • Experience building CI/CD pipelines (GitHub Actions, GitLab CI, CircleCI, Jenkins, etc.)
  • Strong understanding of containerization and orchestration: Docker, Kubernetes
  • Solid experience with monitoring, observability, and logging stacks
  • Familiarity with networking, load balancing, security hardening, and zero-trust principles
  • Experience supporting production systems in high-availability, distributed environments
  • Strong scripting skills (Python, Bash, or similar)
  • Experience working with agile engineering teams
Job Responsibility:
  • Design, implement, and maintain cloud infrastructure on Google Cloud (GKE, Cloud Run, Cloud Functions, Pub/Sub, Cloud Storage)
  • Build and optimize CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or similar)
  • Develop infrastructure-as-code using Terraform or similar tools
  • Set up and maintain container orchestration (Kubernetes, GKE) and automated deployment workflows
  • Implement monitoring, alerting, and observability using tools such as Prometheus, Grafana, ELK/Elastic, Stackdriver, or OpenTelemetry
  • Ensure compliance with security and governance standards across all environments
  • Collaborate closely with engineering teams to ensure scalable, high-performance deployment architectures
  • Support AI/ML and GenAI workloads (Vertex AI pipelines, model hosting, GPU workloads, inference optimization)
  • Manage environment strategies, release pipelines, configuration management, and secrets management
  • Optimize cloud costs and recommend improvements for performance and reliability
What we offer:
  • Flexible working format - remote, office-based or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

Distinguished Technologist, Deep Learning

Joining our HPE Hybrid Cloud team and working as part of our OpsRamp team is a c...
Location:
United States, San Jose
Salary:
164500.00 - 398500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 15+ years of relevant experience in the industry delivering technical and business strategy at an advanced/strategist level
  • Master's or PhD degree in Computer Science, Information Systems, Engineering, or equivalent
  • At least 4 years of hands-on expertise in defining, building, training and / or optimizing foundational deep learning models at scale in PyTorch, HF and other ML frameworks and libraries
  • Experience and/or deep understanding in various deep learning architectures like CNNs, GNNs, Transformers, Reinforcement Learning etc. is a strong advantage
  • Strong hands-on experience with or understanding of pre-training, fine-tuning, distilling, and aligning open-source large language models so that they complement the in-house foundational models
  • Hands-on experience developing multi-agent applications around a mixture of in-house and open-source models while leveraging latest in RAG and Prompt Engineering tooling techniques
  • Strong customer focus and obsession with improving service availability/performance and user experience/consumption using measurable SRE metrics
  • Must have a track record of working alongside other engineering teams architecting, building, and deploying mission-critical, highly distributed, large-scale SaaS applications
  • Must have strong knowledge of application failure modes, resiliency patterns, and techniques to enable robust, self-healing architecture
  • Effective technical leadership skills to influence diverse groups to move toward common goals/strategies
Job Responsibility:
  • Oversee build of OpsRamp’s CoPilot for Autonomous Operations for the Hybrid Cloud
  • Understand latest in GenAI/ML for ITOM
  • Understand cloud-native architecture concepts and have knowledge of best practices for high availability, scalability, resilience, performance, and security requirements in the cloud
  • Act as a cross-functional product and technical expert for GenAI within engineering with close working relationships with customers, product management, support, and marketing supporting edge-to-cloud services offering
  • Provides consultation, design input, and feedback for product development and design reviews across multiple organizations and architectures
  • Help transition proof-of-concept implementations into R&D teams to accelerate new product delivery
  • Creates technical content such as designs, specifications, and initial software implementations
  • Guides and mentors less-experienced staff members to set an example of software systems design and development innovation and excellence, helping to grow engineers into more senior technical roles
  • Collect product feedback from field interactions to provide input into Engineering and Product Management to influence product roadmap direction
  • Maintain a high level of knowledge of OpsRamp SaaS product and product road maps, as well as that of the competition and prospective strategic partners
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Distinguished Technologist, Cloud Development (AI/ML)

Joining our HPE Hybrid Cloud team and working as part of our OpsRamp team is a c...
Location:
United States, San Jose
Salary:
164500.00 - 398500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 15+ years of relevant experience in the industry delivering technical and business strategy at an advanced/strategist level
  • Master's or PhD degree in Computer Science, Information Systems, Engineering, or equivalent
  • At least 4 years of hands-on expertise in defining, building, training and / or optimizing foundational deep learning models at scale in PyTorch, HF and other ML frameworks and libraries
  • Experience and/or deep understanding in various deep learning architectures like CNNs, GNNs, Transformers, Reinforcement Learning etc. is a strong advantage
  • Strong hands-on experience with or understanding of pre-training, fine-tuning, distilling, and aligning open-source large language models so that they complement the in-house foundational models
  • Hands-on experience developing multi-agent applications around a mixture of in-house and open-source models while leveraging latest in RAG and Prompt Engineering tooling techniques
  • Strong customer focus and obsession with improving service availability/performance and user experience/consumption using measurable SRE metrics
  • Must have a track record of working alongside other engineering teams architecting, building, and deploying mission-critical, highly distributed, large-scale SaaS applications
  • Must have strong knowledge of application failure modes, resiliency patterns, and techniques to enable robust, self-healing architecture
  • Effective technical leadership skills to influence diverse groups to move toward common goals/strategies
Job Responsibility:
  • Lead strategy and innovation across OpsRamp’s Intelligent Observability portfolio
  • Champion HPE OpsRamp’s position with HPE customers and GTM partners externally and HPE internal cross-functional stakeholders
  • Drive technical strategy for emerging GenAI trends across Hybrid Observability and AIOps for cloud-scale modern applications
  • Design and introduce new products to the market
  • Provide consultation, design input, and feedback for product development and design reviews
  • Transition proof-of-concept implementations into R&D teams to accelerate new product delivery
  • Guide and mentor less-experienced staff members
What we offer:
  • Health and wellbeing benefits
  • Career development programs
  • Diversity, inclusion, and belonging initiatives
  • Fulltime

AI Test Automation Engineer

At IBM Infrastructure & Technology, we design and operate the systems that keep ...
Location:
India, Bangalore
Salary:
Not provided
IBM Deutschland GmbH
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree
  • 3-5 years of experience in software QA / test engineering
  • Strong hands-on experience in traditional QA automation
  • Proven experience with UI automation using Selenium (mandatory)
  • Solid experience in API automation (REST Assured / JSON)
  • Strong understanding of test pyramid, regression strategies, and CI/CD integration
  • Core Java, Python for test automation (mandatory)
  • Hands-on exposure to ML or GenAI systems testing
  • Solid understanding of LLMs, prompt engineering, and RAG architectures
Job Responsibility:
  • Design, develop, and maintain robust automated test suites for web, API, and backend systems
  • Automate UI tests using Selenium (mandatory) with maintainable page-object or screen-play patterns
  • Build and maintain API automation for REST services (integration and regression tests using REST Assured)
  • Own regression, smoke, and sanity suites with clear quality gates
  • Integrate automated tests into CI/CD pipelines with reliable pass/fail signals
  • Drive best practices in test pyramid, test data management, and test stability
  • Analyze flaky tests and improve automation reliability and execution time
  • Test LLM and GenAI features, including prompt behavior, response quality, hallucinations, edge cases, and failure modes
  • Validate RAG pipelines for retrieval accuracy, relevance, grounding, and citation correctness
  • Perform ML regression testing across model versions, prompt changes, and data updates
  • Fulltime

Cloud Native GCP Engineer

A senior engineer to join a modern engineering practice across commercial tradin...
Location:
United Kingdom, City of London, London
Salary:
Not provided
Whitehall Resources Ltd
Expiration Date:
Until further notice
Requirements:
  • 8+ years of software engineering experience, including recent work in a lead engineer capacity
  • Strong back-end development experience using Spring Boot and microservices patterns
  • Hands-on experience with Google Cloud core services (e.g., GKE, Cloud Run, Pub/Sub, BigQuery, IAM)
  • Solid expertise in Terraform and Infrastructure-as-Code practices
  • Strong CI/CD and DevOps toolchain knowledge: GitLab CI, Jenkins, GitHub Actions, ArgoCD, etc.
  • Deep understanding of containerization (Docker) and Kubernetes deployments
  • Familiarity with security, observability (e.g. OpenTelemetry, Stackdriver), and SRE principles
  • Experience in integrating AI/GenAI tools (e.g., Gemini Code Assist, LangChain) into engineering workflows
  • Worked in enterprise transformation programs involving both legacy and modern tech stacks
  • Experience with service mesh (e.g., Istio) and API gateways
Job Responsibility:
  • Lead the design and implementation of microservices-based architecture to build scalable, secure, and maintainable applications
  • Develop, deploy, and maintain microservices using Spring Boot, Node.js, React
  • Deploy, manage, and scale applications in Google Cloud Platform (GCP) environments
  • Implement cloud-native solutions using containerization technologies like Docker and orchestration tools such as Kubernetes (GKE)
  • Drive automation across the SDLC using CI/CD pipelines, GitOps, and modern DevSecOps workflows (e.g., Jenkins, GitLab CI, GitHub Actions)
  • Use Infrastructure-as-Code (IaC) with Terraform for provisioning and environment consistency
  • Collaborate with architects and product teams to design robust and future-ready solutions
  • Champion engineering excellence, clean code practices, observability, and shift-left security
  • Work with junior engineers to uplift development standards across squads
  • Lead the introduction of AI/GenAI tooling and drive its adoption across each phase of the SDLC (analysis, design, development, testing, and deployment)

Senior Manager Events and Catering

Assists the Assistant Director of Catering by providing support to the operation...
Location:
United States
Salary:
85000.00 - 113000.00 USD / Year
Marriott Bonvoy
Expiration Date:
Until further notice
Requirements:
  • High school diploma or GED, plus 4 years’ experience in event management, food and beverage, or a related professional area
  • OR Bachelor’s degree from an accredited university in Hotel and Restaurant Management, Hospitality, Business Administration, or a related major, plus 2 years’ experience in event management, food and beverage, or a related professional area
Job Responsibility:
  • Projects supply needs for the department
  • Applies knowledge of all laws as they relate to an event
  • Understands the impact of banquet operations on the overall success of a conference event and manages activities to maximize customer satisfaction
  • Adheres to and reinforces all standards, policies, and procedures
  • Maintains established sanitation levels
  • Manages departmental inventories and maintains equipment
  • Schedules banquet service staff to forecast and service standards, while maximizing profits
  • Assists team in developing lasting relationships with groups to retain business and increase growth
  • Manages department controllable expenses to achieve or exceed budgeted goals
  • Verifies that all banquet event orders (BEOs) are developed and distributed according to established guidelines
What we offer:
  • Relocation Assistance Available
  • Fulltime

Software Engineer

We are looking for a skilled Software Engineer to join our dynamic team in New Y...
Location:
United States, New York
Salary:
Not provided
Robert Half
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science or a related field from a reputable institution
  • At least 3 years of experience as a software engineer, with a proven track record in full-stack development
  • Proficiency in TypeScript, React.js, and Node.js
  • Hands-on experience with mobile development, particularly using React Native
  • Ability to design and develop performance-sensitive and low-latency systems
  • Strong problem-solving skills and attention to detail
  • Familiarity with startup environments and an entrepreneurial mindset
Job Responsibility:
  • Develop and maintain full-stack applications using TypeScript, React, and React Native
  • Design, implement, and optimize low-latency systems and performance-sensitive software
  • Collaborate with cross-functional teams to deliver high-quality solutions that meet user needs
  • Contribute to the development of mobile applications with expertise in React Native
  • Write clean, efficient, and scalable code to ensure optimal application functionality
  • Debug and troubleshoot technical issues to maintain system reliability
  • Participate in code reviews and provide constructive feedback to team members
  • Stay up-to-date with emerging technologies and incorporate best practices into development processes
  • Work in an entrepreneurial environment, taking ownership of projects and driving them to completion
  • Engage with product teams to understand user requirements and deliver impactful solutions
What we offer:
  • Medical, vision, dental, life, and disability insurance
  • Eligibility to enroll in the company 401(k) plan

CT Technologist

PRN CT Technologist position at Atrium Health Navicent Peach. Need PRN CT techno...
Location:
United States, Byron
Salary:
33.05 - 49.60 USD / Hour
Advocate Health Care
Expiration Date:
Until further notice
Requirements:
  • Graduate of an accredited two-year AMA program in Radiologic or Nuclear Medicine Technology required
  • ARRT certification in Radiology or NMTCB for Nuclear Medicine and advanced registry from the ARRT in CT scanning within one year of hire required
  • BLS required
Job Responsibility:
  • Examines requests and verifies orders on each assigned patient
  • Properly identify and assist patients while offering a brief explanation of the procedures
  • Interviews patients for a complete medical history
  • Assumes responsibility for the exam from beginning of exam until completion of dictated results
  • Prepares and administers IV contrast according to departmental protocols
  • Evaluates technical quality of images and consults with a Radiologist if needed
  • Performs basic patient care functions
  • Performs CT scanning and assists Radiologist/PA during invasive procedures
  • Is authorized to obtain medication or contrast material as directed for administration by a licensed practitioner
  • Practices the principles of radiation safety for self, employees, patients, and family members
What we offer:
  • Paid Time Off programs
  • Health and welfare benefits such as medical, dental, vision, life, and Short- and Long-Term Disability
  • Flexible Spending Accounts for eligible health care and dependent care expenses
  • Family benefits such as adoption assistance and paid parental leave
  • Defined contribution retirement plans with employer match and other financial wellness programs
  • Educational Assistance Program
  • Parttime