We are seeking a Staff / Principal ML Infrastructure Engineer to lead the design, deployment, and scaling of our large language model infrastructure. This role sits at the intersection of machine learning, systems engineering, and platform design, enabling teams to train, serve, and monitor models efficiently and reliably. This is not a prompt engineering role – it is focused on building robust, production-grade ML infrastructure and operational pipelines.
Job Responsibilities:
Design, implement, and maintain high-performance infrastructure for training and serving LLMs
Optimize model pipelines for efficiency, latency, and cost at scale
Collaborate with ML researchers, platform engineers, and product teams to deploy models safely into production
Build monitoring, alerting, and tooling to ensure reliability and observability of large-scale ML systems
Evaluate and integrate new frameworks, tools, and architectures to improve ML workflows
Provide technical leadership and mentorship to other engineers on the team
Requirements:
7+ years of software engineering experience, including 3+ years building production ML systems
Deep experience with distributed training and inference frameworks (e.g., PyTorch, JAX, TensorFlow)
Familiarity with model serving technologies and orchestration (e.g., Triton, Ray, Kubernetes)
Strong understanding of GPU/TPU infrastructure, performance optimization, and scalability challenges
Proven experience balancing reliability, latency, and cost trade-offs in production ML systems
Excellent collaboration, communication, and problem-solving skills
Nice to have:
Experience mentoring or leading engineering teams
What we offer:
Flexible work arrangements and competitive compensation