LLM Inference Frameworks and Optimization Engineer Job at Together AI (San Francisco)

Director of AI Engineering

We are entering a hyper-growth phase of AI innovation and are hiring a Director ...

Location

Canada; United States

Salary:

300000.00 - 450000.00 USD / Year

Apollo.io

Expiration Date

Until further notice

Requirements

10–15+ years in software engineering, with significant leadership experience owning AI/ML or applied LLM systems at scale
Proven history shipping LLM-powered features, agentic workflows, or AI assistants used by real customers in production
Deep understanding of LLM orchestration frameworks (LangChain, LlamaIndex), RAG pipelines, vector search, embeddings, and prompt engineering
Expert in backend & distributed systems (Python strongly preferred) and cloud infrastructure (AWS/GCP)
Strong experience with telemetry, observability, and cost-aware real-time inference optimizations
Demonstrated ability to lead senior engineers, define technical roadmaps, and deliver outcomes aligned to business metrics
Experience building or scaling teams working on experimentation, optimization, personalization, or ML-powered growth systems
Exceptional ability to simplify complex problems, set clear standards, and drive alignment across Product, Data, Design, and Engineering
Strong product sense, ability to weigh novelty vs. impact, focus on user value, and prioritize speed with guardrails
Fluent in integrating AI tools into engineering workflows for code generation, debugging, delivery velocity, and operational efficiency

Job Responsibility

Define the multi-year technical vision for Apollo’s AI stack, spanning agents, orchestration, inference, retrieval, and platformization
Prioritize high-impact AI investments by partnering with Product, Design, Research, and Data leaders to align engineering outcomes with business goals
Establish technical standards, evaluation criteria, and success metrics for every AI-powered feature shipped
Lead the architecture and deployment of long-horizon autonomous agents, multi-agent workflows, and API-driven orchestration frameworks
Build reusable, scalable agentic components that power GTM workflows like research, enrichment, sequencing, lead scoring, routing, and personalization
Own the evolution of Apollo’s internal LLM platform for high-scale, low-latency, cost-optimized inference
Oversee model-driven experiences for natural-language interfaces, RAG pipelines, semantic search, personalized recommendations, and email intelligence
Partner with Product & Design to build intuitive conversational UX that hides underlying complexity while elevating user productivity
Implement rigorous evaluation frameworks, including offline benchmarking, human-in-the-loop review, and online A/B experimentation
Ensure robust observability, monitoring, and safety guardrails for all AI systems in production

What we offer

Equity
Company bonus or sales commissions/bonuses
401(k) plan
At least 10 paid holidays per year
Flex PTO
Parental leave
Employee assistance program and wellbeing benefits
Global travel coverage
Life/AD&D/STD/LTD insurance
FSA/HSA

Fulltime

Senior Product Manager, AI Agents

This role owns AI research, messaging, and context—spanning both the user experi...

Location

United States

Salary:

187000.00 - 250000.00 USD / Year

Apollo.io

Expiration Date

Until further notice

Requirements

5+ years in product management
2+ years experience launching AI/ML new products and scaling existing products
Track record of shipping AI features that drove measurable business outcomes
Experience with LLM-powered applications, prompt engineering, evaluation frameworks, and model selection tradeoffs
Comfortable working in Python/SQL to analyze data, prototype prompts, and evaluate outputs
Understanding of LLM architectures, RAG pipelines, agent frameworks, and inference optimization
Obsession with quality over speed
GTM or sales tech experience (strongly preferred)
Familiarity with sales workflows, prospecting tools, or CRM systems
Understanding of why sales teams are skeptical of AI tools and what it takes to earn their trust

Job Responsibility

Develop and execute a strategic roadmap for AI research, messaging, and context capabilities
Enhance Apollo's AI research agents to surface actionable insights from the web
Define how AI understands each user's business
Own AI-powered messaging tools that create personalized, context-aware emails at scale
Build and scale evaluation infrastructure across accuracy, relevance, clarity, and tone
Partner with engineering, design, prompt writers, and sales to deliver cohesive AI experiences

What we offer

Equity
Company bonus or sales commissions/bonuses
401(k) plan
At least 10 paid holidays per year
Flex PTO
Parental leave
Employee assistance program and wellbeing benefits
Global travel coverage
Life/AD&D/STD/LTD insurance
FSA/HSA and medical, dental, and vision benefits

Fulltime

New

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States , Palo Alto

Salary:

90000.00 - 300000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in computer science, Engineering, or related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python
strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation
a 401K savings plan vested from day one that offers a 6% match
performance and recognition-based incentives
and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility- We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

New

Staff Software Engineer - AI/ML Platform

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States , Chevy Chase; New York City; Palo Alto

Salary:

115000.00 - 300000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in computer science, Engineering, or related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python
strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation
a 401K savings plan vested from day one that offers a 6% match
performance and recognition-based incentives
and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility- We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

New

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States , Chevy Chase; New York City; Palo Alto

Salary:

115000.00 - 300000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in computer science, Engineering, or related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python
strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation
a 401K savings plan vested from day one that offers a 6% match
performance and recognition-based incentives
and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility- We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

New

Member of Technical Staff - Inference

Prime Intellect is building the open superintelligence stack - from frontier age...

Location

United States , San Francisco

Salary:

Not provided

Prime Intellect

Expiration Date

Until further notice

Requirements

3+ years building and running large‑scale ML/LLM services with clear latency/availability SLOs
Hands‑on with at least one of vLLM, SGLang, TensorRT‑LLM
Familiarity with distributed and disaggregated serving infrastructure such as NVIDIA Dynamo
Deep understanding of prefill vs. decode, KV‑cache behavior, batching, sampling, speculative decoding, parallelism strategies
Comfortable debugging CUDA/NCCL, drivers/kernels, containers, service mesh/networking, and storage, owning incidents end‑to‑end
Python: Systems tooling and backend services
PyTorch: LLM Inference engine development and integration, deployment readiness
AWS/GCP service experience, cloud deployment patterns
Running infrastructure at scale with containers on Kubernetes
Architecture, CUDA runtime, NCCL, InfiniBand

Job Responsibility

Build a multi-tenant LLM serving platform that operates across our cloud GPU fleets
Design placement and scheduling algorithms for heterogeneous accelerators
Implement multi‑region/zone failover and traffic shifting for resilience and cost control
Build autoscaling, routing, and load balancing to meet throughput/latency SLOs
Optimize model distribution and cold-start times across clusters
Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, TensorRT‑LLM
Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management and other axes for maximum performance
Profile kernels, memory bandwidth and transport
apply techniques such as quantization and speculative decoding
Develop reproducible performance suites (latency, throughput, context length, batch size, precision)

What we offer

Competitive compensation with significant equity incentives
Flexible work arrangement (remote or San Francisco office)
Full visa sponsorship and relocation support
Professional development budget
Regular team off-sites and conference attendance
Opportunity to shape decentralized AI and RL at Prime Intellect

Fulltime

AI Software Engineer - NLP/LLM

At Moody's, we unite the brightest minds to turn today’s risks into tomorrow’s o...

Location

United States , New York

Salary:

159300.00 - 230850.00 USD / Year

Moody's

Expiration Date

Until further notice

Requirements

5+ years of demonstrated experience building production-grade machine learning systems with measurable impacts
expertise in NLP and search and recommendation systems is preferred
Hands-on experience with large language model (LLM) applications and AI agents, including retrieval-augmented generation, prompt optimization, fine-tuning, agent design, and evaluation methodologies
familiarity with prompt optimization frameworks like DSPy is preferred
Deep expertise in machine learning models and systems design, including classic models (e.g., XGBoost), modern deep learning and graph machine learning architectures (e.g., transformers-based models, graph neural networks (GNN)), and reinforcement learning systems
Proven ability to take models and agents from research to production, including optimization for latency and cost, implementation of monitoring and tracing, and development of reusable platforms or frameworks
Strong technical leadership and mentorship skills, with a track record of growing engineers, improving team velocity through automation, documentation, and tooling, and influencing architectural decisions without direct authority
Excellent communication and strategic thinking abilities, capable of aligning technical decisions with business outcomes, navigating ambiguity, and driving cross-functional collaboration
Bachelor’s degree or higher in Computer Science, Engineering, or a related field

Job Responsibility

Design and deploy end to end AI and machine learning solutions including machine learning and graph-based models, natural language processing (NLP) models, and large language model (LLM) based AI agents
Build robust pipelines for data ingestion, feature engineering, model training, validation, and real-time or batch inference
Develop and integrate large language model (LLM) applications using techniques such as fine-tuning, retrieval-augmented generation, and reinforcement learning
Build autonomous agents capable of multi-step reasoning and tool use in production environments
Lead the full model and agent development lifecycle, from problem definition and data exploration through experimentation, implementation, deployment, and monitoring
Ensure solutions are scalable, reliable, and aligned with business goals
Advocate and implement machine learning operations (MLOps) best practices including data monitoring and tracing, error analysis, automated retraining, model and prompt versioning, business metrics monitoring, and incident response
Collaborate across disciplines and provide technical leadership, working with product managers, engineers, and researchers to deliver impactful solutions
Mentor team members, lead design reviews, and promote best practices in AI and machine learning systems development

What we offer

medical
dental
vision
parental leave
paid time off
a 401(k) plan with employee and company contribution opportunities
life, disability, and accident insurance
a discounted employee stock purchase plan
tuition reimbursement

Fulltime

New

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...

Location

United States , San Francisco; Palo Alto; New York City

Salary:

210000.00 - 385000.00 USD / Year

Perplexity

Expiration Date

Until further notice

Requirements

Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
Understanding of GPU architectures or experience with GPU kernel programming using CUDA

Job Responsibility

Develop APIs for AI inference that will be used by both internal and external customers
Benchmark and address bottlenecks throughout our inference stack
Improve the reliability and observability of our systems and respond to system outages
Explore novel research and implement LLM inference optimizations

What we offer

equity
health
dental
vision
retirement
fitness
commuter and dependent care accounts

Fulltime

LLM Inference Frameworks and Optimization Engineer

Together AI

Location:
United States , San Francisco

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:
February 18, 2026

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for LLM Inference Frameworks and Optimization Engineer

Director of AI Engineering

Senior Product Manager, AI Agents

Staff Software Engineer - AI/ML Infra

Staff Software Engineer - AI/ML Platform

Staff Software Engineer - AI/ML Infra

Member of Technical Staff - Inference

AI Software Engineer - NLP/LLM

AI Inference Engineer

LLM Inference Frameworks and Optimization Engineer

Together AI

Location:United States , San Francisco

Category:IT - Software Development

Contract Type:Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:February 18, 2026

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for LLM Inference Frameworks and Optimization Engineer

Director of AI Engineering

Senior Product Manager, AI Agents

Staff Software Engineer - AI/ML Infra

Staff Software Engineer - AI/ML Platform

Staff Software Engineer - AI/ML Infra

Member of Technical Staff - Inference

AI Software Engineer - NLP/LLM

AI Inference Engineer

Location:
United States , San Francisco

Category:
IT - Software Development

Contract Type:
Not provided

Job Posted:
February 18, 2026