Architecture Intern - Inference

Etched

Location:
United States, San Jose


Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking a talented Architecture intern to join our team and contribute to the design of next-generation AI accelerators. This role focuses on developing and optimizing compute architectures that deliver exceptional performance and efficiency for transformer workloads. You will work on cutting-edge architectural problems and performance modeling over the course of your internship.

Job Responsibility:

  • Support porting state-of-the-art models to our architecture
  • Help build programming abstractions and testing capabilities to rapidly iterate on model porting
  • Assist in building, enhancing, and scaling Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling
  • Contribute to optimizing routing and communication layers using Sohu’s collectives
  • Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues (a minimal profiling sketch follows this list)
  • Develop and leverage a deep understanding of Sohu to co-design both HW instructions and model architecture operations to maximize model performance
  • Implement high-performance software components for the Model Toolkit
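
A minimal sketch of the profiling workflow mentioned above, using PyTorch's built-in profiler on a stand-in matmul workload; the workload, shapes, and sort key are illustrative assumptions, not anything from the posting:

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Stand-in workload: repeated matmuls on random data.
    x = torch.randn(1024, 1024)
    w = torch.randn(1024, 1024)

    # Profile CPU (and CUDA, if available) time per operator.
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)
        x, w = x.cuda(), w.cuda()

    with profile(activities=activities, record_shapes=True) as prof:
        for _ in range(10):
            y = torch.relu(x @ w)

    # Rank operators by total time to spot bottlenecks.
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))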

Requirements:

  • Progress towards a Bachelor’s, Master’s, or PhD degree in computer science, computer engineering, or a related field
  • Proficiency in C++ or Rust
  • Understanding of performance-sensitive or complex distributed software systems, e.g. Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand)
  • Familiarity with PyTorch or JAX
  • Experience porting applications to non-standard accelerator hardware or platforms
  • Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.)

Nice to have:

  • Experience building low-latency, high-performance applications using both kernel-level and user-space networking stacks
  • Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns
  • Solid grasp of Transformer architectures, particularly Mixture-of-Experts (MoE); a toy gating sketch follows this list
  • Experience building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths
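
As a rough illustration of the MoE point above, here is a toy top-k gating layer in PyTorch; the sizes, the dense dispatch loop, and the renormalization choice are all assumptions for the sketch, not Sohu-specific details:

    import torch
    import torch.nn.functional as F

    # Toy sizes; real MoE layers are far larger and sharded across devices.
    num_experts, top_k, d_model, n_tokens = 8, 2, 64, 16

    experts = torch.nn.ModuleList(
        torch.nn.Linear(d_model, d_model) for _ in range(num_experts)
    )
    router = torch.nn.Linear(d_model, num_experts)
    x = torch.randn(n_tokens, d_model)

    # Pick each token's top-k experts and renormalize their gate weights.
    weights, idx = torch.topk(F.softmax(router(x), dim=-1), top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)

    # Dense loop for clarity; production kernels group tokens by expert.
    out = torch.zeros_like(x)
    for k in range(top_k):
        for e in range(num_experts):
            mask = idx[:, k] == e
            if mask.any():
                out[mask] += weights[mask, k].unsqueeze(-1) * experts[e](x[mask])
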
What we offer:
  • 12-week paid internship
  • Generous housing support for those relocating
  • Daily lunch and dinner in our office
  • Direct mentorship from industry leaders and world-class engineers
  • Opportunity to work on one of the most important problems of our time

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Architecture Intern - Inference

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...
Location:
United States, San Francisco; Palo Alto; New York City
Salary:
210000.00 - 385000.00 USD / Year
Perplexity
Expiration Date
Until further notice
Requirements:
  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization); a minimal quantization sketch follows this list
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
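
To ground the quantization bullet above, a minimal symmetric per-tensor int8 quantizer in PyTorch; this is a sketch of the general technique, not Perplexity's stack, and production systems typically add per-channel scales and calibration:

    import torch

    def quantize_int8(x: torch.Tensor):
        """Symmetric per-tensor int8 quantization: x ≈ q * scale."""
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
        return q, scale

    x = torch.randn(4, 8)
    q, scale = quantize_int8(x)
    x_hat = q.float() * scale                    # dequantize
    print("max rounding error:", (x - x_hat).abs().max().item())
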
Job Responsibility:
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
What we offer:
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts

Employment Type: Fulltime

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...
Location:
United Kingdom, London
Salary:
Not provided
Perplexity
Expiration Date
Until further notice
Requirements:
  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Job Responsibility:
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
What we offer:
  • Equity may be part of the total compensation package

Employment Type: Fulltime

Engineering Manager - Inference

We are looking for an Inference Engineering Manager to lead our AI Inference tea...
Location:
United States, San Francisco
Salary:
300000.00 - 385000.00 USD / Year
Perplexity
Expiration Date
Until further notice
Requirements:
  • 5+ years of engineering experience with 2+ years in a technical leadership or management role
  • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
  • Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers
  • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
  • Familiarity with GPU characteristics, roofline models, and performance analysis (a back-of-envelope roofline example follows this list)
  • Experience deploying reliable, distributed, real-time systems at scale
  • Track record of building and leading high-performing engineering teams
  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
  • Strong technical communication and cross-functional collaboration skills
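
A back-of-envelope roofline check of the kind the requirements mention; the GEMM shape and machine numbers below are illustrative assumptions, not any particular GPU's spec sheet:

    # Is a large fp16 GEMM compute-bound or memory-bound?
    M, N, K = 4096, 4096, 4096
    flops = 2 * M * N * K                        # one multiply-add = 2 FLOPs
    bytes_moved = 2 * (M * K + K * N + M * N)    # fp16 = 2 bytes, ideal reuse
    intensity = flops / bytes_moved              # FLOPs per byte

    peak_flops = 300e12                          # assumed 300 TFLOP/s
    mem_bw = 2e12                                # assumed 2 TB/s
    balance = peak_flops / mem_bw                # machine balance point

    kind = "compute-bound" if intensity > balance else "memory-bound"
    print(f"{intensity:.0f} FLOP/B vs balance {balance:.0f} FLOP/B -> {kind}")
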
Job Responsibility:
  • Lead and grow a high-performing team of AI inference engineers
  • Develop APIs for AI inference used by both internal and external customers
  • Architect and scale our inference infrastructure for reliability and efficiency
  • Benchmark and eliminate bottlenecks throughout our inference stack
  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
  • Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decoding serving, etc.
  • Improve the reliability and observability of our systems and lead incident response
  • Own technical decisions around batching, throughput, latency, and GPU utilization
  • Partner with ML research teams on model optimization and deployment
  • Recruit, mentor, and develop engineering talent
What we offer:
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts

Employment Type: Fulltime

ML Engineer - Inference Serving

Luma’s mission is to build multimodal AI to expand human imagination and capabil...
Location:
United States; United Kingdom, Palo Alto; London
Salary:
187500.00 - 395000.00 USD / Year
Luma AI
Expiration Date
Until further notice
Requirements:
  • Strong Python and system architecture skills
  • Experience with model deployment using PyTorch, Huggingface, vLLM, SGLang, tensorRT-LLM, or similar
  • Experience with queues, scheduling, traffic-control, fleet management at scale (a minimal queue sketch follows this list)
  • Experience with Linux, Docker, and Kubernetes
  • Python
  • Redis
  • S3-compatible Storage
  • Model serving (one of: PyTorch, vLLM, SGLang, Huggingface)
  • Understanding of large-scale orchestration, deployment, scheduling (via Kubernetes or similar)
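
A minimal sketch of the queueing pattern named in the first bullet, using the redis-py client; the queue name, job fields, and localhost server are assumptions for illustration, not Luma's setup:

    import json

    import redis  # redis-py client: pip install redis

    r = redis.Redis(host="localhost", port=6379)

    # Producer: enqueue an inference request as JSON on a Redis list.
    job = {"job_id": "job-001", "model": "some-model", "prompt": "hello"}
    r.lpush("inference:jobs", json.dumps(job))

    # Worker: block until a job arrives, then claim and process it.
    _, payload = r.brpop("inference:jobs")
    print("processing", json.loads(payload)["job_id"])

Real fleets layer acknowledgements, retries, and priorities on top of this; a bare blocking pop gives only at-most-once delivery.
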
Job Responsibility:
  • Ship new model architectures by integrating them into our inference engine
  • Collaborate closely across research, engineering and infrastructure to streamline and optimize model efficiency and deployments
  • Build internal tooling to measure, profile, and track the lifetime of inference jobs and workflows
  • Automate, test and maintain our inference services to ensure maximum uptime and reliability
  • Optimize deployment workflows to scale across thousands of machines
  • Manage and optimize our inference workloads across different clusters & hardware providers
  • Build sophisticated scheduling systems to optimally leverage our expensive GPU resources while meeting internal SLOs
  • Build and maintain CI/CD pipelines for processing/optimizing model checkpoints, platform components, and SDKs for internal teams to integrate into our products/internal tooling

Employment Type: Fulltime

AI / ML Engineer, Software Engineering

iCapital is seeking an experienced and forward-thinking AI/ML Engineer Vice Pres...
Location:
United States, New York
Salary:
180000.00 - 220000.00 USD / Year
iCapital Network
Expiration Date
Until further notice
Requirements:
  • 8+ years of experience in software engineering, with at least 2+ years focused on AI/ML systems
  • Proven experience in building and deploying ML models in production environments
  • Hands-on experience with AI agent frameworks (e.g., LangChain, Semantic Kernel, AutoGen, or custom-built systems)
  • Strong understanding of the ML lifecycle, including data pipelines, model training, evaluation, deployment, and monitoring
  • Familiar with MLOps tools such as MLflow, Kubeflow, or SageMaker
  • Deep understanding of LLM orchestration, prompt engineering, tool use, and memory architectures
  • Familiar with various LLM inference engines such as vLLM or SGLang
  • Experience in integrating agents with APIs, databases, and external systems
  • Familiar with retrieval-augmented generation (RAG), vector databases, and knowledge graphs (a toy retrieval sketch follows this list)
  • Experience deploying AI systems in cloud environments (AWS, GCP, Azure) and utilizing containerization tools (Docker, Kubernetes)
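
To ground the RAG bullet, a toy retrieve-then-prompt loop in pure Python; the bag-of-words "embedding" stands in for a learned model, and the documents and query are made up for the sketch:

    import math
    from collections import Counter

    def embed(text):
        """Toy bag-of-words vector; real RAG uses a learned embedding model."""
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    docs = [
        "vLLM is an inference engine for large language models",
        "Vector databases store embeddings for similarity search",
        "Knowledge graphs encode entities and their relationships",
    ]

    query = "which engine serves large language models"
    best = max(docs, key=lambda d: cosine(embed(query), embed(d)))

    # Assemble the augmented prompt; an LLM call would consume this.
    print(f"Context: {best}\n\nQuestion: {query}")
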
Job Responsibility:
  • Design, build, and optimize scalable AI/ML infrastructure and services powering intelligent features across our platform
  • Lead the development of AI agents capable of autonomous decision-making, task execution, and multi-step reasoning across internal and customer-facing applications
  • Architect and implement modular agent frameworks by integrating tools, APIs, and memory systems for dynamic and context-aware behavior
  • Collaborate with product, data, and infrastructure teams to embed AI capabilities into production systems
  • Drive the architecture and development of ML pipelines, model serving frameworks, and real-time inference systems
  • Evaluate and integrate state-of-the-art AI tools and frameworks to accelerate development and deployment
  • Provide technical mentorship and guidance to engineers, contributing to team growth and best practices
  • Partner with Data Science teams to operationalize models, ensuring a smooth transition from experimentation to production
  • Contribute to technical roadmaps and help define long-term AI/ML platform and agent strategy
  • Optimize agent performance for latency, reliability, and safety in production environments
What we offer:
  • Equity for all full-time employees
  • Annual performance bonus
  • Employer matched retirement plan
  • Generously subsidized healthcare with 100% employer paid dental, vision, telemedicine, and virtual mental health counseling
  • Parental leave
  • Unlimited paid time off (PTO)

Employment Type: Fulltime

Cloud Solution Architect - AI/ML

We are looking for a Cloud Solution Architect (CSA) who is passionate about driv...
Location:
United Kingdom, London
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science, Information Technology, Engineering, Business, or related field AND experience in cloud/infrastructure technologies, information technology (IT) consulting/support, systems administration, network operations, software development/support, technology solutions, practice development, architecture, and/or consulting
  • OR equivalent experience
  • This role requires UK Security Clearance; candidates must either hold existing clearance or meet the minimum criteria to apply for it
  • Several years of experience working in a customer-facing role (internal and/or external)
  • Several years of experience working on technical projects
  • Technical Certification in Cloud (e.g., Azure, Amazon Web Services, Google, security certifications)
  • Experience and expertise in one or more of the following areas: Azure AI Foundry (Models, Agent Service, Semantic Kernel, Search, ML, SDK)
  • AppPlat/Containers/Serverless (App Service, AKS, ACA, ARO, Functions)
  • DevOps (CI/CD, Azure DevOps, DevSecOps)
  • GitHub (Copilot, Enterprise, Adv Security, Actions, Codespaces)
Job Responsibility:
  • Understand customers’ Business and IT priorities and translate them into AI, ML, and cloud engineering architectures, spanning platform engineering, cloud‑native apps, inference pipelines, data workflows, and low‑code extensibility.
  • Act as a highly technical partner, leading customers through architecture reviews, proofs‑of‑concept, and MVP builds, including environment setup, model orchestration, retrieval pipelines, CI/CD automation, and deployment readiness.
  • Implement secure, performant solutions that meet production standards across performance, reliability, maintainability, observability, and Responsible AI requirements.
  • Deliver engineering‑focused workshops, deep‑dives, and readiness sessions; guide customers on ML engineering patterns, prompt engineering, data preparation, RAG design, deployment pipelines, and cloud development best practices
  • Accelerate customer success by diagnosing and resolving technical blockers in application development, ML workflows, model deployment, and cloud infrastructure, driving adoption of Azure AI, Foundry, and cloud services
  • Use engineering knowledge to propose architecture improvements, performance optimisations, and scalable solution patterns
  • Stay current with the latest Azure AI, OpenAI, Foundry, HuggingFace, GitHub, and cloud-native capabilities; be a practitioner in Python, .NET, JavaScript/Node, or equivalent enterprise stacks
  • Contribute reusable assets, patterns, sample architectures, code accelerators, and internal IP to scale technical impact across the CSA community

Employment Type: Fulltime

Lead Machine Learning Engineer

Machine Learning Engineers specializing in Inference Optimization focus on maxim...
Location:
Singapore, Singapore
Salary:
Not provided
Thoughtworks
Expiration Date
Until further notice
Requirements:
  • Deep practical expertise in model and runtime optimization techniques (quantization, pruning, distillation, batching, caching); a minimal KV-cache sketch follows this list
  • Proven experience optimizing inference workloads using frameworks such as vLLM, NVIDIA Triton/Dynamo
  • Strong proficiency in deep learning frameworks (e.g. PyTorch, TensorFlow) with production deployment experience
  • Ability to diagnose and optimize performance using profiling tools (e.g. Nsight, PyTorch/TensorFlow profilers)
  • Solid understanding of GPU and accelerator architectures, and experience tuning workloads for cost and performance efficiency
  • Experience designing and benchmarking scalable inference systems across heterogeneous environments (GPU clusters, serverless, edge)
  • Familiarity with observability stacks, telemetry, and cost instrumentation for AI workloads
  • Demonstrated ability to lead small-to-medium engineering teams or technical workstreams
  • Skilled at balancing hands-on delivery with architectural oversight and mentorship
  • Strong communication and stakeholder engagement skills, with the ability to connect low-level optimizations to business impact
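
A minimal KV-cache decode loop, illustrating the caching technique named in the first bullet; single head, toy dimensions, and random weights are all assumptions for the sketch:

    import torch
    import torch.nn.functional as F

    d = 32                                        # toy head dimension
    wq, wk, wv = (torch.randn(d, d) for _ in range(3))
    k_cache = torch.empty(0, d)
    v_cache = torch.empty(0, d)

    for step in range(5):
        x = torch.randn(1, d)                     # current token's hidden state
        q, k, v = x @ wq, x @ wk, x @ wv

        # Append this step's K/V instead of recomputing the whole prefix.
        k_cache = torch.cat([k_cache, k])
        v_cache = torch.cat([v_cache, v])

        attn = F.softmax(q @ k_cache.T / d ** 0.5, dim=-1)
        out = attn @ v_cache                      # (1, d) attention output
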
Job Responsibility:
  • Lead the design and implementation of advanced model optimization pipelines, including quantization, pruning, and distillation
  • Architect and tune inference runtimes and serving frameworks to achieve optimal performance across deployments
  • Guide teams in implementing high-throughput serving strategies (continuous batching, KV caching, speculative decoding, asynchronous scheduling)
  • Develop benchmarks and performance dashboards to measure and communicate system-level efficiency improvements (throughput, latency, GPU utilization, cost)
  • Evaluate trade-offs across accuracy, performance, and cost, and design architectures to meet target SLAs across varied hardware environments (cloud, on-prem, edge)
  • Collaborate with infrastructure, MLOps, and product teams to embed inference optimization into production workflows and platform designs
  • Provide technical leadership and mentorship to engineers, fostering a culture of experimentation, rigor, and continuous performance improvement
  • Contribute to the development of internal frameworks, reference architectures, and playbooks for scalable and cost-efficient inference
  • Engage with clients to translate optimization outcomes into business value and articulate the ROI of technical improvements
What we offer:
  • Learning & Development
  • Interactive tools
  • Numerous development programs
  • Teammates who want to help you grow
  • Empowering our employees in their career journeys

Employment Type: Fulltime

Lead Product Manager, AI Storage Solutions

As a Lead Product Manager, AI Storage Solutions you will define the strategic vi...
Location:
United States, Milpitas
Salary:
194425.00 - 275414.00 USD / Year
Sandisk
Expiration Date
May 06, 2026
Requirements:
  • Bachelor’s degree in Electrical Engineering, Computer Science, Engineering, or related field
  • Minimum of 8+ years of experience in product management
  • Minimum of 2 years experience in AI Architectures for datacenters or on-device
  • Strong desire to take ownership of the full product lifecycle
  • Proven track record of managing and launching successful emerging technology products
  • Deep understanding of flash memory technology and AI Storage Solutions, including 3D stacking, TSV architectures, and NAND-based high-bandwidth architectures
  • Knowledge of the AI software stack is an added advantage
Job Responsibility:
  • Define the strategic vision, roadmap, and execution plan for next‑generation memory solutions that enable cutting‑edge AI, Machine Learning, and High‑Performance Computing ecosystems
  • Serve as the connective tissue across engineering, marketing, operations, and hyperscale customers—driving competitive differentiation and business growth for high‑performance memory products
  • Own the multi‑year product roadmap, aligning with technology inflection points such as emerging AI Storage Solutions architectures
  • Translate market dynamics, customer signals, and competitive insights into product requirements (MRD/PRD)
  • Lead deep technical engagements with hyperscalers and chipmakers (e.g., NVIDIA, AMD), turning customer performance, latency, and capacity needs into engineering deliverables
  • Position product solutions for AI inference, KV‑cache, and memory‑centric architectures, informed by industry trends such as inference context memory and KV‑tiering
  • Build business cases including TAM/SAM/SOM models, pricing strategies, investment requirements, and ROI modeling
  • Guide lifecycle execution from concept through development via cross‑functional program reviews (Stage Gate)
  • Partner closely with ASIC, firmware, NAND, system, and quality teams to drive technical readiness and performance targets
  • Manage supplier relationships and work with supply chain to ensure cost, yield, and delivery success
What we offer:
  • Short-Term Incentive (STI) Plan
  • Long-Term Incentive (LTI) program (restricted stock units (RSUs) or cash equivalents)
  • RSU awards for eligible new hires
  • Paid vacation time
  • Paid sick leave
  • Medical/dental/vision insurance
  • Life, accident and disability insurance
  • Tax-advantaged flexible spending and health savings accounts
  • Employee assistance program
  • Other voluntary benefit programs such as supplemental life and AD&D, legal plan, pet insurance, critical illness, accident and hospital indemnity

Employment Type: Fulltime