Architecture Intern - Inference

Etched

Location:
United States, San Jose


Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking a talented Architecture intern to join our team and contribute to the design of next-generation AI accelerators. This role focuses on developing and optimizing compute architectures that deliver exceptional performance and efficiency for transformer workloads. You will work on cutting-edge architectural problems and performance modeling over the course of your internship.

Job Responsibility:

  • Support porting state-of-the-art models to our architecture
  • Help build programming abstractions and testing capabilities to rapidly iterate on model porting
  • Assist in building, enhancing, and scaling Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling
  • Contribute to optimizing routing and communication layers using Sohu’s collectives
  • Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues (a minimal profiling sketch follows this list)
  • Develop and leverage a deep understanding of Sohu to co-design both HW instructions and model architecture operations to maximize model performance
  • Implement high-performance software components for the Model Toolkit
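
A minimal sketch of the profiling workflow mentioned above, using PyTorch's built-in profiler on a stand-in matmul workload; the workload, shapes, and sort key are illustrative assumptions, not anything from the posting:

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Stand-in workload: repeated matmuls on random data.
    x = torch.randn(1024, 1024)
    w = torch.randn(1024, 1024)

    # Profile CPU (and CUDA, if available) time per operator.
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)
        x, w = x.cuda(), w.cuda()

    with profile(activities=activities, record_shapes=True) as prof:
        for _ in range(10):
            y = torch.relu(x @ w)

    # Rank operators by total time to spot bottlenecks.
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))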

Requirements:

  • Progress towards a Bachelor’s, Master’s, or PhD degree in computer science, computer engineering, or a related field
  • Proficiency in C++ or Rust
  • Understanding of performance-sensitive or complex distributed software systems, e.g. Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand)
  • Familiarity with PyTorch or JAX
  • Experience porting applications to non-standard accelerator hardware or platforms
  • Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.)

Nice to have:

  • Experience building low-latency, high-performance applications using both kernel-level and user-space networking stacks
  • Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns
  • Solid grasp of Transformer architectures, particularly Mixture-of-Experts (MoE); a toy gating sketch follows this list
  • Experience building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths
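
As a rough illustration of the MoE point above, here is a toy top-k gating layer in PyTorch; the sizes, the dense dispatch loop, and the renormalization choice are all assumptions for the sketch, not Sohu-specific details:

    import torch
    import torch.nn.functional as F

    # Toy sizes; real MoE layers are far larger and sharded across devices.
    num_experts, top_k, d_model, n_tokens = 8, 2, 64, 16

    experts = torch.nn.ModuleList(
        torch.nn.Linear(d_model, d_model) for _ in range(num_experts)
    )
    router = torch.nn.Linear(d_model, num_experts)
    x = torch.randn(n_tokens, d_model)

    # Pick each token's top-k experts and renormalize their gate weights.
    weights, idx = torch.topk(F.softmax(router(x), dim=-1), top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)

    # Dense loop for clarity; production kernels group tokens by expert.
    out = torch.zeros_like(x)
    for k in range(top_k):
        for e in range(num_experts):
            mask = idx[:, k] == e
            if mask.any():
                out[mask] += weights[mask, k].unsqueeze(-1) * experts[e](x[mask])
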
What we offer:
  • 12-week paid internship
  • Generous housing support for those relocating
  • Daily lunch and dinner in our office
  • Direct mentorship from industry leaders and world-class engineers
  • Opportunity to work on one of the most important problems of our time

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Architecture Intern - Inference

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...
Location:
United States, San Francisco; Palo Alto; New York City
Salary:
210000.00 - 385000.00 USD / Year
Perplexity
Expiration Date
Until further notice
Requirements:
  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization); a minimal quantization sketch follows this list
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
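
To ground the quantization bullet above, a minimal symmetric per-tensor int8 quantizer in PyTorch; this is a sketch of the general technique, not Perplexity's stack, and production systems typically add per-channel scales and calibration:

    import torch

    def quantize_int8(x: torch.Tensor):
        """Symmetric per-tensor int8 quantization: x ≈ q * scale."""
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
        return q, scale

    x = torch.randn(4, 8)
    q, scale = quantize_int8(x)
    x_hat = q.float() * scale                    # dequantize
    print("max rounding error:", (x - x_hat).abs().max().item())
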
Job Responsibility:
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
What we offer:
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts

Employment Type: Fulltime

AI Inference Engineer

We are looking for an AI Inference engineer to join our growing team. Our curren...
Location:
United Kingdom, London
Salary:
Not provided
Perplexity
Expiration Date
Until further notice
Requirements:
  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Job Responsibility:
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
What we offer:
  • Equity may be part of the total compensation package

Employment Type: Fulltime

Engineering Manager - Inference

We are looking for an Inference Engineering Manager to lead our AI Inference tea...
Location:
United States, San Francisco
Salary:
300000.00 - 385000.00 USD / Year
Perplexity
Expiration Date
Until further notice
Requirements:
  • 5+ years of engineering experience with 2+ years in a technical leadership or management role
  • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)
  • Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers
  • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention
  • Familiarity with GPU characteristics, roofline models, and performance analysis (a back-of-envelope roofline example follows this list)
  • Experience deploying reliable, distributed, real-time systems at scale
  • Track record of building and leading high-performing engineering teams
  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
  • Strong technical communication and cross-functional collaboration skills
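
A back-of-envelope roofline check of the kind the requirements mention; the GEMM shape and machine numbers below are illustrative assumptions, not any particular GPU's spec sheet:

    # Is a large fp16 GEMM compute-bound or memory-bound?
    M, N, K = 4096, 4096, 4096
    flops = 2 * M * N * K                        # one multiply-add = 2 FLOPs
    bytes_moved = 2 * (M * K + K * N + M * N)    # fp16 = 2 bytes, ideal reuse
    intensity = flops / bytes_moved              # FLOPs per byte

    peak_flops = 300e12                          # assumed 300 TFLOP/s
    mem_bw = 2e12                                # assumed 2 TB/s
    balance = peak_flops / mem_bw                # machine balance point

    kind = "compute-bound" if intensity > balance else "memory-bound"
    print(f"{intensity:.0f} FLOP/B vs balance {balance:.0f} FLOP/B -> {kind}")
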
Job Responsibility:
  • Lead and grow a high-performing team of AI inference engineers
  • Develop APIs for AI inference used by both internal and external customers
  • Architect and scale our inference infrastructure for reliability and efficiency
  • Benchmark and eliminate bottlenecks throughout our inference stack
  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
  • Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decoding serving, etc.
  • Improve the reliability and observability of our systems and lead incident response
  • Own technical decisions around batching, throughput, latency, and GPU utilization
  • Partner with ML research teams on model optimization and deployment
  • Recruit, mentor, and develop engineering talent
What we offer:
  • Equity
  • Health
  • Dental
  • Vision
  • Retirement
  • Fitness
  • Commuter and dependent care accounts

Employment Type: Fulltime

ML Engineer - Inference Serving

Luma’s mission is to build multimodal AI to expand human imagination and capabil...
Location:
United States; United Kingdom, Palo Alto; London
Salary:
187500.00 - 395000.00 USD / Year
Luma AI
Expiration Date
Until further notice
Requirements:
  • Strong Python and system architecture skills
  • Experience with model deployment using PyTorch, Huggingface, vLLM, SGLang, tensorRT-LLM, or similar
  • Experience with queues, scheduling, traffic-control, fleet management at scale (a minimal queue sketch follows this list)
  • Experience with Linux, Docker, and Kubernetes
  • Python
  • Redis
  • S3-compatible Storage
  • Model serving (one of: PyTorch, vLLM, SGLang, Huggingface)
  • Understanding of large-scale orchestration, deployment, scheduling (via Kubernetes or similar)
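
A minimal sketch of the queueing pattern named in the first bullet, using the redis-py client; the queue name, job fields, and localhost server are assumptions for illustration, not Luma's setup:

    import json

    import redis  # redis-py client: pip install redis

    r = redis.Redis(host="localhost", port=6379)

    # Producer: enqueue an inference request as JSON on a Redis list.
    job = {"job_id": "job-001", "model": "some-model", "prompt": "hello"}
    r.lpush("inference:jobs", json.dumps(job))

    # Worker: block until a job arrives, then claim and process it.
    _, payload = r.brpop("inference:jobs")
    print("processing", json.loads(payload)["job_id"])

Real fleets layer acknowledgements, retries, and priorities on top of this; a bare blocking pop gives only at-most-once delivery.
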
Job Responsibility:
  • Ship new model architectures by integrating them into our inference engine
  • Collaborate closely across research, engineering and infrastructure to streamline and optimize model efficiency and deployments
  • Build internal tooling to measure, profile, and track the lifetime of inference jobs and workflows
  • Automate, test and maintain our inference services to ensure maximum uptime and reliability
  • Optimize deployment workflows to scale across thousands of machines
  • Manage and optimize our inference workloads across different clusters & hardware providers
  • Build sophisticated scheduling systems to optimally leverage our expensive GPU resources while meeting internal SLOs
  • Build and maintain CI/CD pipelines for processing/optimizing model checkpoints, platform components, and SDKs for internal teams to integrate into our products/internal tooling

Employment Type: Fulltime

AI / ML Engineer, Software Engineering

iCapital is seeking an experienced and forward-thinking AI/ML Engineer Vice Pres...
Location:
United States, New York
Salary:
180000.00 - 220000.00 USD / Year
iCapital Network
Expiration Date
Until further notice
Requirements:
  • 8+ years of experience in software engineering, with at least 2+ years focused on AI/ML systems
  • Proven experience in building and deploying ML models in production environments
  • Hands-on experience with AI agent frameworks (e.g., LangChain, Semantic Kernel, AutoGen, or custom-built systems)
  • Strong understanding of the ML lifecycle, including data pipelines, model training, evaluation, deployment, and monitoring
  • Familiar with MLOps tools such as MLflow, Kubeflow, or SageMaker
  • Deep understanding of LLM orchestration, prompt engineering, tool use, and memory architectures
  • Familiar with various LLM inference engines such as vLLM or SGLang
  • Experience in integrating agents with APIs, databases, and external systems
  • Familiar with retrieval-augmented generation (RAG), vector databases, and knowledge graphs (a toy retrieval sketch follows this list)
  • Experience deploying AI systems in cloud environments (AWS, GCP, Azure) and utilizing containerization tools (Docker, Kubernetes)
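
To ground the RAG bullet, a toy retrieve-then-prompt loop in pure Python; the bag-of-words "embedding" stands in for a learned model, and the documents and query are made up for the sketch:

    import math
    from collections import Counter

    def embed(text):
        """Toy bag-of-words vector; real RAG uses a learned embedding model."""
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    docs = [
        "vLLM is an inference engine for large language models",
        "Vector databases store embeddings for similarity search",
        "Knowledge graphs encode entities and their relationships",
    ]

    query = "which engine serves large language models"
    best = max(docs, key=lambda d: cosine(embed(query), embed(d)))

    # Assemble the augmented prompt; an LLM call would consume this.
    print(f"Context: {best}\n\nQuestion: {query}")
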
Job Responsibility:
  • Design, build, and optimize scalable AI/ML infrastructure and services powering intelligent features across our platform
  • Lead the development of AI agents capable of autonomous decision-making, task execution, and multi-step reasoning across internal and customer-facing applications
  • Architect and implement modular agent frameworks by integrating tools, APIs, and memory systems for dynamic and context-aware behavior
  • Collaborate with product, data, and infrastructure teams to embed AI capabilities into production systems
  • Drive the architecture and development of ML pipelines, model serving frameworks, and real-time inference systems
  • Evaluate and integrate state-of-the-art AI tools and frameworks to accelerate development and deployment
  • Provide technical mentorship and guidance to engineers, contributing to team growth and best practices
  • Partner with Data Science teams to operationalize models, ensuring a smooth transition from experimentation to production
  • Contribute to technical roadmaps and help define long-term AI/ML platform and agent strategy
  • Optimize agent performance for latency, reliability, and safety in production environments
What we offer:
  • Equity for all full-time employees
  • Annual performance bonus
  • Employer matched retirement plan
  • Generously subsidized healthcare with 100% employer paid dental, vision, telemedicine, and virtual mental health counseling
  • Parental leave
  • Unlimited paid time off (PTO)

Employment Type: Fulltime

Cloud Solution Architect - AI/ML

We are looking for a Cloud Solution Architect (CSA) who is passionate about driv...
Location:
United Kingdom, London
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science, Information Technology, Engineering, Business, or related field AND experience in cloud/infrastructure technologies, information technology (IT) consulting/support, systems administration, network operations, software development/support, technology solutions, practice development, architecture, and/or consulting
  • OR equivalent experience
  • This role requires UK Security Clearance; candidates must either hold existing clearance or meet the minimum criteria to apply for it
  • Several years of experience working in a customer-facing role (internal and/or external)
  • Several years of experience working on technical projects
  • Technical Certification in Cloud (e.g., Azure, Amazon Web Services, Google, security certifications)
  • Experience and expertise in one or more of the following areas: Azure AI Foundry (Models, Agent Service, Semantic Kernel, Search, ML, SDK)
  • AppPlat/Containers/Serverless (App Service, AKS, ACA, ARO, Functions)
  • DevOps (CI/CD, Azure DevOps, DevSecOps)
  • GitHub (Copilot, Enterprise, Adv Security, Actions, Codespaces)
Job Responsibility:
  • Understand customers’ Business and IT priorities and translate them into AI, ML, and cloud engineering architectures, spanning platform engineering, cloud‑native apps, inference pipelines, data workflows, and low‑code extensibility.
  • Act as a highly technical partner, leading customers through architecture reviews, proofs‑of‑concept, and MVP builds, including environment setup, model orchestration, retrieval pipelines, CI/CD automation, and deployment readiness.
  • Implement secure, performant solutions that meet production standards across performance, reliability, maintainability, observability, and Responsible AI requirements.
  • Deliver engineering‑focused workshops, deep‑dives, and readiness sessions; guide customers on ML engineering patterns, prompt engineering, data preparation, RAG design, deployment pipelines, and cloud development best practices
  • Accelerate customer success by diagnosing and resolving technical blockers in application development, ML workflows, model deployment, and cloud infrastructure, driving adoption of Azure AI, Foundry, and cloud services
  • Use engineering knowledge to propose architecture improvements, performance optimisations, and scalable solution patterns
  • Stay current with the latest Azure AI, OpenAI, Foundry, HuggingFace, GitHub, and cloud-native capabilities; be a practitioner in Python, .NET, JavaScript/Node, or equivalent enterprise stacks
  • Contribute reusable assets, patterns, sample architectures, code accelerators, and internal IP to scale technical impact across the CSA community

Employment Type: Fulltime

Lead Machine Learning Engineer

Machine Learning Engineers specializing in Inference Optimization focus on maxim...
Location:
Singapore, Singapore
Salary:
Not provided
Thoughtworks
Expiration Date
Until further notice
Requirements:
  • Deep practical expertise in model and runtime optimization techniques (quantization, pruning, distillation, batching, caching); a minimal KV-cache sketch follows this list
  • Proven experience optimizing inference workloads using frameworks such as vLLM, NVIDIA Triton/Dynamo
  • Strong proficiency in deep learning frameworks (e.g. PyTorch, TensorFlow) with production deployment experience
  • Ability to diagnose and optimize performance using profiling tools (e.g. Nsight, PyTorch/TensorFlow profilers)
  • Solid understanding of GPU and accelerator architectures, and experience tuning workloads for cost and performance efficiency
  • Experience designing and benchmarking scalable inference systems across heterogeneous environments (GPU clusters, serverless, edge)
  • Familiarity with observability stacks, telemetry, and cost instrumentation for AI workloads
  • Demonstrated ability to lead small-to-medium engineering teams or technical workstreams
  • Skilled at balancing hands-on delivery with architectural oversight and mentorship
  • Strong communication and stakeholder engagement skills, with the ability to connect low-level optimizations to business impact
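
A minimal KV-cache decode loop, illustrating the caching technique named in the first bullet; single head, toy dimensions, and random weights are all assumptions for the sketch:

    import torch
    import torch.nn.functional as F

    d = 32                                        # toy head dimension
    wq, wk, wv = (torch.randn(d, d) for _ in range(3))
    k_cache = torch.empty(0, d)
    v_cache = torch.empty(0, d)

    for step in range(5):
        x = torch.randn(1, d)                     # current token's hidden state
        q, k, v = x @ wq, x @ wk, x @ wv

        # Append this step's K/V instead of recomputing the whole prefix.
        k_cache = torch.cat([k_cache, k])
        v_cache = torch.cat([v_cache, v])

        attn = F.softmax(q @ k_cache.T / d ** 0.5, dim=-1)
        out = attn @ v_cache                      # (1, d) attention output
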
Job Responsibility:
  • Lead the design and implementation of advanced model optimization pipelines, including quantization, pruning, and distillation
  • Architect and tune inference runtimes and serving frameworks to achieve optimal performance across deployments
  • Guide teams in implementing high-throughput serving strategies (continuous batching, KV caching, speculative decoding, asynchronous scheduling)
  • Develop benchmarks and performance dashboards to measure and communicate system-level efficiency improvements (throughput, latency, GPU utilization, cost)
  • Evaluate trade-offs across accuracy, performance, and cost, and design architectures to meet target SLAs across varied hardware environments (cloud, on-prem, edge)
  • Collaborate with infrastructure, MLOps, and product teams to embed inference optimization into production workflows and platform designs
  • Provide technical leadership and mentorship to engineers, fostering a culture of experimentation, rigor, and continuous performance improvement
  • Contribute to the development of internal frameworks, reference architectures, and playbooks for scalable and cost-efficient inference
  • Engage with clients to translate optimization outcomes into business value and articulate the ROI of technical improvements
What we offer:
  • Learning & Development
  • Interactive tools
  • Numerous development programs
  • Teammates who want to help you grow
  • Empowering our employees in their career journeys

Employment Type: Fulltime

Lead Product Manager, AI Storage Solutions

As a Lead Product Manager, AI Storage Solutions you will define the strategic vi...
Location:
United States, Milpitas
Salary:
194425.00 - 275414.00 USD / Year
Sandisk
Expiration Date
May 06, 2026
Requirements:
  • Bachelor’s degree in Electrical Engineering, Computer Science, Engineering, or related field
  • Minimum of 8+ years of experience in product management
  • Minimum of 2 years experience in AI Architectures for datacenters or on-device
  • Strong desire to take ownership of the full product lifecycle
  • Proven track record of managing and launching successful emerging technology products
  • Deep understanding of flash memory technology and AI Storage Solutions, including 3D stacking, TSV architectures, and NAND-based high-bandwidth architectures
  • Knowledge of the AI software stack is an added advantage
Job Responsibility:
  • Define the strategic vision, roadmap, and execution plan for next‑generation memory solutions that enable cutting‑edge AI, Machine Learning, and High‑Performance Computing ecosystems
  • Serve as the connective tissue across engineering, marketing, operations, and hyperscale customers—driving competitive differentiation and business growth for high‑performance memory products
  • Own the multi‑year product roadmap, aligning with technology inflection points such as emerging AI Storage Solutions architectures
  • Translate market dynamics, customer signals, and competitive insights into product requirements (MRD/PRD)
  • Lead deep technical engagements with hyperscalers and chipmakers (e.g., NVIDIA, AMD), turning customer performance, latency, and capacity needs into engineering deliverables
  • Position product solutions for AI inference, KV‑cache, and memory‑centric architectures, informed by industry trends such as inference context memory and KV‑tiering
  • Build business cases including TAM/SAM/SOM models, pricing strategies, investment requirements, and ROI modeling
  • Guide lifecycle execution from concept through development via cross‑functional program reviews (Stage Gate)
  • Partner closely with ASIC, firmware, NAND, system, and quality teams to drive technical readiness and performance targets
  • Manage supplier relationships and work with supply chain to ensure cost, yield, and delivery success
What we offer:
  • Short-Term Incentive (STI) Plan
  • Long-Term Incentive (LTI) program (restricted stock units (RSUs) or cash equivalents)
  • RSU awards for eligible new hires
  • Paid vacation time
  • Paid sick leave
  • Medical/dental/vision insurance
  • Life, accident and disability insurance
  • Tax-advantaged flexible spending and health savings accounts
  • Employee assistance program
  • Other voluntary benefit programs such as supplemental life and AD&D, legal plan, pet insurance, critical illness, accident and hospital indemnity

Employment Type: Fulltime