CrawlJobs

Deployment Engineer, AI Inference

Cerebras Systems

Location:
Sunnyvale, United States; Canada

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and lets machine learning users run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs. Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, transforming key workloads with ultra-high-speed inference.

Thanks to this groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Job Responsibility:

  • Deploy AI inference replicas and cluster software across multiple datacenters
  • Operate across heterogeneous datacenter environments undergoing rapid 10x growth
  • Maximize capacity allocation and optimize replica placement using constraint-solver algorithms
  • Operate bare-metal inference infrastructure while supporting transition to K8S-based platform
  • Develop and extend telemetry, observability and alerting solutions to ensure deployment reliability at scale
  • Develop and extend a fully automated deployment pipeline to support fast software updates and capacity reallocation at scale
  • Translate technical and customer needs into actionable requirements for the Dev Infra, Cluster, Platform and Core teams
  • Stay up to date with the latest advancements in AI compute infrastructure and related technologies

Requirements:

  • 2–5 years of experience operating on-prem compute infrastructure (ideally in Machine Learning or High-Performance Compute) or developing and managing complex AWS control-plane infrastructure for hybrid deployments
  • Strong proficiency in Python for automation, orchestration, and deployment tooling
  • Solid understanding of Linux-based systems and command-line tools
  • Extensive knowledge of Docker containers and container orchestration platforms like K8S
  • Familiarity with spine-leaf (Clos) networking architecture
  • Proficiency with telemetry and observability stacks such as Prometheus, InfluxDB and Grafana
  • Strong ownership mindset and accountability for complex deployments
  • Ability to work effectively in a fast-paced environment

What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • A simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Deployment Engineer, AI Inference

Director of AI Engineering

We are entering a hyper-growth phase of AI innovation and are hiring a Director ...
Location:
Canada; United States
Salary:
300000.00 - 450000.00 USD / Year
Apollo.io
Expiration Date:
Until further notice
Requirements:
  • 10–15+ years in software engineering, with significant leadership experience owning AI/ML or applied LLM systems at scale
  • Proven history shipping LLM-powered features, agentic workflows, or AI assistants used by real customers in production
  • Deep understanding of LLM orchestration frameworks (LangChain, LlamaIndex), RAG pipelines, vector search, embeddings, and prompt engineering
  • Expert in backend & distributed systems (Python strongly preferred) and cloud infrastructure (AWS/GCP)
  • Strong experience with telemetry, observability, and cost-aware real-time inference optimizations
  • Demonstrated ability to lead senior engineers, define technical roadmaps, and deliver outcomes aligned to business metrics
  • Experience building or scaling teams working on experimentation, optimization, personalization, or ML-powered growth systems
  • Exceptional ability to simplify complex problems, set clear standards, and drive alignment across Product, Data, Design, and Engineering
  • Strong product sense, ability to weigh novelty vs. impact, focus on user value, and prioritize speed with guardrails
  • Fluent in integrating AI tools into engineering workflows for code generation, debugging, delivery velocity, and operational efficiency
Job Responsibility:
  • Define the multi-year technical vision for Apollo’s AI stack, spanning agents, orchestration, inference, retrieval, and platformization
  • Prioritize high-impact AI investments by partnering with Product, Design, Research, and Data leaders to align engineering outcomes with business goals
  • Establish technical standards, evaluation criteria, and success metrics for every AI-powered feature shipped
  • Lead the architecture and deployment of long-horizon autonomous agents, multi-agent workflows, and API-driven orchestration frameworks
  • Build reusable, scalable agentic components that power GTM workflows like research, enrichment, sequencing, lead scoring, routing, and personalization
  • Own the evolution of Apollo’s internal LLM platform for high-scale, low-latency, cost-optimized inference
  • Oversee model-driven experiences for natural-language interfaces, RAG pipelines, semantic search, personalized recommendations, and email intelligence
  • Partner with Product & Design to build intuitive conversational UX that hides underlying complexity while elevating user productivity
  • Implement rigorous evaluation frameworks, including offline benchmarking, human-in-the-loop review, and online A/B experimentation
  • Ensure robust observability, monitoring, and safety guardrails for all AI systems in production
What we offer:
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA
Employment Type: Full-time

Senior Software Engineer – AI

NStarX is seeking a highly skilled Senior Software Engineer – AI with a strong f...
Location:
Hyderabad, India
Salary:
Not provided
NStarX
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field (PhD is a plus)
  • 9+ years of experience in AI/ML engineering or related roles
  • 3+ years of experience in Generative AI with team leadership responsibilities
  • Proven track record of production-grade ML and GenAI model development and deployment
  • Programming: Python (preferred)
  • GenAI Frameworks: Hugging Face Transformers, Diffusers, LangChain, TGI
  • Serving & Inference: FastAPI, gRPC, NVIDIA Triton, TorchServe
  • Cloud Platforms: AWS (SageMaker, EKS), GCP (Vertex AI, GKE), Azure (Azure ML, AKS)
  • MLOps & DevOps: Kubeflow, MLflow, GitHub Actions, Jenkins, Helm, Terraform
  • Optimization Techniques: Model quantization, distillation, pipeline and tensor parallelism
Job Responsibility:
  • Design, develop, and deploy machine learning models and AI algorithms to address complex business challenges
  • Lead and mentor a team of AI/ML engineers, ensuring quality and scalability in solution design and implementation
  • Collaborate closely with cross-functional teams including data scientists, software engineers, product managers, and UX designers
  • Lead the development and deployment of Generative AI applications across text, code, image, and audio modalities using state-of-the-art LLMs
  • Design and implement CI/CD pipelines for the GenAI model lifecycle including training, validation, packaging, and deployment
  • Apply best practices for model performance tuning, cost optimization, and scalable deployment in cloud and hybrid environments
  • Develop prompt engineering, fine-tuning strategies (LoRA, QLoRA, PEFT), and evaluation protocols tailored to business use cases
  • Stay current with emerging trends in AI, ML, and Generative AI and drive adoption across teams
  • Document processes, model architectures, and deployment strategies for traceability and knowledge sharing
  • Work closely with cross-functional teams to gather requirements and deliver high-quality solutions
What we offer:
  • Competitive salary aligned with market standards
  • Opportunities for professional development and skill enhancement
  • A collaborative and innovative work environment
Employment Type: Full-time

AI Software Engineer

Join Qargo as an AI Software Engineer and help build intelligent, user-centric A...
Location:
Ghent, Belgium
Salary:
Not provided
Qargo
Expiration Date:
Until further notice
Requirements:
  • Min. 2 years of experience in software engineering, applied AI, or similar technical roles
  • Strong programming skills (preferably Python and/or modern backend languages)
  • Experience with AI/ML tools and frameworks such as PyTorch, Hugging Face, LangChain/LangGraph, vector databases, and inference tooling
  • Proven experience deploying and operating AI/ML systems in a production environment
  • Ability to experiment quickly, iterate fast, and validate assumptions
  • Strong problem-solving skills and the ability to work autonomously in a fast-paced environment
  • Clear communication skills and the ability to collaborate with engineers, product managers, and domain experts
Job Responsibility:
  • Evaluate and prototype with new AI models and techniques to solve document, workflow, and conversational tasks
  • Bring AI prototypes to production, ensuring quality, scalability, and observability
  • Monitor and maintain AI systems running in production, optimising cost, latency, and reliability
  • Collaborate with cross-functional teams to define clear AI tasks (e.g., document classification, summarisation, task prediction)
  • Develop and enhance AI-driven features such as document extraction, matching flows, quality checks, chatbots, and automated bookings
  • Stay up to date with advancements in AI and identify opportunities to improve the product
What we offer:
  • Real impact and ownership in a growing international scale-up
  • A supportive and collaborative team culture
  • Hybrid working setup with flexibility and trust
  • Opportunities to learn, grow, and expand your technical knowledge
  • Competitive salary and benefits package

AI Software Engineer III

Planet DDS is a leading provider of a platform of cloud-based solutions that emp...
Location:
Glasgow, United Kingdom
Salary:
Not provided
Planet DDS
Expiration Date:
Until further notice
Requirements:
  • 5-7 years of professional software engineering experience
  • At least 4 years in AI/ML-focused roles
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Artificial Intelligence, or related field
  • Experience working in a SaaS or enterprise software environment
  • Publications or contributions to open-source AI/ML projects
  • Exposure to reinforcement learning, generative AI (LLMs, diffusion models), or real-time inference systems
Job Responsibility:
  • Design, develop, and deploy AI and machine learning models in production environments
  • Architect scalable solutions that integrate AI capabilities into our products and services
  • Collaborate with data scientists, product managers, and backend/front-end engineers to translate prototypes into reliable, maintainable code
  • Own end-to-end development of AI systems, including data ingestion, model training, evaluation, and deployment
  • Implement best practices in model versioning, monitoring, and continuous improvement
  • Contribute to the evolution of our AI/ML infrastructure, including CI/CD pipelines and MLOps tools
  • Stay current on advancements in AI, ML, and deep learning and assess their applicability to business needs
  • Ensure AI solutions are ethical, interpretable, and aligned with regulatory requirements
Employment Type: Full-time

Senior Devops & AI Engineer

This role presents a unique opportunity to contribute to the future of impactful...
Location:
Hyderabad, India
Salary:
Not provided
Fission Labs
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or related field
  • 6+ years of experience in infrastructure management roles, with a focus on cloud platforms (Azure and AWS preferred)
  • Hands-on experience with operations (DevSecOps) principles and best practices
  • Proficiency in scripting languages such as Python, PowerShell, or Bash
  • Excellent communication and collaboration skills
  • In-depth knowledge of Linux operating systems, including CentOS, Ubuntu, and Red Hat, with expertise in shell scripting, package management, and system administration
  • Hands-on experience with a wide range of AWS and Azure services
  • Experience developing and maintaining Infrastructure as Code (IaC) templates using tools such as Terraform or AWS CloudFormation
  • Experience setting up cloud infrastructure stacks, databases, service endpoints, and GPU and CPU resource scaling and optimization
  • Experience with AIOps/MLOps
Job Responsibility:
  • Configure and optimize Linux-based servers for performance, security, and resource utilization, including kernel tuning, file system management, and network configuration
  • Architect cloud solutions leveraging best practices and services offered by AWS and Azure, optimizing for scalability, reliability, and cost-effectiveness
  • Implement and manage hybrid cloud environments, facilitating seamless integration and interoperability between AWS and Azure services
  • Establish version control practices for IAC templates, ensuring traceability, auditability, and reproducibility of infrastructure changes
What we offer:
  • Opportunity to work on impactful technical challenges with global reach
  • Vast opportunities for self-development, including online university access and knowledge sharing opportunities
  • Sponsored Tech Talks & Hackathons to foster innovation and learning
  • Generous benefits packages including health insurance, retirement benefits, flexible work hours, and more
  • Supportive work environment with forums to explore passions beyond work
Employment Type: Full-time

Principal AI Engineer

We are looking for a Principal AI Engineer to lead the design and deployment of ...
Location:
United States
Salary:
200000.00 - 300000.00 USD / Year
Apollo.io
Expiration Date:
Until further notice
Requirements:
  • 10+ years of software engineering experience
  • At least 3 years in applied LLM or agentic AI systems (2023–present)
  • Proven success in deploying LLM-powered products used by real users at scale
  • Deep backend & systems engineering expertise with Python, distributed systems, and scalable APIs
  • Familiarity with LangChain, LlamaIndex, or similar orchestration frameworks
  • Experience with RAG pipelines, vector DBs, embedding models, and semantic search tuning
  • Experience managing performance across cloud providers (e.g., AWS Bedrock, OpenAI, Anthropic, etc.)
  • Demonstrated experience building multi-step agents, planning workflows, chaining reasoning steps, and integrating APIs with agent memory/state
  • Comfort with advanced prompting strategies, few-shot and chain-of-thought reasoning, and embedding retrieval setups
  • Strong understanding of AI system evaluation, human ratings, A/B experimentation, and feedback loop pipelines
Job Responsibility:
  • Architect and lead the development of multi-agent systems capable of long-horizon planning, reasoning, and API orchestration
  • Build reusable agentic components that integrate deeply into sales and marketing processes
  • Own and evolve our in-house platform for scalable, low-latency, and cost-efficient LLM and agent deployments
  • Lead design of interfaces powered by natural language understanding and retrieval-augmented generation (RAG)
  • Build embedding-based, intent-aware search and personalization systems tuned to business user needs
  • Drive innovation in personalized outreach generation using context-aware generation pipelines
  • Tune inference pipelines, caching layers, and model selection logic for high-scale, cost-aware performance
  • Define and drive robust offline and online testing methodologies (A/B, sandboxing, human evals) across agents and LLM flows
  • Architect human-in-the-loop systems and telemetry to improve accuracy, UX, and explainability over time
What we offer:
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program
  • Wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
Employment Type: Full-time

Research Engineer AI

The role involves conducting high-quality research in AI and HPC, shaping future...
Location:
Bristol, United Kingdom
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • A good working knowledge of AI/ML frameworks (at least TensorFlow and PyTorch), of data preparation, handling, and lineage control, and of model deployment, particularly in distributed environments
  • At least a B.Sc. equivalent in a Science, Technology, Engineering or Mathematical discipline
  • Development experience in compiled languages such as C, C++ or Fortran and experience with interpreted environments such as Python
  • Parallel programming experience, with relevant programming models such as OpenMP, MPI, CUDA, OpenACC, HIP, PGAS languages is highly desirable
Job Responsibility:
  • Perform world-class research while also shaping products of the future
  • Enable high performance AI software stacks on supercomputers
  • Provide new environments/abstractions to support application developers to build, deploy, and run AI applications taking advantage of leading-edge hardware at scale
  • Manage modern data-intensive AI training and inference workloads
  • Port and optimize workloads of key research centers like the AI safety institute
  • Support onboarding and scaling of domain-specific applications
  • Foster collaboration with the UK and European research community
What we offer:
  • Health & Wellbeing benefits that support physical, financial and emotional wellbeing
  • Career development programs catered to achieving career goals
  • Unconditional inclusion in the workplace
  • Flexibility to manage work and personal needs
Employment Type: Full-time

Artificial Intelligence (AI) Engineer

VELOX is hiring an AI Developer to help design and implement intelligent systems...
Location:
Boise, United States
Salary:
Not provided
VELOX Media
Expiration Date:
Until further notice
Requirements:
  • Strong proficiency in Python (Pandas, NumPy, scikit-learn, etc.)
  • Experience with deep learning frameworks such as TensorFlow or PyTorch
  • Hands-on experience with natural language processing, retrieval-augmented generation (RAG), or LLMs (e.g., OpenAI, Claude, Mistral)
  • Understanding of data pipelines, model deployment, and performance monitoring
  • Experience working with APIs and integrating ML models into production systems
  • Familiarity with vector databases (e.g., Pinecone, Weaviate, FAISS) and embedding generation
  • Comfort working in cloud environments (GCP, AWS, or Azure)
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field
  • 3+ years of experience in applied AI/ML roles
  • Track record of launching AI tools or systems into production
Job Responsibility:
  • Research, design, and deploy AI/ML models that drive value across client-facing and internal applications
  • Build tools that support predictive analytics, natural language querying, and campaign automation
  • Collaborate with product and engineering teams to integrate AI functionality into web platforms
  • Integrate AI solutions with our PHP/Laravel backend and MySQL databases via REST APIs or microservices
  • Write clean, scalable code for inference pipelines, model training, and testing environments
  • Monitor model performance and retrain or refine when necessary
  • Stay ahead of LLMs, vector DBs, and open-source innovations to enhance our AI roadmap
  • Contribute to a long-term AI strategy that makes VELOX more automated, intelligent, and insightful
What we offer:
  • Competitive compensation and performance bonuses
  • Health insurance & 401k options
  • Paid vacation and holidays
  • Casual dress and regular team events
  • On-site gym and personal trainer access
  • Kombucha on tap
Employment Type: Full-time