LLM Inference Performance & Evals Engineer

Cerebras Systems

Location:
Canada, Toronto

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Join the inference model team, dedicated to bringing up state-of-the-art models and to numerically validating and accelerating new model ideas on wafer-scale hardware. You will prototype architectural tweaks, build performance-eval pipelines, and turn hard numbers into changes that land in production.

Job Responsibility:

  • Prototype and benchmark cutting-edge ideas: new attention mechanisms, mixture-of-experts (MoE), speculative decoding, and other innovations as they emerge
  • Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests
  • Work closely with the compiler, runtime, and silicon teams, a unique opportunity to experience the full stack of software/hardware innovation
  • Keep pace with the latest open- and closed-source models, and run them first on wafer scale to expose new optimization opportunities

Requirements:

  • 3+ years building high-performance ML or systems software
  • Solid grounding in Transformer math—attention scaling, KV-cache, quantisation—or clear evidence you learn this material rapidly
  • Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
  • Strong debugging skills across performance, numerical accuracy, and runtime integration
  • Prior experience in modeling, compilers, or crafting benchmarks and performance studies, not just black-box QA tests
  • A strong passion for leveraging AI agents or workflow orchestration tools to boost personal productivity

Nice to have:

  • Hands-on experience with flash-attention, Triton kernels, linear attention, or sparsity research
  • Performance-tuning experience on custom silicon, GPUs, or FPGAs
  • Proficiency in C/C++ programming and experience with low-level optimization
  • Proven experience in compiler development, particularly with LLVM and/or MLIR
  • Publications, repos, or blog posts dissecting model speed-ups
  • Contributions to open-source agent frameworks

What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • A simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026

Similar Jobs for LLM Inference Performance & Evals Engineer

Principal AI Engineer

We are looking for a Principal AI Engineer to lead the design and deployment of ...
Location:
United States
Salary:
200,000 - 300,000 USD / Year
Apollo.io
Expiration Date:
Until further notice
Requirements:
  • 10+ years of software engineering experience, including at least 3 years in applied LLM or agentic AI systems (2023–present)
  • Proven success in deploying LLM-powered products used by real users at scale
  • Deep backend and systems engineering expertise with Python, distributed systems, and scalable APIs
  • Familiarity with LangChain, LlamaIndex, or similar orchestration frameworks
  • Experience with RAG pipelines, vector DBs, embedding models, and semantic search tuning
  • Experience managing performance across cloud providers (e.g., AWS Bedrock, OpenAI, Anthropic)
  • Demonstrated experience building multi-step agents, planning workflows, chaining reasoning steps, and integrating APIs with agent memory/state
  • Comfort with advanced prompting strategies, few-shot and chain-of-thought reasoning, and embedding retrieval setups
  • Strong understanding of AI system evaluation, human ratings, A/B experimentation, and feedback loop pipelines
Job Responsibility:
  • Architect and lead the development of multi-agent systems capable of long-horizon planning, reasoning, and API orchestration
  • Build reusable agentic components that integrate deeply into sales and marketing processes
  • Own and evolve our in-house platform for scalable, low-latency, and cost-efficient LLM and agent deployments
  • Lead design of interfaces powered by natural language understanding and retrieval-augmented generation (RAG)
  • Build embedding-based, intent-aware search and personalization systems tuned to business user needs
  • Drive innovation in personalized outreach generation using context-aware generation pipelines
  • Tune inference pipelines, caching layers, and model selection logic for high-scale, cost-aware performance
  • Define and drive robust offline and online testing methodologies (A/B, sandboxing, human evals) across agents and LLM flows
  • Architect human-in-the-loop systems and telemetry to improve accuracy, UX, and explainability over time
What we offer:
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program
  • Wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
Contract Type:
Full-time

Principal Engineer - Generative AI Infra Capabilities

Wells Fargo is seeking a Principal Engineer - Generative Gen AI GPU Infrastructu...
Location:
India, Bengaluru
Salary:
Not provided
Wells Fargo
Expiration Date:
February 20, 2026
Requirements:
  • 7+ years of engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • Design GPU cluster topologies (H100/H200, NVLink/NVSwitch), networking, and storage paths for high-throughput inferencing; document sizing and performance baselines
  • Implement Run:AI constructs (Collections/Departments/Projects/workloads) for MDEV/MDEP/UCEP/MRM; codify quota, priority, and fair-share policies
  • Build POCs for and benchmark disaggregated inferencing (prefill/decode) with vLLM/TensorRT-LLM; publish guidance for H100/H200 tuning (FP8/INT8/AWQ) and KV-transfer behavior over NVLink
  • Operationalize OpenShift AI parity for GPU scheduling, time slicing/MIG profiles, and preemption; validate upgrade paths and Helm/Kustomize packaging
  • Integrate Triton Inference Server for multi-model serving
Job Responsibility:
  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions
  • Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
Contract Type:
Full-time

AI Engineer

Our next frontier is a strategic shift: We're evolving beyond traditional analyt...
Location:
United Kingdom, London
Salary:
Not provided
MVF
Expiration Date:
Until further notice
Requirements:
  • Python and service development: you write clean, typed, production-ready code, are comfortable with Pydantic, asyncio, and FastAPI, and treat prompts as code (versioned, tested, and decoupled from business logic)
  • Cloud-native experience: hands-on experience deploying and operating containerised services on AWS (or GCP/Azure) using CI/CD platforms (Jenkins, GitHub Actions, CircleCI, BuildKite), cloud monitoring tools (Datadog, Sumologic, NewRelic), and container orchestrators (EKS, ECS); comfortable with Terraform for infrastructure as code
  • Hands-on LLM experience: you have built something real with language models, whether production systems, serious side projects, or internal tools, and you understand that prompting is engineering, not magic
Job Responsibility:
  • Architect & Engineer Agentic Systems: build agents that act, not just answer. Design agents that perform deterministic actions based on probabilistic reasoning, and build systems that can reliably analyse data, execute function calls, and manage state across multi-step workflows without getting stuck in loops
  • Production-Grade RAG: go beyond basic vector search by implementing hybrid search (keyword + semantic), re-ranking strategies, and metadata filtering
  • Structured Data Extraction: build pipelines that turn unstructured conversations into structured data that our downstream systems can use
  • Establish AI Engineering Foundations, starting with Observability First: implement the "nervous system" of our AI by choosing and setting up tools (e.g., LangSmith, LangFuse, ADK, or custom) to trace execution chains
  • Evals as a Service: build the testing harness, creating automated evaluation pipelines that test prompts against "Golden Datasets"
What we offer:
  • Summer Fridays
  • Competitive holiday benefits: 25 days of paid holiday a year plus 8 bank holidays (increasing by 1 day a year, up to 30 days)
  • Hybrid working: 3 days a week in the office
  • Closed for the Christmas holidays: extra days not taken from your annual holiday allowance
  • Work from anywhere for 2 weeks a year
  • Life Assurance and Income Protection to protect your loved ones
  • Benefits allowance for health, dental, and vision coverage
  • Six months paid maternity leave, and one month paid paternity leave (subject to qualifying conditions) inclusive of same-sex and adoptive parents
  • Defined Contribution Pension and Salary Sacrifice Scheme
  • Be Well: Our award-winning wellbeing and mental health programme to support all MVFers and their families
Contract Type:
Full-time

Senior AI Software Developer

The Senior AI Engineer owns end-to-end delivery of AI features—from design to pr...
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or master’s degree in computer science, engineering, data science, machine learning, artificial intelligence, or closely related quantitative discipline
  • Typically, 7-10 years’ experience
  • LLMs & Agents: Prompt engineering, function/tool calling, orchestration frameworks, RAG
  • ML/DS: Evaluation metrics (precision/recall, BLEU/ROUGE where relevant), error analysis
  • Data/RAG: Embeddings, similarity (cosine/IP), chunking, rerankers, vector DB operations
  • Backend: Python (FastAPI/Flask), microservices patterns
  • MLOps/Infra: Docker, Kubernetes, CI/CD, artifact management, GPU scheduling
  • Observability: Metrics/logging/tracing, dashboards, automated evaluation pipelines
  • Frameworks: PyTorch/TensorFlow, Hugging Face, LangChain/LlamaIndex
  • Data: Pandas, SQL/NoSQL, Parquet/Arrow, Kafka/queues
Job Responsibility:
  • Translate high-level designs into clear component contracts, APIs, and service boundaries
  • Implement LLM integrations, RAG pipelines, agents, tool/function calling, and prompt strategies
  • Own feature delivery for sprints/releases while maintaining high code quality and documentation
  • Fine-tune models when needed; design evaluation harnesses and metrics
  • Build A/B testing setups; track accuracy, latency, robustness, and task success rates
  • Conduct error analysis, and iterate using feedback efficacy loops and prompt refinement
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion

Loss Prevention Supervisor - Security

Patrol all areas of the property; secure rooms; assist guests with room access. ...
Location:
United Arab Emirates, Dubai
Salary:
Not provided
Marriott Bonvoy
Expiration Date:
Until further notice
Requirements:
  • High school diploma or G.E.D. equivalent
  • At least 2 years of related work experience
  • At least 1 year of supervisory experience
Job Responsibility:
  • Patrol all areas of the property, secure rooms, and assist guests with room access
  • Conduct emergency response drills, daily physical hazard/safety inspections, investigations, interviews, and key control audits
  • Monitor closed-circuit television (CCTV) and alarm systems
  • Authorize, monitor, and document access to secured areas
  • Assist guests/employees during emergency situations
  • Respond to accidents, contact EMS or administer first aid/CPR as required
  • Gather information and complete reports
  • Maintain confidentiality of reports/documents, release information to authorized individuals
Contract Type:
Full-time

Senior Solutions Engineer

As a Solution Engineer, you’ll serve as the technical lead during the sales proc...
Location:
United Kingdom, London
Salary:
Not provided
Heidi
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in Solution Engineering, preferably in SaaS, healthcare, or enterprise software
  • Excellent communication skills
  • Strong working knowledge of integration protocols (REST APIs, SAML/OIDC, SCIM), enterprise architecture, and security standards
  • Experience supporting sales cycles with large healthcare providers, health systems, or EMR vendors is highly valued (FHIR/HL7 familiarity a plus)
  • Ability to synthesize complexity and communicate clearly to both technical and non-technical audiences
  • Comfortable operating autonomously in a fast-paced, early-stage environment
  • A trusted partner to sales and a credible voice in the room with technical leaders on the customer side
Job Responsibility:
  • Technical Discovery & Qualification: Lead deep technical discovery with enterprise and mid-market prospects to uncover integration, security, and compliance needs
  • Solution Design: Collaborate with product and engineering to design tailored solutions that meet customer requirements while aligning with Heidi’s platform roadmap
  • Product Demos & Technical Presentations: Deliver compelling demos and architecture walkthroughs to technical stakeholders including IT, InfoSec, and engineering teams
  • RFP & Security Review Support: Own technical responses for RFPs, security assessments, and due diligence requests with attention to detail and accuracy
  • Deal Acceleration & Objection Handling: Proactively surface and resolve technical concerns that slow down the sales process, acting as a trusted technical advisor to the customer
  • Post-Sales Handoff & Feedback Loop: Partner with implementation teams to ensure a smooth transition post-signature and provide feedback to product and engineering from the field
What we offer:
  • Flexible work with a hybrid environment
  • Additional paid day off for your birthday and wellness days
  • Discounted corporate gym memberships
  • A generous personal development budget of $500 per annum
  • Learn from some of the best engineers and creatives, joining a diverse team
  • Become an owner, with shares (equity) in the company: if Heidi wins, we all win
  • The rare chance to create a global impact as you immerse yourself in one of Australia’s leading healthtech startups
  • If you have an impact quickly, the opportunity to fast-track your startup career
Contract Type:
Full-time

Workplace Services Coordinator II

The Workplace Services Coordinator II helps create a welcoming and positive offi...
Location:
Poland, Warsaw
Salary:
Not provided
Exact Sciences
Expiration Date:
Until further notice
Requirements:
  • High School Diploma or General Education Degree (GED)
  • 1 year of experience in an administrative or hospitality role, or another role within an office environment
  • Proficient with office equipment (e.g., fax machines and printers)
  • Basic computer skills including Internet navigation, email usage, and word processing
  • Proficient in Microsoft Outlook, Excel macros and pivot tables, and Word mail merge
  • Demonstrated ability to perform the Essential Duties of the position with or without accommodation
  • Applicants must be currently authorized to work in the country where work will be performed on a full- or part-time basis. We are unable to sponsor or take over sponsorship of employment visas at this time
Job Responsibility:
  • Establish offices as a pleasant and efficient work environment while ensuring smooth office operations
  • Welcome guests and address their needs promptly
  • Act as the internal and external primary point of contact as it relates to local office matters and supporting visitors
  • Maintain security and control access at the reception desk
  • Define processes to manage the day-to-day office life
  • Partner with HR and employees to help shape the local office culture in line with our corporate values
  • Assist with local events and celebrations in close collaboration with other team members
  • Assist new employees with their orientation to the organization by managing logistics and orchestrating the on-boarding process for newcomers. This will include training on processes and office systems
  • Support management team with requests relating to office space, office environment, and space allocation
  • Support general administration and provide guidance to administrative support for senior management/functions
Contract Type:
Full-time

Customer Success Manager

Location:
India, Hyderabad
Salary:
Not provided
Highspot
Expiration Date:
Until further notice
Requirements:
  • Core customer success manager experience
  • Experience in improving product adoption
  • Experience in building success plans relating to customer business objectives
  • KPIs include upsell or cross-sell
Contract Type:
Full-time