Audio Inference Engineer, Model Efficiency

Cohere

Location:
United States; Canada, New York

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Our team is a fast-growing group of committed researchers and engineers. Our mission is to build reliable machine learning systems and to optimize audio inference serving efficiency using innovative techniques. As an engineer on this team, you will advance core audio model serving metrics, including latency, throughput, and quality, by diving deep into our systems, identifying bottlenecks, and delivering creative solutions for audio processing and streaming workloads. You’ll collaborate closely with both the training and serving infrastructure teams to ensure seamless integration between model development and deployment, with a special focus on real-time and streaming audio inference.

Job Responsibilities:

  • Work on advancing core audio model serving metrics, including latency, throughput, and quality, by diving deep into our systems, identifying bottlenecks, and delivering creative solutions for audio processing and streaming workloads
  • Collaborate closely with both the training and serving infrastructure teams to ensure seamless integration between model development and deployment, with a special focus on real-time and streaming audio inference

Requirements:

  • Significant experience developing high-performance audio or machine learning inference systems
  • Proficiency with programming languages such as C++ and Python
  • Hands-on experience with deep learning models for audio, speech, or language applications
  • A bias for action and a strong results-oriented mindset

Nice to have:

  • GPU programming, low-level system optimization, model parallelization techniques over multiple GPUs
  • Experience with duplex real-time streaming architectures
  • Internals of machine learning frameworks for audio (such as PyTorch, TensorFlow, or specialized audio libraries)
  • Experience with inference frameworks such as vLLM, SGLang, TensorRT-LLM, or custom distributed inference systems
  • Sequence modeling (e.g., transformers for audio/speech) and end-to-end audio pipeline optimization

What we offer:

  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Full-time
Work Type:
Remote work
