Multimodal Speech Engineer

1X Technologies

Location:
United States, Palo Alto

Contract Type:
Not provided

Salary:

150000.00 - 250000.00 USD / Year

Job Description:

The AI Companion team creates the speech interface for NEO, as well as the physical awareness behaviors that evoke trust, warmth, and competence when NEO interacts with people. As a Multimodal Speech Engineer on the AI Companion Team, you will lead the effort to create a conversational speech model, from design to data collection to deployment. You will develop real-time architectures that enable NEO to not only converse with users, but also incorporate other modalities like vision, spatial audio, and body language. You will work closely with the design team to reflect NEO’s personality and 1X’s brand values in the way NEO speaks and responds to users, and with the autonomy team to ensure that NEO’s speech models are aware of its own physical capabilities.

Job Responsibility:

  • Design and implement data pipelines for large-scale speech interactions from NEO data and external datasets
  • Train speech-to-speech models to be aware of NEO’s embodiment
  • Design appropriate responses for a variety of user queries
  • Synchronize speech with body language
  • Customize NEO with different personalities

Requirements:

  • 3+ years of experience in speech and audio modeling domains
  • Experience in multi-modal conversational models (language, audio, vision) is a strong plus
  • Ability to take on open-ended problems in conversation models, come up with creative solutions, implement proofs of concept, and bring them to production

Nice to have:

  • Experience in multi-modal conversational models (language, audio, vision)

What we offer:
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays

Additional Information:

Job Posted:
December 01, 2025

Employment Type:
Full-time
Work Type:
On-site work
Similar Jobs for Multimodal Speech Engineer

Multimodal Speech Engineer, AI Companion

As a Multimodal Speech Engineer on the AI Companion Team, you will lead the deve...
Location:
United States, Palo Alto
Salary:
150000.00 - 250000.00 USD / Year
1X Technologies
Expiration Date
Until further notice
Requirements
  • 3+ years of experience in speech and audio modeling domains
  • Experience with multi-modal conversational models (language, audio, vision)
  • Ability to take open-ended problems in conversation modeling, develop creative solutions, build proof-of-concepts, and scale them to production
Job Responsibility
  • Design and implement data pipelines for large-scale speech interactions using internal and external datasets
  • Train speech-to-speech models that incorporate awareness of NEO’s physical form
  • Create dynamic responses for a wide range of user queries
  • Synchronize NEO’s speech with physical gestures and body language
  • Customize NEO’s speech behavior to reflect different personalities
What we offer
  • Equity
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
Employment Type:
Full-time

AI Engineer - Speech & NLP

Interhuman AI is building the next generation of social intelligence infrastruct...
Location:
Denmark, København
Salary:
55000.00 - 65000.00 DKK / Year
Life Science Talent
Expiration Date
Until further notice
Requirements
  • PhD in Machine Learning, Computer Science, or a related field with a focus on speech processing and/or NLP
  • Track record of building and shipping models
  • Strong proficiency in Python and deep experience with PyTorch (or JAX/TensorFlow)
  • Familiarity with the current landscape of speech and multimodal models (e.g., Whisper, audio-LLMs, speech encoders, vision-language models)
  • You thrive with ambiguity. You can scope your own work, prioritize ruthlessly, and know when to ask for input
  • Clear communicator: you can explain a complex architecture to both engineers and non-technical stakeholders
Job Responsibility
  • Design, train, and iterate on speech and language models that extract social and emotional signals from live conversation
  • Own the full model development lifecycle—from data curation and architecture design through training, evaluation, and production deployment
  • Build evaluation frameworks and benchmarks that capture the subtleties of human interaction that standard metrics miss
  • Stay at the frontier of multimodal research and translate relevant advances into our production stack
  • Collaborate closely with engineering to ensure models meet real-time latency and scalability requirements
What we offer
  • Competitive salary and meaningful equity in an early-stage, venture-backed company
  • Direct influence on technical direction—your work shapes the product, not just a feature
  • A small, focused team where your contributions are visible and impactful from day one
  • Flexibility on location and working arrangements
Employment Type:
Full-time

Research Intern - GenAI

Appen is seeking Research Interns to support innovative research in Generative A...
Location:
Australia, Chatswood, Sydney
Salary:
Not provided
Appen
Expiration Date
Until further notice
Requirements
  • Postgraduate students in Linguistics, Computer Science, AI, Data Science, or similar disciplines preferred; strong final-year and recent undergraduate candidates in these fields will also be considered
  • Familiarity with programming languages such as Python, R, or similar tools used in data analysis and machine learning
  • Experience with data annotation, model evaluation, or prompt engineering
  • Understanding of multilingual NLP, speech technologies, or agentic AI systems
  • Strong written communication skills, especially for summarizing research and drafting technical content
  • Ability to work independently and collaboratively in a remote research environment
Job Responsibility
  • Conduct literature reviews on topics such as adversarial prompting, multilingual evaluation, and agentic AI
  • Assist in dataset curation, annotation, and quality assurance for speech, text, and multimodal data
  • Support model evaluation experiments, including prompt engineering and red teaming
  • Develop scripts and tools for data analysis, visualization, and automation
  • Contribute to internal documentation, research reports, and thought leadership content
  • Participate in team meetings and cross-functional collaborations
  • Help prepare materials for conferences, publications, and workshops
What we offer
  • Hands-on experience in applied AI research with real-world impact
  • Mentorship from experienced researchers and exposure to industry workflows
  • Opportunities to contribute to publications, datasets, and thought leadership
  • A collaborative and inclusive research environment

Director, Digital Ecosystem Applications

This position is responsible for the Software Platforms group at the Innovation ...
Location:
United States, Belmont
Salary:
240000.00 - 285000.00 USD / Year
Volkswagen AG
Expiration Date
Until further notice
Requirements
  • 10+ years of experience, including 2+ years in a technical leadership role
  • CS, EE, M.S. Engineering (or equivalent) REQUIRED
  • M.S. Engineering (or equivalent) or PhD PREFERRED
  • Analytical and conceptual thinking – using logic and reason, creative and strategic
  • Communication skills – interpersonal, presentation and written
  • Managing interdisciplinary teams on individual projects
  • Integration – joining people, processes or systems
  • Influencing and negotiation skills
  • Problem solving
  • Resource management
Job Responsibility
  • Define the technical mission, architecture strategy, and long‑term platform vision for the In‑Vehicle Computing & Digital Ecosystem Applications team, spanning Android Automotive OS (AAOS), in‑vehicle compute platforms, Software‑Defined Vehicle (SDV) architecture, and AI‑driven cockpit intelligence
  • Provide technical leadership across the full software stack, including Android Framework, System Services, HAL layers, middleware, connectivity stacks, media/audio frameworks, HMI toolchains, and cloud‑connected AI runtimes within an SDV‑aligned architecture
  • Lead and mentor engineering teams in platform bring‑up, system integration, performance optimization, and development of AI‑agentic features, multimodal interaction models, and next‑generation speech technologies
  • Manage multi‑year budgets for platform development, AI integration, SDV‑aligned compute evolution, SoC evaluations, cloud services, and prototype programs
  • Deliver executive‑level technical reporting on architecture decisions, platform readiness, SDV integration milestones, AI progress, risks, and strategic recommendations
  • Drive strategic planning for ICC’s infotainment and cockpit portfolio, including AAOS evolution, hybrid cloud/edge AI pipelines, intelligent mobile agent technologies, and SDV‑centric software and compute roadmaps
  • Align technical roadmaps with global VW Group Innovation teams across infotainment, connectivity, AI/ML, vehicle architecture, cloud services, and SDV platform strategy, ensuring cross‑platform consistency and shared component reuse
  • Build strategic relationships with SoC vendors, Tier‑1 suppliers, cloud providers, and AI technology partners to influence cockpit compute and SDV platform evolution
  • Maintain partnerships with Silicon Valley companies specializing in AI runtimes, LLMs, speech, multimodal interaction, and automotive‑grade SDV‑compatible software frameworks
  • Collaborate with academic and research institutions on AI‑agentic systems, embedded ML, HMI, and in‑vehicle compute architectures aligned with SDV principles
What we offer
  • Eligibility for annual performance bonus
  • Healthcare benefits
  • 401(k), with company match
  • Defined contribution retirement program
  • Tuition reimbursement
  • Company lease car program
  • Paid time off
Employment Type:
Full-time

Senior Software Engineer - Product

Frontier Foundry (F²) is IDC’s bold new innovation engine — a design-led, full-s...
Location:
India, Hyderabad
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 7+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Hands-on experience with Nuance voice technologies or similar platforms (e.g., Azure Speech, Dialogflow, Alexa Skills Kit)
  • Deep understanding of Voice Access systems, accessibility APIs, and assistive technologies
  • Strong proficiency in full-stack development, especially client-side application engineering and user-facing experiences
  • Experience with GitHub Copilot, Copilot Studio, AI Foundry, or equivalent vibe coding/generative AI tools
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Design, build, and deliver high-quality software components aligned to F² charters: interaction models (inking, stylus, display tech), multimodal innovation (sensor fusion, voice/touch interfaces), or AI agents (context-aware, task-oriented)
  • Integrate and optimize Nuance Conversational AI technologies (e.g., speech-to-text, text-to-speech, NLU) into multimodal experiences
  • Enhance Voice Access capabilities across platforms, ensuring accessibility, responsiveness, and seamless user interaction
  • Work across the stack — from UI to backend — with a bias for impact and iteration
  • Embrace “vibe coding” using AI-assisted tools like GitHub Copilot, Copilot Studio, AI Foundry, and other generative AI tools to reduce boilerplate and drive intelligent test automation
  • Collaborate with product, design, and partner teams to shape backlog priorities and deliver intuitive, high-impact experiences
  • Navigate evolving priorities with ingenuity, turning loosely defined ideas into tangible software outcomes
  • Contribute to architecture discussions, code reviews, and prototyping efforts
  • Foster a culture of agility, experimentation, and outcome-driven development
Employment Type:
Full-time

Senior Research Scientist

PolyAI automates customer service through lifelike voice assistants that let cus...
Location:
United Kingdom, London
Salary:
Not provided
PolyAI
Expiration Date
Until further notice
Requirements
  • PhD in Machine Learning, Natural Language Processing, Computer Science, or a related field
  • 5+ years of hands-on experience in deep learning
  • Proven track record of research innovation, including published work or deployed systems
  • Strong programming skills in Python and deep learning frameworks like PyTorch
  • Demonstrated expertise in at least one domain area such as reinforcement learning, conversational AI, audio modelling, or LLM alignment
  • Experience leading projects end-to-end, from ideation to deployment
  • Excellent communication skills with the ability to write clear technical documents and explain complex concepts to diverse audiences
  • Comfortable working in ambiguity and driving clarity through experimentation and data
Job Responsibility
  • Lead and execute complex research projects with clear business impact
  • Design and implement novel post-training strategies including preference tuning, reward modeling, and synthetic supervision
  • Develop innovative model architectures and training approaches for conversational AI, including speech-aware and multimodal models
  • Conduct empirical studies to assess model performance in live deployments and iterate quickly based on real-world data
  • Generate, collect, and annotate training data - including synthetic and real-world conversational datasets - with an eye for quality and bias mitigation
  • Design robust evaluation metrics and benchmarks for LLM-based assistants in customer service domains
  • Work closely with engineering and product teams to integrate research into production environments
  • Collaborate with legal and compliance teams to ensure responsible use of data and models
  • Stay current with academic and industry advances in LLMs, ASR, TTS, RLHF, and multimodal learning
What we offer
  • Participation in the company’s employee share options plan
  • Tenure-Based PTO: You will receive 25 holidays when you join and will gain an additional 1 day after 2 years of service, then 1 day each year until capped at 32 holidays
  • Flexible working from home policy
  • Work from outside of the UK for up to 6 months each year
  • TELUS Health EAP 24/7 - offers you and your chosen family confidential, judgment-free support for any work, health, or life challenge
  • Enhanced parental leave
  • Bike2Work scheme
  • Annual learning and development allowance
  • A one-off WFH allowance when you join, covering perks like noise-cancelling headphones or a comfortable desk chair to boost your comfort and focus
  • Company-funded fertility and family-forming programmes

Principal Software Engineer, CoreAI

Join Microsoft’s AI Core team building high performance runtime systems that ser...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • 6+ years of experience in systems programming with strong expertise in C++
  • Proven experience building, deploying, and operating scalable cloud services
  • Strong debugging skills and experience using performance profiling and diagnostic tools
  • Hands-on experience with distributed systems, Kubernetes, and containerized workloads
  • Experience with large-scale LLM inferencing infrastructure, including CUDA
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Design and implement high-performance microservices and runtime components in C++
  • Optimize AI inferencing systems for latency, throughput, cost, and reliability at large scale
  • Debug and resolve complex production issues related to performance, scaling, and service reliability
  • Collaborate with cross-functional partners to integrate model inference pipelines into scalable infrastructure
  • Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads
  • Drive systems-level innovations for real-time and batch inferencing efficiency
  • Participate in code reviews and provide technical mentorship to senior and peer engineers
Employment Type:
Full-time

Full-Stack Engineer, AI Companion

The AI Companion team creates the speech interface to NEO, as well as the physic...
Location:
United States, Palo Alto
Salary:
150000.00 - 250000.00 USD / Year
1X Technologies
Expiration Date
Until further notice
Requirements
  • 4+ years of experience with C++
  • 4+ years of experience with Python
  • 4+ years of experience with Bazel
  • 4+ years of experience with PyTorch
  • Experience with real‑time or streaming model architectures or systems
  • Product obsession with quality, performance, and design taste
  • Ability to take research ideas into production systems that work reliably
  • Good product taste in human‑robot interaction, non‑verbal communication, and speech UX
Job Responsibility
  • Design the software architecture for real-time multimodal I/O
  • Design application flows like scheduling chores and triggering autonomous tasks from the voice interface
  • Optimize the companion stack for enabling seamless interactions with NEO
  • Make the Companion scalable and reliable while serving models from remote machines
What we offer
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
Employment Type:
Full-time