Multimodal Speech Engineer Job at 1X Technologies (Palo Alto)

Multimodal Speech Engineer, AI Companion

As a Multimodal Speech Engineer on the AI Companion Team, you will lead the deve...

Location

United States, Palo Alto

Salary:

150000.00 - 250000.00 USD / Year

1X Technologies

Expiration Date

Until further notice

Requirements

3+ years of experience in speech and audio modeling domains
Experience with multi-modal conversational models (language, audio, vision)
Ability to take open-ended problems in conversation modeling, develop creative solutions, build proof-of-concepts, and scale them to production

Job Responsibility

Design and implement data pipelines for large-scale speech interactions using internal and external datasets
Train speech-to-speech models that incorporate awareness of NEO’s physical form
Create dynamic responses for a wide range of user queries
Synchronize NEO’s speech with physical gestures and body language
Customize NEO’s speech behavior to reflect different personalities

What we offer

Equity
Health, dental, and vision insurance
401(k) with company match
Paid time off and holidays

Full-time


AI Engineer - Speech & NLP

Interhuman AI is building the next generation of social intelligence infrastruct...

Location

Denmark, København

Salary:

55000.00 - 65000.00 DKK / Year

Life Science Talent

Expiration Date

Until further notice

Requirements

PhD in Machine Learning, Computer Science, or a related field with a focus on speech processing and/or NLP
Track record of building and shipping models
Strong proficiency in Python and deep experience with PyTorch (or JAX/TensorFlow)
Familiarity with the current landscape of speech and multimodal models (e.g., Whisper, audio-LLMs, speech encoders, vision-language models)
You thrive with ambiguity. You can scope your own work, prioritize ruthlessly, and know when to ask for input
Clear communicator—you can explain a complex architecture to both engineers and non-technical stakeholders

Job Responsibility

Design, train, and iterate on speech and language models that extract social and emotional signals from live conversation
Own the full model development lifecycle—from data curation and architecture design through training, evaluation, and production deployment
Build evaluation frameworks and benchmarks that capture the subtleties of human interaction that standard metrics miss
Stay at the frontier of multimodal research and translate relevant advances into our production stack
Collaborate closely with engineering to ensure models meet real-time latency and scalability requirements

What we offer

Competitive salary and meaningful equity in an early-stage, venture-backed company
Direct influence on technical direction—your work shapes the product, not just a feature
A small, focused team where your contributions are visible and impactful from day one
Flexibility on location and working arrangements

Full-time

Research Intern - GenAI

Appen is seeking Research Interns to support innovative research in Generative A...

Location

Australia, Chatswood, Sydney

Salary:

Not provided

Appen

Expiration Date

Until further notice

Requirements

Postgraduate students in Linguistics, Computer Science, AI, Data Science, or similar disciplines preferred; strong final-year and recent undergraduate candidates in these fields will also be considered
Familiarity with programming languages such as Python, R, or similar tools used in data analysis and machine learning
Experience with data annotation, model evaluation, or prompt engineering
Understanding of multilingual NLP, speech technologies, or agentic AI systems
Strong written communication skills, especially for summarizing research and drafting technical content
Ability to work independently and collaboratively in a remote research environment

Job Responsibility

Conduct literature reviews on topics such as adversarial prompting, multilingual evaluation, and agentic AI
Assist in dataset curation, annotation, and quality assurance for speech, text, and multimodal data
Support model evaluation experiments, including prompt engineering and red teaming
Develop scripts and tools for data analysis, visualization, and automation
Contribute to internal documentation, research reports, and thought leadership content
Participate in team meetings and cross-functional collaborations
Help prepare materials for conferences, publications, and workshops

What we offer

Hands-on experience in applied AI research with real-world impact
Mentorship from experienced researchers and exposure to industry workflows
Opportunities to contribute to publications, datasets, and thought leadership
A collaborative and inclusive research environment

Director, Digital Ecosystem Applications

This position is responsible for the Software Platforms group at the Innovation ...

Location

United States, Belmont

Salary:

240000.00 - 285000.00 USD / Year

Volkswagen AG

Expiration Date

Until further notice

Requirements

10+ years with 2+ years in a technical leadership role
CS, EE, M.S. Engineering (or equivalent) REQUIRED
M.S. Engineering (or equivalent) or PhD PREFERRED
Analytical and conceptual thinking – using logic and reason, creative and strategic
Communication skills – interpersonal, presentation and written
Managing interdisciplinary teams on individual projects
Integration – joining people, processes or systems
Influencing and negotiation skills
Problem solving
Resource management

Job Responsibility

Define the technical mission, architecture strategy, and long‑term platform vision for the In‑Vehicle Computing & Digital Ecosystem Applications team, spanning Android Automotive OS (AAOS), in‑vehicle compute platforms, Software‑Defined Vehicle (SDV) architecture, and AI‑driven cockpit intelligence
Provide technical leadership across the full software stack, including Android Framework, System Services, HAL layers, middleware, connectivity stacks, media/audio frameworks, HMI toolchains, and cloud‑connected AI runtimes within an SDV‑aligned architecture
Lead and mentor engineering teams in platform bring‑up, system integration, performance optimization, and development of AI‑agentic features, multimodal interaction models, and next‑generation speech technologies
Manage multi‑year budgets for platform development, AI integration, SDV‑aligned compute evolution, SoC evaluations, cloud services, and prototype programs
Deliver executive‑level technical reporting on architecture decisions, platform readiness, SDV integration milestones, AI progress, risks, and strategic recommendations
Drive strategic planning for ICC’s infotainment and cockpit portfolio, including AAOS evolution, hybrid cloud/edge AI pipelines, intelligent mobile agent technologies, and SDV‑centric software and compute roadmaps
Align technical roadmaps with global VW Group Innovation teams across infotainment, connectivity, AI/ML, vehicle architecture, cloud services, and SDV platform strategy, ensuring cross‑platform consistency and shared component reuse
Build strategic relationships with SoC vendors, Tier‑1 suppliers, cloud providers, and AI technology partners to influence cockpit compute and SDV platform evolution
Maintain partnerships with Silicon Valley companies specializing in AI runtimes, LLMs, speech, multimodal interaction, and automotive‑grade SDV‑compatible software frameworks
Collaborate with academic and research institutions on AI‑agentic systems, embedded ML, HMI, and in‑vehicle compute architectures aligned with SDV principles

What we offer

Eligibility for annual performance bonus
Healthcare benefits
401(k), with company match
Defined contribution retirement program
Tuition reimbursement
Company lease car program
Paid time off

Full-time


Senior Software Engineer - Product

Frontier Foundry (F²) is IDC’s bold new innovation engine — a design-led, full-s...

Location

India, Hyderabad

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 7+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Hands-on experience with Nuance voice technologies or similar platforms (e.g., Azure Speech, Dialogflow, Alexa Skills Kit)
Deep understanding of Voice Access systems, accessibility APIs, and assistive technologies
Strong proficiency in full-stack development, especially client-side application engineering and user-facing experiences
Experience with GitHub Copilot, Copilot Studio, AI Foundry, or equivalent vibe coding/generative AI tools
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Design, build, and deliver high-quality software components aligned to F² charters: interaction models (inking, stylus, display tech), multimodal innovation (sensor fusion, voice/touch interfaces), or AI agents (context-aware, task-oriented)
Integrate and optimize Nuance Conversational AI technologies (e.g., speech-to-text, text-to-speech, NLU) into multimodal experiences
Enhance Voice Access capabilities across platforms, ensuring accessibility, responsiveness, and seamless user interaction
Work across the stack — from UI to backend — with a bias for impact and iteration
Embrace “vibe coding” using AI-assisted tools like GitHub Copilot, Copilot Studio, AI Foundry, and other generative AI tools to reduce boilerplate and drive intelligent test automation
Collaborate with product, design, and partner teams to shape backlog priorities and deliver intuitive, high-impact experiences
Navigate evolving priorities with ingenuity, turning loosely defined ideas into tangible software outcomes
Contribute to architecture discussions, code reviews, and prototyping efforts
Foster a culture of agility, experimentation, and outcome-driven development

Full-time

Senior Research Scientist

PolyAI automates customer service through lifelike voice assistants that let cus...

Location

United Kingdom, London

Salary:

Not provided

PolyAI

Expiration Date

Until further notice

Requirements

PhD in Machine Learning, Natural Language Processing, Computer Science, or a related field
5+ years of hands-on experience in deep learning
Proven track record of research innovation, including published work or deployed systems
Strong programming skills in Python and deep learning frameworks like PyTorch
Demonstrated expertise in at least one domain area such as reinforcement learning, conversational AI, audio modelling, or LLM alignment
Experience leading projects end-to-end, from ideation to deployment
Excellent communication skills with the ability to write clear technical documents and explain complex concepts to diverse audiences
Comfortable working in ambiguity and driving clarity through experimentation and data

Job Responsibility

Lead and execute complex research projects with clear business impact
Design and implement novel post-training strategies including preference tuning, reward modeling, and synthetic supervision
Develop innovative model architectures and training approaches for conversational AI, including speech-aware and multimodal models
Conduct empirical studies to assess model performance in live deployments and iterate quickly based on real-world data
Generate, collect, and annotate training data - including synthetic and real-world conversational datasets - with an eye for quality and bias mitigation
Design robust evaluation metrics and benchmarks for LLM-based assistants in customer service domains
Work closely with engineering and product teams to integrate research into production environments
Collaborate with legal and compliance teams to ensure responsible use of data and models
Stay current with academic and industry advances in LLMs, ASR, TTS, RLHF, and multimodal learning

What we offer

Participation in the company’s employee share options plan
Tenure-Based PTO: You will receive 25 holidays when you join and will gain an additional 1 day after 2 years of service, then 1 day each year until capped at 32 holidays
Flexible working from home policy
Work from outside of the UK for up to 6 months each year
TELUS Health EAP 24/7 - offers you and your chosen family confidential, judgment-free support for any work, health, or life challenge
Enhanced parental leave
Bike2Work scheme
Annual learning and development allowance
We’re all about making WFH work for you, which is why we offer a one-off WFH allowance when you join, covering perks like noise-cancelling headphones or a comfortable desk chair to boost your comfort and focus
Company-funded fertility and family-forming programmes

Principal Software Engineer, CoreAI

Join Microsoft’s AI Core team building high performance runtime systems that ser...

Location

United States, Redmond

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

6+ years of experience in systems programming with strong expertise in C++
Proven experience building, deploying, and operating scalable cloud services
Strong debugging skills and experience using performance profiling and diagnostic tools
Hands-on experience with distributed systems, Kubernetes, and containerized workloads
Experience with large-scale LLM inferencing infrastructure, including CUDA
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Job Responsibility

Design and implement high-performance microservices and runtime components in C++
Optimize AI inferencing systems for latency, throughput, cost, and reliability at large scale
Debug and resolve complex production issues related to performance, scaling, and service reliability
Collaborate with cross-functional partners to integrate model inference pipelines into scalable infrastructure
Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads
Drive systems-level innovations for real-time and batch inferencing efficiency
Participate in code reviews and provide technical mentorship to senior and peer engineers

Full-time

Full-Stack Engineer, AI Companion

The AI Companion team creates the speech interface to NEO, as well as the physic...

Location

United States, Palo Alto

Salary:

150000.00 - 250000.00 USD / Year

1X Technologies

Expiration Date

Until further notice

Requirements

4+ years of experience with C++
4+ years of experience with Python
4+ years of experience with Bazel
4+ years of experience with PyTorch
Experience with real‑time or streaming model architectures or systems
Product obsession with quality, performance, and design taste
Ability to take research ideas into production systems that work reliably
Good product taste as pertaining to human‑robot interaction, non‑verbal communication, and speech UX

Job Responsibility

Design the software architecture for real-time multimodal I/O
Design application flows like scheduling chores and triggering autonomous tasks from the voice interface
Optimize the companion stack for enabling seamless interactions with NEO
Make the Companion scalable and reliable while serving models from remote machines

What we offer

Health, dental, and vision insurance
401(k) with company match
Paid time off and holidays

Full-time

Multimodal Speech Engineer

1X Technologies

Location:
United States, Palo Alto

Category:
IT - Software Development

Contract Type:
Not provided

Job Posted:
December 01, 2025

