Senior Machine Learning Infrastructure Engineer Job at PlusAI (Santa Clara)

Senior AI Infrastructure Engineer

This role will be responsible for designing, deploying, and maintaining high-per...

Location

United States , Bothell; Overland Park; Bellevue

Salary:

113600.00 - 205000.00 USD / Year

T-Mobile

Expiration Date

Until further notice

Requirements

5+ years technical engineering experience, preferably in multiple technology focus areas
Expert understanding of AI/ML infrastructure components, or GPU-based systems – preferably in a high-availability, large scale environment
Hands-on experience with NVIDIA DGX servers, BasePOD architectures, and advanced GPU technologies
Proficient in Linux/UNIX environments, including scripting/automation tools (Bash, Python, Ansible, Terraform)
Understanding of AI infrastructure security best practices
Experience with container orchestration (Kubernetes, Docker) and GPU workload management tools
Strong knowledge of networking (InfiniBand/Ethernet) and storage solutions in AI/ML contexts

Job Responsibility

Technical System Expertise: Understands system protocols, how systems operate and data flows
Technical Engineering Services: Drives engineering projects by active contribution to the application of engineering techniques
Innovation: Contributes to designs that implement new ideas to improve existing and new systems, processes, and services
Technical Writing: Writes basic documentation on how technology works
Technical Leadership: Collaborates with technical teams and utilizes system expertise to deliver technical solutions
Technology Strategy: Contributes to new and existing technology options that support business goals

What we offer

Competitive base salary and compensation package
Annual stock grant
Employee stock purchase plan
401(k)
Access to free, year-round money coaches
Medical, dental and vision insurance
Flexible spending account
Paid time off
Paid holidays
Paid parental and family leave

Fulltime

Senior Principal Machine Learning Engineer - LLM Post-Training and Optimization

Atlassian is seeking a highly skilled and experienced Senior Principal Machine L...

Location

United States , Mountain View

Salary:

243100.00 - 407200.00 USD / Year

Atlassian

Expiration Date

Until further notice

Requirements

Ph.D. or Master’s degree in Computer Science, Machine Learning, Artificial Intelligence, or a related field
8+ years of experience in machine learning, with a focus on large-scale model development and optimization
Deep expertise in LLM and transformer architectures (e.g., GPT, BERT, T5)
Strong proficiency in Python and ML frameworks such as PyTorch, JAX, or TensorFlow
Experience with distributed training techniques and large-scale data processing pipelines
Proven track record of deploying machine learning models in production environments
Familiarity with model optimization techniques, including quantization, pruning, and knowledge distillation
Strong problem-solving skills and ability to work in a fast-paced, collaborative environment
Excellent communication skills and ability to translate technical concepts for diverse audiences

Job Responsibility

Lead the fine-tuning and post-training optimization of large language models (LLMs) for diverse applications
Develop and implement techniques for model compression, quantization, pruning, and knowledge distillation to optimize performance and reduce computational costs
Conduct research on advanced techniques in transfer learning, reinforcement learning, and prompt engineering for LLMs
Design and execute rigorous benchmarking and evaluation frameworks to assess model performance across multiple dimensions
Collaborate with infrastructure teams to optimize LLM deployment pipelines, ensuring scalability and efficiency in production environments
Stay at the forefront of advancements in LLM technologies, sharing insights, driving innovation within the team, and leading agile development
Mentor other team members, facilitate workshops within and across teams, and foster a culture of technical excellence and continuous learning

What we offer

Health coverage
Paid volunteer days
Wellness resources

Fulltime

Senior Machine Learning Systems Engineer

Our team is building the foundations to democratise Machine Learning for Atlassi...

Location

India , Bengaluru

Salary:

Not provided

Atlassian

Expiration Date

Until further notice

Requirements

Fluency in at least one modern object-oriented programming language (preferably Java/Kotlin)
Understanding and experience with Machine Learning project lifecycle and tools
Understanding of LLMs, best deployment practices and inference optimisation
Experience in building and implementing high-performance RESTful micro-services
Experience building and operating large-scale distributed systems using Amazon Web Services (SageMaker, S3, CloudFormation, AWS Security and Networking)
Experience with Continuous Delivery and Continuous Integration

Job Responsibility

Build and scale the core infrastructure to allow software engineers, ML engineers & data scientists to develop, train, evaluate, deploy, and operate Machine Learning models and pipelines
Build systems for product teams like Jira & Confluence to provide access to curated LLMs
Use software development expertise to solve difficult problems, tackling infrastructure and architecture challenges
Lead engineers to drive involved projects from technical design to launch
Collaborate with other teams and internal customers to set expectations, gather input and communicate results

What we offer

Health coverage
Paid volunteer days
Wellness resources

Fulltime

Senior Machine Learning Engineer

As a Senior Machine Learning Engineer in the Central AI team, you will build and...

Location

Australia , Sydney

Salary:

Not provided

Atlassian

Expiration Date

Until further notice

Requirements

Master's or PhD in a quantitative subject (Statistics, Mathematics, Computer Science, Operations Research), or relevant work experience
3+ years of related industry experience in the data science domain
Expertise in Python or Java and the ability to write performant production-quality code, familiarity with SQL, knowledge of Spark and cloud data environments (e.g. AWS, Databricks)
Experience building and scaling machine learning models in business applications using large amounts of data
Ability to communicate and explain data science concepts to diverse audiences, craft a compelling story
Focus on business practicality and the 80/20 rule: a very high bar for output quality, while recognizing the business benefit of "having something now" vs "perfection sometime in the future"
Agile development mindset, appreciating the benefit of constant iteration and improvement

Job Responsibility

Build and maintain the core infrastructure to allow machine learning engineers and data scientists to develop, train, evaluate, deploy, and operate Machine Learning models and pipelines
Use software development expertise to solve difficult problems, tackling complex infrastructure and architecture challenges
Design system and model architectures, conduct rigorous experimentation and model evaluations, and provide guidance to junior ML engineers
Lead other engineers to drive involved projects from technical design to launch
Collaborate with other teams and internal customers to set expectations, gather input and communicate results

What we offer

Health and wellbeing resources
Paid volunteer days

Fulltime

Senior AI and Machine Learning Engineer

We are seeking a Senior AI/ML & Innovation Engineer who will be leading initiative...

Location

United States , Aguadilla

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

Bachelor's or master’s degree in computer science, engineering, data science, machine learning, artificial intelligence, or closely related quantitative discipline
Typically, 7-10 years’ experience
Deep understanding of machine learning algorithms, such as linear regression, decision trees, support vector machines, random forests, deep learning models (e.g., neural networks), and reinforcement learning
A strong foundation in mathematics and statistics
Proficiency in programming languages such as Python, R, or Java
Strong understanding of GitHub Copilot, Cursor, n8n, vibe coding, Windsurf, and similar technologies
Experience in cloud infrastructure (AWS, Azure, etc.)
Knowledge of open source, Linux, etc.
Understanding of DevOps and SRE
Advanced knowledge and experience in deep learning

Job Responsibility

Conducts research and stays up to date with the latest advancements in AI and machine learning technologies, frameworks, and algorithms
Collaborates with cross-functional teams to understand business requirements and design AI and machine learning solutions
Develops, implements, and optimizes machine learning models and algorithms
Deploys machine learning models into production environments
Monitors the performance of deployed models
Organizes and leads comprehensive design review sessions
Works collaboratively with the engineering manager and team lead to set design and implementation standards
Regularly leads meetings
Provides technical leadership, mentorship, and guidance to junior team members
Develops and delivers strategic presentations and reports to senior stakeholders

What we offer

Health & Wellbeing
Personal & Professional Development
Unconditional Inclusion

Fulltime

Senior Staff Machine Learning Engineer (AI Agent)

At Cresta, the AI Agent team is on a mission to create state-of-the-art AI Agent...

Location

United States; Canada

Salary:

Not provided

Cresta

Expiration Date

Until further notice

Requirements

Bachelor’s Degree in Computer Science, Mathematics, or a related field
Master’s or Ph.D. preferred, or equivalent professional experience
7+ years of hands-on industry experience with AI and machine learning
3+ years of experience working with LLMs in large-scale production environments
Expert knowledge of machine learning concepts and methods, especially those related to NLP, Generative AI, and working with LLMs
Proven leadership in designing and deploying AI solutions at scale
Extensive practical knowledge of modern machine learning frameworks and technologies (e.g., PyTorch, TensorFlow, Hugging Face, NumPy)
Experience with distributed systems and cloud-based AI infrastructure
Strong problem-solving and strategic thinking abilities
Proven ability to lead cross-functional teams and work collaboratively to deliver innovative AI solutions in production

Job Responsibility

Design, develop, and deploy Cresta’s AI Agent solutions and proprietary models
Focus on practical AI challenges such as improving reasoning, planning capabilities, and evaluation in real-world scenarios
Collaborate with cross-functional teams including front-end and back-end software engineers to integrate AI Agents into Cresta’s customer solutions
Lead initiatives to scale AI systems for production environments, ensuring performance and reliability across use cases
Contribute to solving cutting-edge problems in AI and help define the future roadmap for Cresta’s AI Agents
Innovate and research ways to improve security, cost-efficiency, and reliability of AI systems

What we offer

Variety of medical, dental, and vision plans
Paid parental leave
Monthly Health & Wellness allowance
Work from home office stipend
Lunch reimbursement for in-office employees
PTO: 3 weeks in Canada
Base salary, equity, and a variety of benefits

Fulltime

Senior Machine Learning Engineer, Personalization and Recommendations

As a Senior Machine Learning Engineer on the Personalization & Recommendations t...

Location

United States , San Francisco

Salary:

183360.00 - 248000.00 USD / Year

EdTech Jobs

Expiration Date

Until further notice

Requirements

5+ years of experience in applied machine learning or ML-heavy software engineering, with a strong focus on personalization, ranking, or recommendation systems
Demonstrated impact improving key metrics such as CTR, retention, or engagement through recommender or search systems in production
Strong hands-on skills in Python and PyTorch, with expertise in data and feature engineering, distributed training and inference on GPUs, and familiarity with modern MLOps practices — including model registries, feature stores, monitoring, and drift detection
Deep understanding of retrieval and ranking architectures, such as Two-Tower models, deep cross networks, Transformers, or MMoE, and the ability to apply them to real-world problems
Experience with large-scale embedding models and vector search, including FAISS, ScaNN, or similar systems
Proficiency in experiment design and evaluation, connecting offline metrics (AUC, NDCG, calibration) with online A/B test outcomes to drive product decisions
Clear, effective communication, collaborating well with product managers, data scientists, engineers, and cross-functional partners
A growth and mentorship mindset, helping elevate team quality in modeling, experimentation, and reliability
Commitment to responsible and inclusive personalization, ensuring our systems respect learner privacy, fairness, and diverse goals

Job Responsibility

Design and implement personalization models across candidate retrieval, ranking, and post-ranking layers, leveraging user embeddings, contextual signals and content features
Develop scalable retrieval and serving systems using architectures such as Two-Tower models, deep ranking networks, and ANN-based vector search for real-time personalization
Build and maintain model training, evaluation, and deployment pipelines, ensuring reliability, training–serving consistency, observability, and robust monitoring
Partner with Product and Data Science to translate learner objectives (engagement, retention, mastery) into measurable modeling goals and experiment designs
Advance evaluation methodologies, contributing to offline metric design (e.g., NDCG, CTR, calibration) and supporting rigorous A/B testing to measure learner and business impact
Collaborate with platform and infrastructure teams to optimize distributed training, inference latency, and serving cost in production environments
Stay informed on industry and research trends, evaluating opportunities to meaningfully apply them within Quizlet’s ecosystem
Mentor junior and mid-level engineers, supporting technical growth, experimentation rigor, and responsible ML practices
Champion collaboration, inclusion, curiosity, and data-driven problem solving, contributing to a healthy and productive team culture

What we offer

20 vacation days
Competitive health, dental, and vision insurance (100% employee and 75% dependent PPO, Dental, VSP Choice)
Employer-sponsored 401k plan with company match
Access to LinkedIn Learning and other resources to support professional growth
Paid Family Leave, FSA, HSA, Commuter benefits, and Wellness benefits
40 hours of annual paid time off to participate in volunteer programs of choice

Fulltime

Senior Staff Machine Learning Engineer

Help design our AI platform and develop our next generation of machine learning ...

Location

United States , San Francisco

Salary:

216500.00 - 324500.00 USD / Year

GoFundMe

Expiration Date

Until further notice

Requirements

9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields
Experience emphasizing secure, large-scale, distributed system design, AI/ML pipeline development, and implementation
Extensive experience designing, developing, and operating scalable backend systems
Experience applying software engineering best practices such as domain-driven design, event-driven architectures, and microservices
Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices
Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design
Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices
Strong leadership skills, including effective planning and management of complex projects, mentoring of team members, and fostering a collaborative, high-performing engineering culture
Excellent communicator, able to articulate complex technical concepts clearly to both technical and non-technical stakeholders
Bachelor's degree in Computer Science, Software Engineering, or a related technical field (preferred)

Job Responsibility

Design and implement AI platforms to enable scalable and secure access to LLMs from multiple model providers for diverse use cases
Design and implement agentic workflows, agentic tool ecosystems, and LLM prompt management solutions
Design, build, and optimize scalable model training, fine-tuning, and inference pipelines, ensuring robust integration with production systems
Influence technical strategy and approach to developing embedding stores, vector databases, and other reusable assets
Lead initiatives to streamline ML and AI workflows, improve operational efficiency, and establish standardized procedures to achieve consistent, high-quality results across our AI systems
Design and develop backend services and RESTful APIs using Python and FastAPI, integrating seamlessly with ML pipelines and services
Take operational responsibility for team-owned services, including performance monitoring, optimization, troubleshooting, and participation in an on-call rotation
Collaborate with both technical and non-technical colleagues, including data and applied scientists, software engineers, product managers, and business stakeholders, to deliver reliable and scalable ML-driven products
Coach and mentor fellow ML engineers, promoting a culture of collaboration, continuous improvement, and engineering excellence within the team
Employ a diverse set of tools and platforms including Python, AWS, Databricks, Docker, Kubernetes, FastAPI, Terraform, Snowflake, Coralogix, and GitHub to build, deploy, and maintain scalable, highly available machine learning infrastructure

What we offer

Competitive pay
Comprehensive healthcare benefits
Financial assistance for things like hybrid work, family planning
Generous parental leave
Flexible time-off policies
Mental health and wellness resources
Learning, development, and recognition programs

Fulltime

Senior Machine Learning Infrastructure Engineer

PlusAI

Location:
United States , Santa Clara

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:
December 11, 2025

