AI/ML Infrastructure Engineer

Tech Mahindra

Location:
Canada, Cambridge

Contract Type:
Not provided

Salary:

115000.00 - 120000.00 USD / Year

Job Description:

We are looking for an experienced Infrastructure Engineer to design, automate, and operate scalable cloud infrastructure supporting data platforms and AI/ML workloads across GCP and Azure. This role focuses on Infrastructure as Code, CI/CD automation, and cloud networking, and on enabling reliable, secure environments for data engineering and analytics teams.

Job Responsibility:

  • Design, provision, and manage cloud infrastructure using Terraform
  • Build and maintain CI/CD pipelines using Azure DevOps
  • Provision and manage GCP infrastructure, including compute, storage, IAM, and networking
  • Support and manage Azure infrastructure (VNets, networking, compute, storage)
  • Design and implement network provisioning (VPC/VNet architecture, routing, firewalls, load balancers, private connectivity)
  • Build and operate infrastructure for data platforms (data lakes, warehouses, streaming, analytics platforms)
  • Provision and support AI/ML infrastructure, including GPU resources and AI platforms
  • Implement security best practices, IAM, encryption, and compliance controls
  • Optimize infrastructure for performance, reliability, and cost
  • Collaborate with data engineering, analytics, and ML teams
  • Document infrastructure, architecture, standards, and operational runbooks

Requirements:

  • Strong experience with Terraform (Infrastructure as Code)
  • Experience with CI/CD pipelines, preferably Azure DevOps
  • Strong hands-on experience with Google Cloud Platform (GCP)
  • Solid understanding of cloud networking and network provisioning
  • Experience supporting data platforms or large-scale data workloads
  • Experience with AI/ML infrastructure
  • Strong Linux and scripting skills (Bash, Python, etc.)
  • A bachelor’s degree or higher is the minimum requirement for the position

Nice to have:

  • Hands-on experience with Azure infrastructure
  • Experience with Kubernetes (GKE / AKS)
  • Experience with data services such as BigQuery, Dataflow, Dataproc, Synapse, ADLS, Snowflake
  • Monitoring and observability tools (Prometheus, Grafana, Cloud Monitoring)
  • Multi-cloud experience and relevant certifications

What we offer:

  • Medical, vision, dental, life, and disability insurance
  • Paid time off (including holidays, parental leave, and sick leave, as required by law)

Additional Information:

Job Posted:
March 12, 2026

Expiration:
May 31, 2026

Similar Jobs for AI/ML Infrastructure Engineer

Software Engineer, Data Infrastructure

The Data Infrastructure team at Figma builds and operates the foundational platf...
Location:
United States, San Francisco; New York
Salary:
149000.00 - 350000.00 USD / Year
Company:
Figma
Expiration:
Until further notice
Requirements:
  • 5+ years of Software Engineering experience, specifically in backend or infrastructure engineering
  • Experience designing and building distributed data infrastructure at scale
  • Strong expertise in batch and streaming data processing technologies such as Spark, Flink, Kafka, or Airflow/Dagster
  • A proven track record of impact-driven problem-solving in a fast-paced environment
  • A strong sense of engineering excellence, with a focus on high-quality, reliable, and performant systems
  • Excellent technical communication skills, with experience working across both technical and non-technical counterparts
  • Experience mentoring and supporting engineers, fostering a culture of learning and technical excellence
Job Responsibility:
  • Design and build large-scale distributed data systems that power analytics, AI/ML, and business intelligence
  • Develop batch and streaming solutions to ensure data is reliable, efficient, and scalable across the company
  • Manage data ingestion, movement, and processing through core platforms like Snowflake, our ML Datalake, and real-time streaming systems
  • Improve data reliability, consistency, and performance, ensuring high-quality data for engineering, research, and business stakeholders
  • Collaborate with AI researchers, data scientists, product engineers, and business teams to understand data needs and build scalable solutions
  • Drive technical decisions and best practices for data ingestion, orchestration, processing, and storage
What we offer:
  • equity
  • health, dental & vision
  • retirement with company contribution
  • parental leave & reproductive or family planning support
  • mental health & wellness benefits
  • generous PTO
  • company recharge days
  • a learning & development stipend
  • a work from home stipend
  • cell phone reimbursement
Job Type:
Full-time

Founding Infrastructure Engineer

As the first dedicated Infrastructure Engineer at Reducto, you will influence ev...
Location:
United States, San Francisco
Salary:
150000.00 - 300000.00 USD / Year
Company:
Reducto
Expiration:
Until further notice
Requirements:
  • Have 5+ years of hands-on experience in building or supporting production-grade infrastructure and reliability processes for high-throughput systems
  • Are comfortable with Python or similar languages
  • Are exceptional at working across cloud platforms, container orchestration (e.g., Kubernetes), networking, and storage technologies
  • Build your own tools on the fly to diagnose, experiment, and address reliability problems
  • Bring a quantitative, hands-on approach to system operations, automation, and continuous improvement
  • Are your own worst critic: have an extremely high bar for quality and always aim for robust solutions rather than quick fixes
Job Responsibility:
  • Designing, building, and maintaining highly available, scalable infrastructure to support intensive AI/ML workloads and real-time model deployments
  • Implementing robust monitoring, alerting, and observability systems to ensure system health, performance, and uptime across cloud and on-prem environments
  • Debugging, optimizing, and automating infrastructure for fast iteration and rapid deployment cycles, focusing on both reliability and developer velocity
  • Proactively identifying, investigating, and resolving incidents to minimize downtime and maintain world-class service levels for enterprise customers
  • Collaborating closely with engineers, ML specialists, and founders to shape product, infrastructure, and security strategies
What we offer:
  • Unlimited PTO
  • Free lunch daily at the office
  • Reimbursed Transportation
  • Generous health insurance covering medical, dental, and vision
  • Health and Wellness Budget up to $150/mo reimbursement
  • Parental Leave
Job Type:
Full-time

Senior Data & AI/ML Engineer - GCP Specialization Lead

We are on a bold mission to create the best software services offering in the wo...
Location:
United States, Menlo Park
Salary:
Not provided
Company:
techjays
Expiration:
Until further notice
Requirements:
  • GCP Services: BigQuery, Dataflow, Pub/Sub, Vertex AI
  • ML Engineering: End-to-end ML pipelines using Vertex AI / Kubeflow
  • Programming: Python & SQL
  • MLOps: CI/CD for ML, Model deployment & monitoring
  • Infrastructure-as-Code: Terraform
  • Data Engineering: ETL/ELT, real-time & batch pipelines
  • AI/ML Tools: TensorFlow, scikit-learn, XGBoost
  • Min Experience: 10+ Years
Job Responsibility:
  • Design and implement data architectures for real-time and batch pipelines, leveraging GCP services such as BigQuery, Dataflow, Dataproc, Pub/Sub, Vertex AI, and Cloud Storage
  • Lead the development of ML pipelines, from feature engineering to model training and deployment using Vertex AI, AI Platform, and Kubeflow Pipelines
  • Collaborate with data scientists to operationalize ML models and support MLOps practices using Cloud Functions, CI/CD, and Model Registry
  • Define and implement data governance, lineage, monitoring, and quality frameworks
  • Build and document GCP-native solutions and architectures that can be used for case studies and specialization submissions
  • Lead client-facing PoCs or MVPs to showcase AI/ML capabilities using GCP
  • Contribute to building repeatable solution accelerators in Data & AI/ML
  • Work with the leadership team to align with Google Cloud Partner Program metrics
  • Mentor engineers and data scientists toward achieving GCP certifications, especially in Data Engineering and Machine Learning
  • Organize and lead internal GCP AI/ML enablement sessions
What we offer:
  • Best in class packages
  • Paid holidays and flexible paid time away
  • Casual dress code & flexible working environment
  • Medical Insurance covering self & family up to 4 lakhs per person

Senior AI Infrastructure Engineer

This role will be responsible for designing, deploying, and maintaining high-per...
Location:
United States, Bothell; Overland Park; Bellevue
Salary:
113600.00 - 205000.00 USD / Year
Company:
T-Mobile
Expiration:
Until further notice
Requirements:
  • 5+ years technical engineering experience, preferably in multiple technology focus areas
  • Expert understanding of AI/ML infrastructure components or GPU-based systems, preferably in a high-availability, large-scale environment
  • Hands-on experience with NVIDIA DGX servers, BasePOD architectures, and advanced GPU technologies
  • Proficient in Linux/UNIX environments, including scripting/automation tools (Bash, Python, Ansible, Terraform)
  • Understanding of AI infrastructure security best practices
  • Experience with container orchestration (Kubernetes, Docker) and GPU workload management tools
  • Strong knowledge of networking (InfiniBand/Ethernet) and storage solutions in AI/ML contexts
Job Responsibility:
  • Technical System Expertise: Understands system protocols, how systems operate and data flows
  • Technical Engineering Services: Drives engineering projects by active contribution to the application of engineering techniques
  • Innovation: Contributes to designs that implement new ideas to improve existing and new systems, processes, and services
  • Technical Writing: Writes basic documentation on how technology works
  • Technical Leadership: Collaborates with technical teams and utilizes system expertise to deliver technical solutions
  • Technology Strategy: Contributes to new and existing technology options that support business goals
What we offer:
  • Competitive base salary and compensation package
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Medical, dental and vision insurance
  • Flexible spending account
  • Paid time off
  • Paid holidays
  • Paid parental and family leave
Job Type:
Full-time

Principal AI/ML & Innovation Engineer

We are seeking Principal AI/ML & Innovation Engineer who will be leading initiat...
Location:
Puerto Rico, Aguadilla
Salary:
Not provided
Company:
Hewlett Packard Enterprise
Expiration:
Until further notice
Requirements:
  • Bachelor's or master’s degree in computer science, engineering, data science, machine learning, artificial intelligence, or closely related quantitative discipline
  • Typically, 10-15 years’ experience
  • Solid understanding of fundamental AI and machine learning concepts, including supervised and unsupervised learning, deep learning, reinforcement learning, natural language processing, computer vision, and statistical modeling
  • Proficient in implementing and deploying various machine learning algorithms, such as decision trees, random forests, support vector machines, and neural networks
  • Knowledge of popular machine learning frameworks and libraries like TensorFlow, PyTorch, or scikit-learn
  • Strong understanding of GitHub Copilot, Cursor, n8n, vibe coding, Windsurf, and similar technologies
  • Experience in cloud infrastructure (AWS, Azure, etc.)
  • Knowledge of open source, Linux, etc.
  • Understanding of DevOps and SRE
  • Expertise in deep learning techniques, architectures, and frameworks (e.g., convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), etc.)
Job Responsibility:
  • Designing, developing, and deploying advanced machine learning models and algorithms
  • Leading research initiatives to explore novel approaches and technologies
  • Designing the architecture of AI systems and ensuring scalability, performance, and reliability
  • Collaborating with other teams, such as data scientists, software engineers, and product managers
  • Providing technical leadership and mentorship to junior engineers
  • Overseeing and guiding multiple design review sessions across different projects
  • Partnering with the engineering manager and team lead to establish long-term design and implementation strategies
  • Leading efforts to incorporate feedback loops and continuous improvement processes
  • Leading meetings, ensuring efficient progress tracking, issue resolution, and team coordination
  • Creating and delivering high-level presentations and reports to executive stakeholders
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Job Type:
Full-time

AI/ML Engineer - Public Sector

Unstructured is seeking an AI/Machine Learning Engineer to join our Public Secto...
Location:
United States
Salary:
Not provided
Company:
Unstructured
Expiration:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field. Master’s or PhD a plus
  • 4+ years of experience in AI/ML engineering, MLOps, systems architecture, or similar technical roles
  • 2+ years of experience working with government networks and security requirements
  • An understanding of government security frameworks (FedRAMP, NIST 800-53, FISMA, DISA SRG) and how they apply to ML workloads
  • History of leading or delivering high-impact ML initiatives in enterprise or government environments
  • Preference for those with articulable experience assessing the performance of alternative models, architectures, and implementation strategies
  • A commitment to meeting the demanding engineering standards required to support national security and defense clients
  • A strong interest in being at the forefront of the AI revolution
  • Active TS clearance required for the role, plus the ability to travel
  • Familiar with AWS, Azure, and/or GCP services for ML workloads
Job Responsibility:
  • Develop evaluation and assessment tools and frameworks to measure newly developed models for performance against key metrics across a wide domain of tasks and knowledge sets
  • Identify, propose, and implement modifications of existing models and model implementation frameworks to optimize for new tasks
  • Lead conceptualization of both traditional and agentic implementation strategies for cloud and on-premises model deployments within broader system architectures
  • Lead and optimize distributed ML workloads on multiple government cloud and non-cloud infrastructures
  • Align AI/ML deployments with FedRAMP, NIST 800-53, FISMA, and DISA SRG, maintaining strict security standards
  • Create reference architectures and deployment patterns to streamline ML adoption across government agencies
  • Translate mission objectives into ML-focused technical specifications and project plans
  • Apply advanced security controls and zero-trust architectures to protect ML pipelines and data
  • Continuously assess ML workloads for performance, cost, and security improvements, driving ongoing refinement
What we offer:
  • Competitive compensation, equity, and benefits
Job Type:
Full-time

Head of Platform Engineering

Lead and scale the Platform organization at Descript, which is central to empowe...
Location:
United States, San Francisco
Salary:
224000.00 - 296000.00 USD / Year
Company:
Descript
Expiration:
Until further notice
Requirements:
  • 5+ years of engineering management experience, including leading multiple teams or an engineering organization, preferably in platform or developer experience domain
  • Strong technical background with experience in cloud platforms (GCP preferably), scalable infrastructure, and AI/ML technologies
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent professional experience
  • Ability to develop and communicate a clear vision
  • Experience creating collaborative, empowering, and high-performing team environments
  • Clear and effective communication across technical and non-technical audiences
  • Ability to thrive in fast-paced, rapidly changing environments and navigate ambiguity
Job Responsibility:
  • Develop and execute a strategic vision and roadmap for the Platform organization
  • Recruit, mentor, and grow engineering managers and engineers
  • Ensure execution across the Platform teams is predictable, reliable, and sustainable
  • Work closely with Product, Design, and other Engineering teams
  • Drive innovation within the Platform organization
  • Help scale and evolve our company culture as we grow
What we offer:
  • Generous healthcare package
  • 401k matching program
  • Catered lunches
  • Flexible vacation time
Job Type:
Full-time

Senior Software Engineer II - AI/ML

As a Senior Software Engineer II at Aledade, we maintain, improve, and expand ou...
Location:
United States
Salary:
Not provided
Company:
Aledade, Inc.
Expiration:
Until further notice
Requirements:
  • BS/BTech (or higher) in Computer Science, Engineering or a related field
  • 6+ years of experience as an engineer building full-stack web applications as part of a cross-functional team
  • 3+ years of experience working with SQL or other database querying language on large multi-table data sets
  • 3+ years of experience acting as a trusted technical decision-maker in a team setting, solving for short-term and long-term business value
  • 3+ years of experience coaching other engineers
Job Responsibility:
  • Develop and implement scalable and performant solutions
  • Partner, as a peer, with Engineering Managers, Product Managers, and stakeholders throughout Aledade to develop and execute technical roadmaps using Agile processes
  • Mentor and coach more junior engineers, including through thorough pull request reviews, and be receptive to critical feedback on your own work
  • Improve AI/ML infrastructure for model development, training, and deployment, with a focus on large language models and other generative AI architectures
  • Design a multi-year vision that shapes the direction of crucial generative AI areas: text generation, image synthesis, multimodal models, and personalized content creation
  • Architect systems to enhance the capabilities and relevance of AI models, making complex data sets more accessible and actionable
  • Design and implement prompt engineering strategies to effectively guide generative AI models
  • Work closely with Product Management, Practices, Sales, Customer Success, and other stakeholders to identify and prioritize applied AI use cases within the organization
  • Analyze product usage patterns and trends to make data-driven decisions and forecasts for generative AI applications
  • Maintain the security of protected patient health information and ensure compliance with relevant regulations in the context of AI
Job Type:
Full-time