Senior Technical Program Manager – AI Infrastructure, Site Operations Job at Cerebras Systems (Sunnyvale)

Technical Program Manager, AI Infrastructure

Be part of the team that builds and operates the world's fastest AI infrastructu...

Location

United States, Sunnyvale

Salary:

Not provided

Cerebras Systems

Expiration Date

Until further notice

Requirements

Experience leading large, cross-functional infrastructure programs
Experience with AI/ML, HPC, or accelerator-based infrastructure
Strong understanding of data center power and cooling fundamentals
Experience installing and managing network, storage, and compute devices
Proven ability to define and operationalize metrics
Strong written and executive-level communication skills
Experience working with colocation providers and facilities teams
Background in incident management, reliability, or service operations

Job Responsibility

Own end-to-end technical programs for multiple data center buildouts, coordinating with partners, contractors, and internal teams
Drive facility site readiness for power and cooling for Cerebras Wafer-Scale Engine systems
Coordinate equipment delivery and manage vendor accountability for schedules and quality related to rack integration and inter-rack cabling
Act as the single-threaded owner across internal partners: Hardware & Systems Engineering, Network & Storage Engineering, AI Cloud Infrastructure & Operations
Enforce handover criteria between site completion, equipment deployment, and operations
Own overall schedule tracking, risk identification, and mitigation, creating clear visibility for leadership
Establish program governance, risk tracking, and RACI clarity
Present program status, metrics, and operational risks to senior leadership
Drive partner accountability on contractual milestones and commercial commitments
Document repeatable processes and implement them to scale across future data centers

What we offer

Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open source your cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
A simple, non-corporate work culture that respects individual beliefs

Senior Technical Program Manager - Datacenter Infrastructure

The Datacenter leasing Senior Technical Program Manager will be part of a team r...

Location

Singapore, Singapore

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Civil, Electrical, Mechanical, Telecom Engineering, or related technical field AND 4+ years’ experience in engineering, operations, commissioning or technical program management
3+ years’ experience managing cross-functional and/or cross-team projects
3+ years of experience in data center design, infrastructure, and critical environments
Broad infrastructure knowledge across mechanical, electrical, and controls systems with a focus on Datacenter integration and performance
Familiarity with key industry standards and best practices, including ASHRAE, Uptime Institute, ANSI, and NFPA
Familiarity with high-density power and cooling solutions, sustainability initiatives, and emerging technologies for AI workloads
Ability to meet Microsoft, customer and/or government security screening requirements

Job Responsibility

Act as a Subject Matter Expert (SME) and provide global program support
Drive technical solutions for leased datacenters in partnership with Microsoft’s and the lessor’s core engineering teams
Evaluate the lessor’s design proposals against technical requirements and mitigate non-compliance through technical and commercial solutions
Assess the lessor’s compliance through review of technical documents, site assessments, and stakeholder engagement
Partner with internal and external stakeholders during construction, RFS, and operations handover to unblock any technical issues that risk on-time delivery of datacenters to customers
Drive cost impact analysis on non-compliance and specification changes. Escalate and provide visibility and feedback to leadership on cost drivers
Partner with Microsoft Engineering, Integration, Security, Operations, and Energy teams on resolution management
Drive partner accountability on contractual milestones and commercial commitments
Own overall schedule tracking, risk identification, blockers, and mitigation for the assigned projects, creating clear visibility for leadership

Fulltime

Engineering Director

We are seeking a seasoned Engineering Director who thrives in challenging and fa...

Location

Puerto Rico, Aguadilla

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

Significant work experience as a director or in a similar position working across multiple stakeholder organizations, with 10+ years of people leadership experience specific to SW and Cloud engineering
Solid experience leading SW development across storage, networking, on-prem, and SaaS is a must
Experience in setting up geographically distributed sites
Must have a strong background in software development lifecycle including cloud infrastructure
Familiarity with agile methodologies and tools like JIRA
Prior experience in cloud product development and deployments, with end-to-end ownership and accountability
Solid understanding of fundamental AI and machine learning concepts, including supervised and unsupervised learning, deep learning, reinforcement learning, natural language processing, computer vision, and statistical modeling
Extensive business acumen, technical knowledge, and industry experience encompassing one or more engineering, technology, and product domains
Demonstrated abilities to drive transformation across a business with exceptional skills in the management of change

Job Responsibility

Oversee the Puerto Rico site’s daily operations, strategic planning, and cross-functional team leadership for Hybrid Cloud
Recruit, mentor, and manage teams of AI/ML engineers, QA Engineers, Design Engineers and innovation specialists to deliver cutting-edge solutions
Continuously evaluate new tools, platforms, and frameworks in AI/ML to drive competitive advantage and operational efficiency
Ensure alignment with corporate goals while fostering a high-performance culture, operational efficiency, and employee engagement
Lead the development and execution of AI/ML strategies that align with business goals and drive innovation across products, services, or operations
Create strategic and tactical operations and resource plans, goals, and priorities for assigned organization based on business and technology roadmap and functional objectives
Engage with various senior leaders across the organization, program managers, R&D, support, Quality, product managers, technical leaders and executives to communicate program status, escalate issues, and guide and influence strategic decision-making
Manage senior relationships and escalated issues with outsourced partners and suppliers, including setting expectations regarding deliverables, product quality, schedules, and costs, and ensure that the organization is effectively leveraging outsourced resources
Identify opportunities for and drive organizational initiatives and programs to support business process improvements and cost reductions

What we offer

Health & Wellbeing
Personal & Professional Development
Unconditional Inclusion

Fulltime

Senior Director, Critical Environments (Lab Operations)

We are seeking an industry veteran to serve as the Senior Director, Critical Env...

Location

Taiwan, New Taipei City

Salary:

Not provided

JLL

Expiration Date

Until further notice

Requirements

20+ years of progressive experience in Critical Environments (Data Centers, Semiconductor, Pharma, or R&D Labs), covering operations, engineering, planning, and innovation
15+ years of direct people management experience, specifically leading large technical teams (50-100+ staff) and 'managing managers' in a multi-site, matrixed environment
Bachelor’s degree in Engineering (Mechanical/Electrical), Facilities Management, or a related technical field is required
A Master’s degree or MBA is highly preferred
Professional Engineer (PE), Certified Facility Manager (CFM), or PMP is preferred

Job Responsibility

Executive Leadership & Organizational Strategy: Manage and mentor a high-performing organization of 100+ staff members through direct supervision of five specialized Directors
Foster a 'No Ego' culture of accountability and collaboration across diverse teams
Serve as the primary strategic partner to senior client stakeholders
Present complex technical and data concepts as clear business strategies to the C-Suite
Define the competency requirements and training standards for the entire critical environments organization
Operational Resilience & 24/7 Command: Oversee the Director of Critical Operations and Senior Director of Engineering & Ops Center to ensure 100% uptime in critical operations
Serve as the ultimate escalation point for major incidents
Lead executive communication, mitigation strategy, and systemic Root Cause Analysis (RCA)
Direct the strategy of the 24/7 Operations Center
Technical Governance & Engineering Excellence: Oversee comprehensive design reviews for MEP (Mechanical, Electrical, Plumbing) topology

Fulltime

Senior Site Reliability Engineer

Zuora’s Cloud Engineering organization owns the reliability, scalability, and op...

Location

India, Chennai

Salary:

Not provided

Zuora

Expiration Date

Until further notice

Requirements

8+ years of hands-on experience in Site Reliability Engineering, DevOps, or large-scale production operations
Advanced expertise in AWS, including architecture design across services such as EC2, EKS, VPC, IAM, RDS, S3, and CloudWatch
Deep experience with Infrastructure-as-Code using Terraform, including complex modules, state management, and governance
Strong programming and automation skills in Python and shell scripting, with experience building production-grade automation systems
Expert-level Linux systems knowledge, including performance tuning, security hardening, and deep troubleshooting
Proven experience operating distributed systems and data streaming platforms such as Kafka in high-throughput environments
Demonstrated ability to work independently on complex, ambiguous problems with broad organizational impact
Proven technical leadership experience driving large, cross-team reliability or infrastructure initiatives, including setting technical direction, influencing design decisions, and mentoring engineers to deliver measurable outcomes at scale
Practical experience designing or implementing AI/ML-driven automation in operations, reliability, or platform engineering

Job Responsibility

Reliability Architecture & Platform Strategy: Own and evolve the reliability architecture of large-scale, distributed SaaS systems by defining SLOs, SLIs, error budgets, and resilience patterns aligned with business objectives
AI-Driven Automation & Intelligent Operations: Design, build, and operationalize AI-powered automation to reduce operational toil and improve system stability
Advanced Cloud & Infrastructure Engineering: Lead the design and operation of complex AWS-based infrastructure and Kubernetes platforms, optimizing for availability, security, and cost efficiency
Incident Leadership & Operational Excellence: Act as a technical leader during high-severity production incidents, driving structured response, decision-making, and recovery
Technical Leadership & Cross-Functional Influence: Influence reliability outcomes beyond the SRE team by partnering closely with Engineering, Product, and Security stakeholders

What we offer

Competitive compensation, variable bonus and performance reward opportunities, and retirement programs
Medical Insurance
Generous, flexible time off
Paid holidays, “wellness” days, and a company-wide end-of-year break
6 months fully paid parental leave
Learning & Development stipend
Opportunities to volunteer and give back, including charitable donation match
Free resources and support for your mental wellbeing

Fulltime

Senior Backend Software Engineer, Cloud Management

We are seeking talented Senior Software Engineers to design, build, and scale Cr...

Location

United States, San Francisco; Sunnyvale

Salary:

175,000 - 210,000 USD per year

Crusoe

Expiration Date

Until further notice

Requirements

5+ years of software development experience
Experience programming in modern compiled languages such as Go, Rust, Java, or C++
Proven ability to design and scale fault-tolerant distributed systems and develop managed cloud services
Strong fundamentals in data structures, algorithms, microservices, and infrastructure tools like Docker, Kubernetes, Terraform, and CI/CD systems
Ability to work with cross-functional teams to align priorities and deliver customer-first solutions
Experience guiding engineers, improving hiring and onboarding processes, and driving team growth
Exceptional ability to articulate complex ideas and align technical solutions with customer needs
A customer-centric mindset
Experience building infrastructure tooling is a plus

Job Responsibility

Design, develop, and maintain scalable and reliable services that power our cloud platform’s user-facing experiences
Collaborate with cross-functional teams, like product and design, to evaluate tools, frameworks, and customer needs, creating innovative solutions
Design and build backend systems that underpin our cloud platform, covering everything from authentication flows to scalable, reliable access to infrastructure resources
Contribute to architectural decisions that support reliability and maintainability across the company
Mentor engineers, enhance hiring practices, and contribute to building a strong, inclusive engineering culture
Build scalable, reliable cloud services, such as user access management, gateways, user features, and notification systems, tailored to customer needs
Partner with customer success and operations teams to create intuitive tools that enhance the end-user experience
Develop automation software that simplifies infrastructure deployment and management for seamless customer operations
Implement features that differentiate Crusoe Cloud, focusing on operational efficiency, low-touch adoption, turn-key AI services, and scalability
Work closely with cloud support, engineering, and site reliability teams to align technical solutions with customer feedback and operational goals

What we offer

Restricted Stock Units
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement

Fulltime

Senior Software Engineer, Managed Services

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’r...

Location

United States, San Francisco; Sunnyvale

Salary:

166,000 - 201,000 USD per year

Crusoe

Expiration Date

Until further notice

Requirements

Cloud Expertise: Proven ability to design and scale fault-tolerant distributed systems and develop managed cloud services
Technical Proficiency: Strong fundamentals in microservices and infrastructure technologies like Docker, Kubernetes, Terraform, and CI/CD systems. Experience with observability principles and technologies, e.g., time-series databases, log aggregation, distributed tracing
Customer-Centric Mindset: A passion for creating intuitive, high-quality solutions that directly impact customer success and satisfaction
Collaboration Skills: Ability to work with cross-functional teams to align priorities and deliver customer-first solutions
Communication Skills: Exceptional ability to articulate complex ideas and align technical solutions with customer needs
Team Leadership: Mentor engineers, enhance hiring practices, and contribute to building a strong, inclusive engineering culture
Professional Experience: 3-5 years of software development experience, including programming with modern compiled languages such as Go, Rust, Java, or C++

Job Responsibility

Building Foundational Infrastructure: Build and scale core infrastructure services that manage critical resources within our cloud platform. This involves designing, developing, and deploying robust and reliable systems from the ground up
Scalable Design: Design highly scalable, durable, and reliable platform services that prioritize ease of use
Cross-Functional Collaboration: Lead projects that require collaborating with engineering, cloud support, site reliability, and product teams to assess tools, frameworks, and solutions that align with both customer and operational needs
Innovation: Implement features that differentiate Crusoe Cloud, focusing on operational efficiency, low-touch adoption, turn-key AI services, and scalability

What we offer

Industry competitive pay
Restricted Stock Units in a fast-growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement

Fulltime

Vice President, Venue Technology

The Vice President, Venue Technology is a visionary and execution-focused leader...

Location

United States, Frisco

Salary:

Not provided

Legends Global

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Information Technology, Engineering, Computer Science, or a related technical discipline required
Master’s degree or MBA strongly preferred
Minimum of 15 years of progressive experience in enterprise or venue technology leadership, with 7+ years in executive-level roles overseeing large-scale, multi-site operations
Proven success in leading technology transformation initiatives across sports, entertainment, hospitality, or other high-volume guest-facing industries
Experience managing global or national portfolios of venues, with demonstrated ability to scale technology operations and standardize platforms across diverse environments
Track record of delivering complex capital projects involving infrastructure modernization, digital innovation, and cross-functional stakeholder alignment
Deep expertise in venue technology ecosystems, including:
Networking: Enterprise-grade LAN/WAN/Wi-Fi, DAS, 5G, SD-WAN (Cisco, Aruba, Extreme)
AV/Broadcast: Control rooms, IPTV, digital signage, live production systems (QSYS, Ross, Evertz)
Compute & Storage: Hybrid cloud, edge computing, virtualization (VxRail, Nutanix, VMware)
Security & Access: Physical security, surveillance, access control, Zero Trust (Genetec, Avigilon)

Job Responsibility

Develop and execute a multi-year venue technology roadmap aligned with Legends Global’s business strategy, operational priorities, and guest experience goals
Serve as the executive sponsor for venue technology innovation, advising senior leadership on emerging trends, investment opportunities, and competitive differentiation
Champion enterprise-wide initiatives such as smart venue platforms, digital twin technologies, and AI-driven operational intelligence
Oversee venue technology budgets
Partner on new venue construction and major renovation projects
Oversee the design, deployment, and lifecycle management of mission-critical systems, including:
Network infrastructure (LAN/WAN/Wi-Fi, DAS, 5G)
AV and broadcast systems (IPTV, control rooms, digital signage)
Compute and storage environments (hybrid cloud, edge computing)
Venue IT operations (access control, ticketing, incident management)
Establish and enforce enterprise standards for technology architecture, cybersecurity, scalability, and interoperability across all venues

What we offer

Medical, dental, vision, life, and disability insurance, paid vacation, and a 401(k) plan

Fulltime

Senior Technical Program Manager – AI Infrastructure, Site Operations

Cerebras Systems

Location:
United States, Sunnyvale

Category:
IT - Administration

Contract Type:
Not provided

Job Posted:
February 17, 2026
