Software Engineer - Cloud FinOps & Reliability Job at Luma AI (Palo Alto)

Software Engineer - Cloud FinOps & Reliability

Luma AI

Location:
United States, Palo Alto

Category:
IT - Software Development

Contract Type:
Not provided

Salary:
120,000 - 255,000 USD / Year

Job Description:

This is a foundational engineering position for a technical, data-driven expert who gets excited about optimization at massive scale. As a member of our SRE team, you will specialize in FinOps and cloud cost management, owning the financial health of one of the world's largest multi-cloud GPU infrastructures. You will apply a deep understanding of cloud architecture and pricing models to find and eliminate inefficiency, and use your software engineering skills to build the tools and automation required to govern our cloud spend, providing critical insights that allow us to scale our AI research and products sustainably.

Job Responsibilities:

Analyze & Optimize: Actively monitor and analyze costs across our entire technical ecosystem, including multi-cloud infrastructure (AWS, GCP, OCI), on-premises clusters, and third-party services, to identify and act on opportunities for cost optimization. Develop forecasting models to predict future spend and inform our capacity planning.
Manage & Commit: Develop and actively manage a multi-million dollar portfolio of Reserved Instances (RIs) and Savings Plans to maximize commitment-based discounts across our global GPU and CPU fleets
Automate & Build: Apply a software engineering approach to design, build, and maintain custom tools and automation in Python and SQL. Your systems will track, analyze, and report on costs across our entire fleet of providers and services, with a focus on detecting anomalies immediately.
Partner & Advise: As an embedded member of the SRE team, partner with fellow SREs and research teams to model the cost implications of new models and infrastructure designs, providing expert guidance on cost-performance trade-offs
Visualize & Report: Create and manage a centralized observability stack for cloud costs, building dashboards in tools like Grafana to give a real-time, granular view of our financial posture to all stakeholders

Requirements:

5+ years of experience in a technical role such as Site Reliability Engineer, DevOps Engineer, Infrastructure Engineer, or a dedicated Cloud Cost Engineer
Deep, hands-on expertise with the cost models and optimization levers of at least one major cloud provider (AWS, GCP), and a willingness to learn others
Proficiency in Python for scripting, data analysis, and building automation tooling
Strong, foundational understanding of cloud infrastructure, including containerization (Docker, Kubernetes), networking, and storage
Not an accountant; you are a systems thinker who is passionate about applying engineering principles to solve financial challenges at scale
A tenacious troubleshooter and a data-driven decision-maker who thrives on finding the 'why' behind the numbers

Nice to have:

Experience managing a monthly cloud spend in excess of $1 million
Relevant certifications, such as the FinOps Certified Practitioner (FOCP)
Experience building custom cost allocation, showback, or chargeback systems from scratch
A background working with large-scale GPU clusters for AI/ML workloads

Additional Information:

Job Posted:
January 13, 2026

Employment Type:

Full-time

Work Type:

Hybrid work
