Member of Technical Staff, Performance Optimization

Fireworks AI

Location:
San Mateo, United States

Category:
IT - Software Development

Contract Type:
Not provided

Salary:
175,000 - 220,000 USD / Year

Job Description:

We're looking for a Software Engineer focused on Performance Optimization to help push the boundaries of speed and efficiency across our AI infrastructure. In this role, you'll take ownership of optimizing performance at every layer of the stack, from low-level GPU kernels to large-scale distributed systems. A key focus will be maximizing the performance of our most demanding workloads, including large language models (LLMs), vision-language models (VLMs), and next-generation video models. You'll work closely with teams across research, infrastructure, and systems to identify performance bottlenecks, implement cutting-edge optimizations, and scale our AI systems to meet the demands of real-world production use cases. Your work will directly impact the speed, scalability, and cost-effectiveness of some of the most advanced generative AI models in the world.

Job Responsibilities:

  • Optimize system and GPU performance for high-throughput AI workloads across training and inference
  • Analyze and improve latency, throughput, memory usage, and compute efficiency
  • Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
  • Implement low-level optimizations using CUDA, Triton, and other performance tooling
  • Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
  • Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
  • Improve support for mixed precision, quantization, and model graph optimization
  • Build and maintain performance benchmarking and monitoring infrastructure
  • Scale inference and training systems across multi-GPU, multi-node environments
  • Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes

Requirements:

  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience working on performance optimization or high-performance computing systems
  • Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
  • Familiarity with PyTorch and performance-critical model execution
  • Experience with distributed system debugging and optimization in multi-GPU environments
  • Deep understanding of GPU architecture, parallel programming models, and compute kernels

Nice to have:

  • Master’s or PhD in Computer Science, Electrical Engineering, or a related field
  • Experience optimizing large models for training and inference (LLMs, VLMs, or video models)
  • Knowledge of compiler stacks or ML compilers (e.g., torch.compile, Triton, XLA)
  • Contributions to open-source ML or HPC infrastructure
  • Familiarity with cloud-scale AI infrastructure and orchestration tools (e.g., Kubernetes, Ray)
  • Background in ML systems engineering or hardware-aware model design

What we offer:

  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package

Additional Information:

Job Posted:
December 08, 2025

Employment Type:
Full-time

Work Type:
On-site work