We're looking for a Software Engineer focused on Performance Optimization to help push the boundaries of speed and efficiency across our AI infrastructure. In this role, you'll take ownership of optimizing performance at every layer of the stack, from low-level GPU kernels to large-scale distributed systems. A key focus will be maximizing the performance of our most demanding workloads, including large language models (LLMs), vision-language models (VLMs), and next-generation video models. You'll work closely with research, infrastructure, and systems teams to identify performance bottlenecks, implement cutting-edge optimizations, and scale our AI systems to meet the demands of real-world production use cases. Your work will directly impact the speed, scalability, and cost-effectiveness of some of the most advanced generative AI models in the world.
Job Responsibilities:
Optimize system and GPU performance for high-throughput AI workloads across training and inference
Analyze and improve latency, throughput, memory usage, and compute efficiency
Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
Implement low-level optimizations using CUDA, Triton, and other performance tooling
Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
Improve support for mixed precision, quantization, and model graph optimization
Build and maintain performance benchmarking and monitoring infrastructure
Scale inference and training systems across multi-GPU, multi-node environments
Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes
Requirements:
Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
5+ years of experience working on performance optimization or high-performance computing systems
Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
Familiarity with PyTorch and performance-critical model execution
Experience with distributed system debugging and optimization in multi-GPU environments
Deep understanding of GPU architecture, parallel programming models, and compute kernels
Nice to have:
Master’s or PhD in Computer Science, Electrical Engineering, or a related field
Experience optimizing large models for training and inference (LLMs, VLMs, or video models)
Knowledge of compiler stacks or ML compilers (e.g., torch.compile, Triton, XLA)
Contributions to open-source ML or HPC infrastructure
Familiarity with cloud-scale AI infrastructure and orchestration tools (e.g., Kubernetes, Ray)
Background in ML systems engineering or hardware-aware model design