Join Cerebras as a Performance & Reliability Engineer within our innovative Co-Design and Next Generation Team. Our groundbreaking CS-3 system has set new benchmarks in high-performance ML training and inference. It uses a dinner-plate-sized chip with 44 GB of on-chip memory to surpass the capabilities of traditional hardware. This role focuses on characterizing and optimizing the performance and reliability of state-of-the-art AI models running on Cerebras' breakthrough hardware.
Job Responsibilities:
Characterize and enhance the performance and reliability of advanced ML hardware/software systems, with emphasis on reducing power and thermal fluctuations
Analyze ML workloads, software kernels, and hardware architecture for power and performance impacts, and synthesize high-level insights across these layers
Develop creative software solutions to improve reliability and performance, collaborating cross-functionally to deploy these solutions in production
Influence the design of Cerebras' next-generation AI architecture and software stack through rigorous workload analysis and computational efficiency optimization
Partner with ML engineers, researchers, and reliability specialists to understand model behavior and drive system-level improvements from a software perspective
Collaborate with teams in architecture, silicon, and research to advance our computational platforms and influence future system designs
Requirements:
BS, MS, or PhD in Computer Science, Electrical Engineering, or a related field
3+ years of relevant experience in performance engineering, reliability, computer architecture, and/or software design
Proficiency in Python or other scripting languages
Experience with C/C++ and assembly programming
Demonstrated expertise with system-level performance and reliability optimization
Strong verbal and written communication skills
Nice to have:
Hands-on experience with ML models, ML frameworks, and collective communication
Understanding of thermal management principles and power delivery for advanced semiconductors
What we offer:
Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open-source your cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
Enjoy a simple, non-corporate work culture that respects individual beliefs