Kernel Optimization Engineer

Cerebras Systems

Location:
United Arab Emirates, Dubai

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs.

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Job Responsibility:

  • Develop design specifications for new machine learning and linear algebra kernels and map them to the Cerebras WSE system using various parallel programming algorithms
  • Develop and debug a kernel library of highly optimized routines in low-level assembly and a custom C-like domain-specific language (CSL), implementing algorithms targeting the Cerebras hardware system
  • Use mathematical models and analysis to measure software performance and inform design decisions
  • Develop and integrate unit and system testing methodologies to verify correct functionality and performance of kernel libraries
  • Study emerging trends in machine learning applications and help evolve the kernel library architecture to address the computational challenges of state-of-the-art neural networks
  • Interact with chip and system architects to optimize instruction sets, microarchitecture, and IO of next generation systems

Requirements:

  • Bachelor’s, Master’s, PhD or foreign equivalents in Computer Science, Computer Engineering, Mathematics, or related fields
  • Understanding of hardware architecture concepts — must be comfortable learning the details of a new hardware architecture
  • Skilled in C++ and Python programming languages
  • Good knowledge of library and/or API development best practices
  • Strong debugging skills, including experience debugging complex software stacks

Nice to have:

  • Experience in kernel development and/or testing
  • Familiarity with parallel algorithms and distributed memory systems
  • Experience in programming accelerators such as GPUs and FPGAs
  • Familiarity with neural networks and machine learning frameworks such as TensorFlow and PyTorch
  • Familiarity with HPC kernels and their optimization

What we offer:

  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Enjoy a simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026

Similar Jobs for Kernel Optimization Engineer

Member of Technical Staff, Performance Optimization

We're looking for a Software Engineer focused on Performance Optimization to hel...
Location:
United States, San Mateo
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience working on performance optimization or high-performance computing systems
  • Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
  • Familiarity with PyTorch and performance-critical model execution
  • Experience with distributed system debugging and optimization in multi-GPU environments
  • Deep understanding of GPU architecture, parallel programming models, and compute kernels
Job Responsibility
  • Optimize system and GPU performance for high-throughput AI workloads across training and inference
  • Analyze and improve latency, throughput, memory usage, and compute efficiency
  • Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
  • Implement low-level optimizations using CUDA, Triton, and other performance tooling
  • Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
  • Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
  • Improve support for mixed precision, quantization, and model graph optimization
  • Build and maintain performance benchmarking and monitoring infrastructure
  • Scale inference and training systems across multi-GPU, multi-node environments
  • Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes
What we offer
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
  • Fulltime

Senior Systems Engineer

We are looking for a versatile and driven Senior Systems Engineer to join our En...
Location:
United States, Chicago
Salary:
130000.00 USD / Year
AKUNA CAPITAL
Expiration Date:
Until further notice
Requirements
  • Degree in Computer Science, Information Systems, or a related field
  • 5-7 years of systems engineering experience
  • Advanced Linux knowledge including kernel bypass, kernel tuning, and customizing kernels
  • Deep understanding of virtualization and containerization technologies
  • Extensive experience with a variety of Linux distributions (RedHat, Ubuntu, etc.)
  • Deep understanding of system monitoring and configuration management tools (Ansible, Foreman, Prometheus and Icinga/Nagios)
  • Proficiency in scripting and using automation and orchestration tools such as Python and Bash
  • Expertise in troubleshooting multicast and TCP related performance issues
  • Experience automating daily software and hardware related tasks
  • Demonstrated ability to lead large technical projects
Job Responsibility
  • Analyze complex technical problems and collaborate on designing solutions for Akuna’s global Infrastructure platform
  • Drive projects and solutions to completion in a fast-paced environment
  • Design, develop and maintain orchestration and configuration solutions
  • Collaborate with developers and other infrastructure engineers to research new products and techniques that drive innovation and improve efficiency and performance in the environment
  • Architect and maintain multi-vendor, tier-based storage solutions
  • Build out a test automation framework for systems performance testing and tuning
  • Create and institute process enforcement across environments
  • Create tools that assist teams to optimize the available infrastructure
  • Develop and maintain comprehensive technical documentation, including system configurations, procedures, and troubleshooting guides
  • Lead knowledge transfer sessions and mentor team members to ensure continuity and operational excellence
What we offer
  • Discretionary performance bonus
  • Comprehensive benefits package that may encompass employer-paid medical, dental, vision, retirement contributions, paid time off, and other benefits
  • Fulltime

Founding GPU Kernel Engineer

We're looking for a Founding GPU Kernel Engineer who lives right at the boundary...
Location:
United States, San Francisco
Salary:
285000.00 - 315000.00 USD / Year
YC Work at a Startup
Expiration Date:
Until further notice
Requirements
  • Deep expertise in GPU architecture
  • Proven track record of hand-writing kernels that match or beat vendor libraries (cuBLAS, cuDNN, CUTLASS)
  • Strong skills with low-level profiling tools: Nsight Compute, Nsight Systems, rocprof, or equivalents
  • Experience reading and reasoning about PTX/SASS or GPU assembly
  • Solid systems programming in C++ and CUDA (or ROCm/HIP)
  • Good understanding of how high-level ML operations map to hardware execution
  • Experience with distributed training systems: collective ops like all-reduce and all-gather, NCCL/RCCL, multi-node communication patterns
Job Responsibility
  • Write and hand-optimize GPU kernels for ML workloads (matmuls, attention, normalization, etc.) to set the performance ceilings
  • Profile at the microarchitectural level: look into SM utilization, warp stalls, memory bank conflicts, register pressure, instruction throughput
  • Debug performance issues by digging deep into things like clock speeds, thermal throttling, driver behavior, hardware errata
  • Turn your hand-optimization insights into automated compiler passes (working closely with our compiler team)
  • Develop performance models that predict how kernels will behave across different GPU architectures
  • Build tools and methods for systematic kernel optimization
  • Work with NVIDIA, AMD, and emerging AI accelerators - understand the common parts and what's vendor-specific
What we offer
  • bonus
  • equity
  • benefits
  • relocation assistance
  • Fulltime

Principal Engineer - Generative AI Infra Capabilities

Wells Fargo is seeking a Principal Engineer - Generative Gen AI GPU Infrastructu...
Location:
India, Bengaluru
Salary:
Not provided
Wells Fargo
Expiration Date:
February 20, 2026
Requirements
  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • Design GPU cluster topologies (H100/H200, NVLink/NVSwitch), networking, and storage paths for high‑throughput inferencing; document sizing and perf baselines
  • Implement Run:ai constructs (Collections/Departments/Projects/workloads) for MDEV/MDEP/UCEP/MRM; codify quota, priority, and fair‑share policies
  • POC & benchmark disaggregated inferencing (prefill/decode) with vLLM/TensorRT‑LLM; publish guidance for H100/H200 tuning (FP8/INT8/AWQ) and KV‑transfer behavior over NVLink
  • Operationalize OpenShift AI parity for GPU scheduling, time slicing/MIG profiles, and preemption; validate upgrade paths and helm/kustomize packaging
  • Integrate Triton Inference Server for multi‑model serving
Job Responsibility
  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions
  • Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
  • Fulltime

Senior Software Engineer

At Cloudera, we empower people to transform complex data into clear and actionab...
Location:
United States, Austin
Salary:
Not provided
Cloudera
Expiration Date:
Until further notice
Requirements
  • BS/BA, MS in Computer Science
  • 5+ years of experience
  • Strong foundation in systems software, algorithms and data structures
  • Good implementation skills (C/C++)
  • Experience with UNIX Kernel concepts such as memory management, I/O access paradigms, file system internals, and knowledge of user space API
  • Strong debugging skills in kernel context
  • Familiarity with additional tools such as gdb, crash, and modprobe
  • Strong written and verbal communication skills
Job Responsibility
  • Design and develop software products
  • Maintain and improve the performance of existing software
  • Clearly and regularly communicate with management and technical support colleagues
  • Test and maintain software products to ensure strong functionality and optimization
What we offer
  • Generous PTO Policy
  • Support work life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups
  • Fulltime

Research Scientist / Engineer – Performance Optimization

The Performance Optimization team at Luma is dedicated to maximizing the efficie...
Location:
United States, Palo Alto
Salary:
187500.00 - 395000.00 USD / Year
Luma AI
Expiration Date:
Until further notice
Requirements
  • Expert-level proficiency in Triton/CUDA programming and GPU optimization
  • Strong PyTorch skills
  • Experience with PyTorch kernel development and custom operations
  • Proficiency with profiling tools (NVIDIA Nsight, torch profiler, custom tooling)
  • Deep understanding of transformer architectures and attention mechanisms
Job Responsibility
  • Profile and optimize GPU/CPU/Accelerator code for maximum utilization and minimal latency
  • Write high-performance PyTorch, Triton, and CUDA code, deferring to custom PyTorch operations if necessary
  • Develop fused kernels and leverage tensor cores and modern hardware features for optimal hardware utilization on different hardware platforms
  • Optimize model architectures and implementations for distributed multi-node production deployment
  • Build performance monitoring and analysis tools and automation
  • Research and implement cutting-edge optimization techniques for transformer models
  • Fulltime

Software Engineer (Technical Leadership) - Kernel

At Meta, we're building and operating one of the world's most dynamic and fast-p...
Location:
United States, Menlo Park
Salary:
219000.00 - 301000.00 USD / Year
Meta
Expiration Date:
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 10+ years software development experience in industry settings or PhD with 4+ years of experience
  • 3+ years relevant experience with Linux kernel, firmware, or other low level systems programming
  • Proficiency in C/C++ and at least one scripting language (Python/Shell Scripting)
  • Experience leading projects with industry-wide impact
  • Vast experience communicating and working across functions to drive solutions
  • Significant experience in mentoring/influencing experienced engineers across organizations
  • Proven track record of planning multi-year roadmap in which shorter-term projects ladder to the long term mission
  • Experience in driving large cross-functional/industry-wide engineering efforts
Job Responsibility
  • Design, develop, and validate Linux Kernel and userspace software
  • Debug complex system-level issues and lead performance tuning exercises to optimize software stack performance
  • Understand software components from multiple partner teams, lead integration efforts, and drive continued development
  • Develop and automate test suites for CI/CD framework and various components
  • Collaborate with partner teams to integrate software components, align on goals, and participate in oncall rotations
  • Participate in multiple open source communities through patch review, conferences, and discussions
What we offer
  • bonus
  • equity
  • benefits

Software Engineering Manager - Meta Superintelligence Labs - Infra: Optimizations Team

Meta is seeking hands-on engineering managers to join the Meta SuperIntelligence...
Location:
United States, Bellevue
Salary:
184000.00 - 257000.00 USD / Year
Meta
Expiration Date:
Until further notice
Requirements
  • MS or BS in Computer Science or Electrical/Electronics Engineering or equivalent
  • 3+ years of experience directly managing or leading a team of engineers with varied skill levels
  • Experience in leading teams working on high performance computing (HPC) and AI/ML systems, including: GPU/ASIC-based kernel development and optimization (e.g. CUDA)
  • Distributed systems for large scale training and serving
  • Systems Architecture + Performance
  • Large scale distributed systems
  • Experience running a large-scale program and dealing with ambiguity
Job Responsibility
  • Lead and support the team that develops various kernels, including but not limited to GEMMs and attention mechanisms, and contribute to enabling performance at scale for inference and training of our next-generation GenAI (Llama) models
  • Enable the growth of individual contributors, driving the technical roadmap along with technical leads and expand the impact of the team by growing new skill-sets and capabilities
  • Lead a high performance team of engineers to deliver new capabilities and efficient compute systems for our fleet
  • Technical management
  • Experience in systems architecture, performance, workload-analysis and large scale distributed systems
  • Work cross-functionally across hardware and software/services teams to drive engineering efforts
What we offer
  • bonus
  • equity
  • benefits