Engineering Manager, GPU Kernel

Wayve

Location:
United Kingdom, London


Contract Type:
Not provided

Salary:

Not provided

Job Description:

As the Engineering Manager for the GPU Kernel team, you’ll lead the team responsible for writing custom kernels and libraries that enable our transformer-based driving models to run efficiently on embedded GPUs and accelerators. The team works closely with ML engineers, software engineers and researchers to deploy end-to-end AI for autonomous vehicles at scale. This is an exciting opportunity to lead several high-impact, early-stage projects at Wayve, with the ultimate goal of enabling product deployments on millions of customer vehicles around the world.

Job Responsibility:

  • Lead a multi-disciplinary team of ML GPU kernel engineers to enable efficient ML deployments across millions of customer vehicles
  • Set the foundational strategy for deployment frameworks, compilers, toolchains and SoCs
  • Set clear objectives and priorities, and allocate resources efficiently
  • Have opportunities to develop new skills, especially within end-to-end ML and inference optimisation

Requirements:

  • Proven experience as an Engineering Manager delivering complex engineering projects
  • Experience developing GPU kernels and/or ML compilers (e.g. CUDA, OpenCL, TensorRT, MLIR or TVM)
  • Experience optimising systems to meet strict utilisation and latency requirements
  • Excellent interpersonal and communication skills

Nice to have:

  • Experience with C++ and ML frameworks such as PyTorch
  • Experience with ML deployment pipelines
  • Experience with embedded SoCs used in automotive environments, e.g. from Nvidia, Qualcomm or Renesas

Additional Information:

Job Posted:
January 01, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Engineering Manager, GPU Kernel

Sr. Software Development Engineer

As a core member of the team, you will play a pivotal role in optimizing and dev...
Location
China, Shanghai
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements
  • Skilled engineer with strong technical and analytical expertise in C++ development within Linux environments
  • Ability to define goals, manage development efforts, and deliver high-quality solutions
  • Strong problem-solving skills
  • Proactive approach
  • Keen understanding of software engineering best practices
  • Experience in GPU kernel development & optimization for AMD GPUs using HIP, CUDA, and assembly (ASM)
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming
  • Experience leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Experience in integrating optimized GPU performance into machine learning frameworks (e.g., TensorFlow, PyTorch)
  • Skilled in Python and C++
Job Responsibility
  • Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance
  • Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs
  • Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream
  • Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems
  • Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance
  • Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers
  • Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions

Engineering Manager, Kernel Reliability

We're looking for a deeply technical, hands-on engineering leader for our on-fie...
Location
United States, Sunnyvale; Canada, Toronto
Salary:
Not provided
Cerebras Systems
Expiration Date
Until further notice
Requirements
  • 6+ years in software engineering
  • 3+ years leading teams in SW/HW reliability, debug, diagnostic, failure analysis or related fields
  • Expertise in parallel and distributed programming (message passing, multicore, GPU, embedded, etc.)
  • Expertise in debug and diagnostic tool development or expert usage (debuggers, core dump handling, code sanitizers, etc.)
  • Experience debugging distributed and parallel applications (deadlocks, livelocks, race conditions, etc.)
  • Deep understanding of computer architectures (instruction pipelining, multithreading, networking, etc.)
  • Strong background in monitoring and reliability engineering (incident response, post-mortem analysis, etc.)
  • Demonstrated ability to recruit and retain high-performing teams, mentor engineers, and partner cross-functionally to deliver customer-facing products.
Job Responsibility
  • Provide hands-on technical leadership, owning the technical vision and roadmap for the kernel-centric reliability of our internal and customer-facing systems
  • Assist the System and Cluster Operations teams in reducing system and service downtime after failures by providing tooling and manual intervention for failure analysis and diagnostics
  • Work with the Debug Team to enhance debug tools with the goal of speeding up failure analysis
  • Collaborate with SW teams to improve the software stack, including kernels, for better on-field debugging and failure analysis
  • Work with the ASIC and HW architecture teams to codesign the next generation architectures with reliability and ease of debug in mind
  • Lead, mentor, and grow a high-caliber team of engineers, fostering a culture of technical excellence and rapid execution.
What we offer
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Simple, non-corporate work culture that respects individual beliefs.

Senior Machine Learning Engineer

As a Machine Learning Engineer at Dedrone, you’ll play a pivotal role in advanci...
Location
United States, Sterling
Salary:
Not provided
Axon
Expiration Date
Until further notice
Requirements
  • 5+ years of professional experience in modern C++ (C++14/17 or later), with strong object-oriented and generic programming skills
  • Deep understanding of multithreading and concurrency (threads, thread pools, locks, lock-free structures, atomics, futures, async patterns) and experience building robust, concurrent systems
  • Hands-on experience with parallel processing frameworks or patterns (SIMD, task-based parallelism, GPU offload, or similar) for real-time or high-throughput applications
  • Strong command of data structures and algorithms, and the ability to choose and implement the right structures for performance-critical, memory-constrained environments
  • Proven experience with memory management and performance optimization in C++ (stack vs heap, custom allocators, cache-aware design, avoiding fragmentation, RAII, move semantics)
  • Practical experience with CUDA (or similar GPU programming frameworks): writing kernels, managing GPU memory, optimizing for occupancy and bandwidth, and integrating with C++ codebases
  • Familiarity with Linux-based development (build systems like CMake, unit testing frameworks, containerization and/or cross-compilation for edge devices)
  • Strong debugging and profiling skills across CPU and GPU, and a methodical approach to benchmarking and regression testing
  • Excellent collaboration and communication skills, with a track record of working closely with research or ML teams to move algorithms from prototype to production
Job Responsibility
  • Design and implement high-performance C++ software that runs computer vision and tracking algorithms in real time on edge devices
  • Work closely with computer vision / self-supervised learning engineers to integrate their models into production pipelines, including pre/post-processing, I/O, and system orchestration
  • Build and optimize multithreaded and parallel processing pipelines for ingesting, synchronizing, and processing data from a networked system of cameras
  • Implement and tune CUDA kernels and GPU-accelerated components to maximize throughput and minimize latency for inference, tracking, and search
  • Design robust data structures and memory management strategies for handling large volumes of video, sensor, and metadata streams under tight compute and power constraints
  • Profile and optimize code using tools such as perf, valgrind, nvprof / Nsight, and similar to identify bottlenecks and improve CPU/GPU utilization
  • Collaborate with simulation and CV teams to deploy and evaluate algorithms in realistic test scenarios, including fault handling and performance monitoring
  • Develop clean, well-tested, and well-documented C++ libraries and services that can be reused across products and future airspace applications
  • Contribute to system-level architecture decisions, including inter-process communication, scheduling, resource allocation, and deployment strategies on edge platforms
What we offer
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
Employment Type:
Full-time

Senior Software Development Engineer

We are seeking an experienced and highly technical SMTS Software Development Eng...
Location
United Kingdom
Salary:
Not provided
AMD
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or related technical field
  • 8+ years of software engineering experience in systems software, runtime libraries, GPU programming, or compiler/runtime interfaces
  • Strong proficiency in modern C++ (C++14/C++17 or newer), templates, memory models, and low‑level systems programming
  • Deep understanding of at least one GPU computing model (HIP, CUDA, SYCL, OpenCL, OpenMP offload)
  • Hands‑on experience with runtime systems, driver interfaces, or high‑performance compute libraries
  • Strong debugging skills using tools such as gdb, sanitizers, profilers, and GPU debugging tools
  • Solid understanding of parallel programming concepts—memory hierarchy, synchronization, concurrency, thread scheduling
Job Responsibility
  • Architect, implement, and optimize features in the HIP runtime, including memory management, kernel dispatch, device abstraction, multi‑GPU coordination, and synchronization primitives
  • Contribute to the evolution of the HIP programming model and interoperability with ROCr, HSA runtime, and compiler toolchains
  • Ensure functional correctness, performance, and scalability of runtime APIs across different GPU generations
  • Conduct root‑cause analysis and systems‑level debugging across the runtime, driver, compiler, and hardware layers
  • Profile GPU applications and internal runtime components to identify bottlenecks and design performance improvements
  • Optimize HIP runtime behavior for large-scale AI, HPC, and cloud workloads
  • Work closely with compiler teams (LLVM/Clang), driver teams, GPU architecture, and systems engineers to deliver end‑to‑end GPU software solutions
  • Contribute to API specifications and collaborate with upstream open-source communities where appropriate
  • Define and drive technical strategy for correctness, reliability, and conformance of the HIP runtime
  • Support enhancements in automated testing, CI, and stress/failure scenarios in the HIP test suite

ROCm Core SW Project Manager

We are seeking an experienced Project Manager to manage ROCm development project...
Location
Canada, Markham
Salary:
139,200 - 208,800 CAD / year
AMD
Expiration Date
Until further notice
Requirements
  • 5+ years of program or project management experience in software development
  • At least 3 years focused on systems software, GPU computing, or HPC/AI infrastructure
  • Demonstrated experience managing complex, multi-team technical programs involving pre-silicon validation or hardware/software co-design
  • Strong foundational knowledge of machine learning frameworks, model architectures, and performance optimization techniques
  • Deep understanding of software development lifecycle (SDLC), agile methodologies, and modern CI/CD practices
  • Excellent stakeholder management, communication, and influencing skills across engineering and executive levels
  • Bachelor’s degree in Computer Science, Electrical Engineering, or related technical field
Job Responsibility
  • Manage ROCm development projects for AMD next generation GPUs
  • Drive internal SW execution including GPU performance optimization, pre-silicon performance feature development, and GPU kernel development
  • Coordinate across software, hardware, and validation teams to deliver a high-performance, reliable, and scalable ROCm software stack
  • Work with the ROCm SW team to drive pre-silicon software development and performance validation activities using SW/HW emulation platforms
  • Orchestrate hardware-software co-development efforts for new GPU ML features
  • Establish and track KPIs for new GPU feature quality, performance, and time-to-market
  • Proactively identify and mitigate project risks
Employment Type:
Full-time

Senior Software Engineer

As a Senior Software Engineer, you will lead the design, development, and valida...
Location
United States, Multiple Locations
Salary:
119,800 - 234,700 USD / year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 2+ years experience in Kernel bring-up and platform enablement
  • 1+ years experience in GPU driver development and integration
  • 2+ years experience in C / C++ kernel-space programming, Git-based source management and release branching, RPM packaging, spec file authoring, and build automation
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Job Responsibility
  • Lead kernel integration and validation for new silicon platforms, from early board bring‑up through full feature enablement
  • Architect and maintain the Maintenance OS (MOS) kernel, ensuring long‑term stability, security, and compatibility across multiple hardware generations
  • Own the end‑to‑end lifecycle of GPU drivers (NVIDIA, amdgpu, ROCm), including integration of out‑of‑tree (OOT) kernel drivers; DKMS packaging, build, and version tracking; and compatibility validation against kernel and firmware baselines
  • Define and manage build and release pipelines for kernel RPMs, driver SRPMs, and signing workflows
  • Collaborate with hardware, platform, and firmware teams to validate kernel features tied to new silicon capabilities (PCIe, CXL, IOMMU, NUMA, etc.)
  • Own spec files, RPM packaging, and associated CI/CD automation for kernel and driver deliverables
  • Conduct deep‑dive debugging across the full stack — from kernel to device firmware — to resolve performance, stability, or bring‑up issues
  • Drive engagement with upstream Linux communities to upstream or align kernel changes where feasible
Employment Type:
Full-time

Software Engineer 2 - Processing Unit for Copilot

We are seeking an expert GPU Engineer 2 to join our AI Infrastructure team. In t...
Location
China, Beijing
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Architectural Mastery: Expertise in the CUDA programming model and NVIDIA GPU architectures (specifically Ampere/Hopper)
  • Deep understanding of the memory hierarchy (Shared Memory, L2 cache, Registers), warp-level primitives, occupancy optimization, and bank conflict resolution
  • Familiarity with advanced hardware features: Tensor Cores, TMA (Tensor Memory Accelerator), and asynchronous copy
  • Proven ability to navigate and modify complex, large-scale codebases (e.g., PyTorch internals, Linux kernel)
  • Experience with build and binding ecosystems: CMake, pybind11, and CI/CD for GPU workloads
  • Performance Engineering: Mastery of NVIDIA Nsight Systems/Compute
  • Ability to mathematically reason about performance using the Roofline Model, memory bandwidth utilization, and compute throughput
Job Responsibility
  • Custom Operator Development: Design and implement highly optimized GPU kernels (CUDA/Triton) for critical deep learning operations (e.g., FlashAttention, GEMM, LayerNorm) to outperform standard libraries
  • Inference Engine Architecture: Contribute to the development of our high-performance inference engine, focusing on graph optimizations, operator fusion, and dynamic memory management (e.g., KV Cache optimization)
  • Performance Optimization: Deeply analyze and profile model performance using tools like Nsight Systems/Compute. Identify bottlenecks in memory bandwidth, instruction throughput, and kernel launch overheads
  • Model Acceleration: Implement advanced acceleration techniques such as Quantization (INT8, FP8, AWQ), Kernel Fusion, and continuous batching
  • Distributed Computing: Optimize communication primitives (NCCL) to enable efficient multi-GPU and multi-node inference (Tensor Parallelism, Pipeline Parallelism)
  • Hardware Adaptation: Ensure the software stack fully utilizes modern GPU architecture features (e.g., NVIDIA Hopper/Ampere Tensor Cores, Asynchronous Copy)
Employment Type:
Full-time

Senior Software Engineer - Processing Unit for Copilot

We are seeking an expert Senior GPU Engineer to join our AI Infrastructure team....
Location
China, Beijing
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Architectural Mastery: Expertise in the CUDA programming model and NVIDIA GPU architectures (specifically Ampere/Hopper)
  • Deep understanding of the memory hierarchy (Shared Memory, L2 cache, Registers), warp-level primitives, occupancy optimization, and bank conflict resolution
  • Familiarity with advanced hardware features: Tensor Cores, TMA (Tensor Memory Accelerator), and asynchronous copy
  • Proven ability to navigate and modify complex, large-scale codebases (e.g., PyTorch internals, Linux kernel)
  • Experience with build and binding ecosystems: CMake, pybind11, and CI/CD for GPU workloads
  • Performance Engineering: Mastery of NVIDIA Nsight Systems/Compute
  • Ability to mathematically reason about performance using the Roofline Model, memory bandwidth utilization, and compute throughput
Job Responsibility
  • Custom Operator Development: Design and implement highly optimized GPU kernels (CUDA/Triton) for critical deep learning operations (e.g., FlashAttention, GEMM, LayerNorm) to outperform standard libraries
  • Inference Engine Architecture: Contribute to the development of our high-performance inference engine, focusing on graph optimizations, operator fusion, and dynamic memory management (e.g., KV Cache optimization)
  • Performance Optimization: Deeply analyze and profile model performance using tools like Nsight Systems/Compute. Identify bottlenecks in memory bandwidth, instruction throughput, and kernel launch overheads
  • Model Acceleration: Implement advanced acceleration techniques such as Quantization (INT8, FP8, AWQ), Kernel Fusion, and continuous batching
  • Distributed Computing: Optimize communication primitives (NCCL) to enable efficient multi-GPU and multi-node inference (Tensor Parallelism, Pipeline Parallelism)
  • Hardware Adaptation: Ensure the software stack fully utilizes modern GPU architecture features (e.g., NVIDIA Hopper/Ampere Tensor Cores, Asynchronous Copy)
Employment Type:
Full-time