Engineering Manager, Kernel Reliability

Cerebras Systems

Location:
United States; Canada, Sunnyvale

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We're looking for a deeply technical, hands-on engineering leader for our on-field Kernel Reliability team. You will lead a high-performing team to tackle a critical challenge: improving the reliability of our advanced compute clusters and the underlying inference, training, and internal production services. In this role, you'll set the technical vision while staying close to the code and designing solutions that scale with our exponentially growing system production and software service offerings.

Job Responsibility:

  • Provide hands-on technical leadership, owning the technical vision and roadmap for the kernel-centric reliability of our internal and customer-facing systems
  • Assist System and Cluster Operations teams in reducing system and service downtime after failures by providing tooling and manual intervention for failure analysis and diagnostics
  • Work with the Debug Team to enhance debug tools with the goal of speeding up failure analysis
  • Collaborate with SW teams to improve the software stack, including Kernels, to improve on-field debugging and failure analysis
  • Work with the ASIC and HW architecture teams to codesign the next generation architectures with reliability and ease of debug in mind
  • Lead, mentor, and grow a high-caliber team of engineers, fostering a culture of technical excellence and rapid execution.

Requirements:

  • 6+ years in software engineering
  • 3+ years leading teams in SW/HW reliability, debug, diagnostics, failure analysis, or related fields
  • Expertise in parallel and distributed programming (message passing, multicore, GPU, embedded, etc.)
  • Expertise in debug and diagnostic tool development or expert usage (debuggers, core dump handling, code sanitizers, etc.)
  • Experience debugging distributed and parallel applications (deadlocks, livelocks, race conditions, etc.)
  • Deep understanding of computer architectures (instruction pipelining, multithreading, networking, etc.)
  • Strong background in monitoring and reliability engineering (incident response, post-mortem analysis, etc.)
  • Demonstrated ability to recruit and retain high-performing teams, mentor engineers, and partner cross-functionally to deliver customer-facing products.

What we offer:

  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Simple, non-corporate work culture that respects individual beliefs.

Additional Information:

Job Posted:
February 17, 2026

Similar Jobs for Engineering Manager, Kernel Reliability

Associate Director of Embedded Software Engineering

Silvus is seeking an Associate Director of Embedded Software Engineering to join...
Location:
United States, Los Angeles
Salary:
200000.00 - 250000.00 USD / Year
Silvus Technologies (International)
Expiration Date
Until further notice
Requirements
  • Demonstrated experience leading a team of engineers with hands-on development
  • Bachelor of Science degree in Electrical Engineering, Computer Science, or relevant engineering fields
  • 8+ years of relevant embedded system software development experience
  • Strong expertise in C programming
  • Expertise in board support package and secure boot in AMD UltraScale+ MPSoC and/or Microchip Polarfire SoC based products
  • Linux kernel driver development expertise
  • Expertise in network configurations and programming
  • Must be a U.S. Citizen due to clients under U.S. government contracts
Job Responsibility
  • Lead a team of engineers and be responsible for the team’s success on assigned projects
  • Work with the Director of Software Engineering and the rest of the engineering team to improve engineering processes, product quality, reliability, and performance
  • Develop device drivers and board support packages
  • Develop the software portion of MAC (Medium Access Control) and mobile ad-hoc networking routing protocols
  • Develop efficient wireless multicast protocols for mobile ad-hoc networking
  • Develop network management software and user interfaces
  • Develop audio streaming and push-to-talk voice applications
  • Perform system level design and implement security protocols and encryption algorithms on StreamCaster radios and other products
  • Support product security effort and regulatory compliance requirements such as NIST FIPS 140-3 and NIAP Common Criteria
  • Engage with and support customers as needed
Contract Type:
Fulltime

Principal Embedded Software Engineer

Silvus is seeking a full-time Principal Embedded Software Engineer to join our E...
Location:
United States, Irvine
Salary:
165000.00 - 215000.00 USD / Year
Silvus Technologies (International)
Expiration Date
Until further notice
Requirements
  • Bachelor of Science degree in Electrical Engineering, Computer Science, or relevant engineering fields
  • 8+ years of relevant embedded system software development experience
  • Expertise in C programming and experience in Linux kernel driver development
Job Responsibility
  • Implementation of the software portion of MAC (Medium Access Control) and mobile ad-hoc networking routing protocols
  • Network management software and web interface implementation
  • Implementation of different security protocols and encryption algorithms
  • Audio streaming and push-to-talk voice application implementation
  • Analyzing and improving product security and robustness to meet certain regulatory requirements such as NIST FIPS 140-3 and NIAP Common Criteria
  • Implementation of testing software for product performance and reliability testing
  • Device driver and board support package development and maintenance for both ARM and RISC-V based systems
  • Linux system customization and scripting
Contract Type:
Fulltime

Senior Embedded Software Engineer

Silvus is recruiting a Senior Embedded Software Engineer reporting to the Direct...
Location:
United States, Los Angeles
Salary:
135000.00 - 200000.00 USD / Year
Silvus Technologies (International)
Expiration Date
Until further notice
Requirements
  • Bachelor of Science degree in Electrical Engineering, Computer Science, or related fields
  • Minimum 5 years of relevant embedded system software development experience
  • Expertise in C programming and experience in Linux kernel driver development
  • Must be a U.S. Citizen due to clients under U.S. government contracts
  • All employment is contingent upon the successful clearance of a background check
Job Responsibility
  • Implementation of software portion of MAC (Medium Access Control) and mobile ad-hoc networking routing protocols
  • Network management software and web interface implementation
  • Implementation of different security protocols and encryption algorithms
  • Audio streaming and push to talk voice application implementation
  • Analyze and improve product security and robustness to meet certain regulatory requirements such as NIST FIPS 140-3 and NIAP Common Criteria
  • Implementation of testing software for product performance and reliability testing
  • Device driver and board support package development and maintenance for both ARM and RISC-V based systems
  • Linux system customization and scripting
Contract Type:
Fulltime

Senior Embedded Software Engineer

Silvus is seeking a full-time Senior Embedded Software Engineer to join our Rese...
Location:
United States, Los Angeles
Salary:
140000.00 - 200000.00 USD / Year
Silvus Technologies (International)
Expiration Date
Until further notice
Requirements
  • Minimum Bachelor of Science degree in Electrical, Computer, or Communications Engineering, Computer Science, or relevant engineering fields
  • Minimum 5 years of relevant embedded system software development experience
  • 3 years of relevant embedded system software development experience with an advanced STEM degree
  • Expertise in C programming and experience in Linux kernel driver development
Job Responsibility
  • Implementation of software portion of MAC (Medium Access Control) and mobile ad-hoc networking routing protocols
  • Network management software and web interface implementation
  • Implementation of different security protocols and encryption algorithms
  • Audio streaming and push-to-talk voice application implementation
  • Analyze and improve product security and robustness to meet certain regulatory requirements such as NIST FIPS 140-3 and NIAP Common Criteria
  • Implementation of testing software for product performance and reliability testing
  • Device driver and board support package development and maintenance for both ARM and RISC-V based systems
  • Linux system customization and scripting
Contract Type:
Fulltime

Senior Systems Engineer

We are looking for a versatile and driven Senior Systems Engineer to join our En...
Location:
United States, Chicago
Salary:
130000.00 USD / Year
AKUNA CAPITAL
Expiration Date
Until further notice
Requirements
  • Degree in Computer Science, Information Systems, or a related field
  • 5-7 years of systems engineering experience
  • Advanced Linux knowledge including kernel bypass, kernel tuning, and customizing kernels
  • Deep understanding of virtualization and containerization technologies
  • Extensive experience with a variety of Linux distributions (RedHat, Ubuntu, etc.)
  • Deep understanding of system monitoring and configuration management tools (Ansible, Foreman, Prometheus and Icinga/Nagios)
  • Proficiency in scripting and using automation and orchestration tools such as Python and Bash
  • Expertise in troubleshooting multicast and TCP related performance issues
  • Experience automating daily software and hardware related tasks
  • Demonstrated ability to lead large technical projects
Job Responsibility
  • Analyze complex technical problems and collaborate on designing solutions for Akuna’s global Infrastructure platform
  • Drive projects and solutions to completion in a fast-paced environment
  • Design, develop and maintain orchestration and configuration solutions
  • Collaborate with developers and other infrastructure engineers to research new products and techniques that drive innovation and improve efficiency and performance in the environment
  • Architect and maintain multi-vendor, tier-based storage solutions
  • Build out a test automation framework for systems performance testing and tuning
  • Create and institute process enforcement across environments
  • Create tools that assist teams to optimize the available infrastructure
  • Develop and maintain comprehensive technical documentation, including system configurations, procedures, and troubleshooting guides
  • Lead knowledge transfer sessions and mentor team members to ensure continuity and operational excellence
What we offer
  • Discretionary performance bonus
  • Comprehensive benefits package that may encompass employer-paid medical, dental, vision, retirement contributions, paid time off, and other benefits
Contract Type:
Fulltime

Software Engineer Staff

This role involves designing, developing, troubleshooting, and debugging softwar...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 10+ years of experience in networking infrastructure or systems software development
  • Advanced programming skills in C and C++
  • Strong system-level debugging proficiency
  • Deep understanding of thread and process synchronization, IPC mechanisms
  • Proven experience in inter-module and inter-process communication design and implementation
  • Strong foundation in memory management and kernel interactions
  • Hands-on experience with cross-compilation and toolchains for multiple target platforms
  • Familiarity with networking protocols and standards including TCP/IP, BGP, OSPF, MPLS, VXLAN, etc.
  • Bachelor’s or Master’s degree in Computer Science or a related technical field
Job Responsibility
  • Define detailed software specifications based on product requirements
  • Architect, design and implement high-performance, scalable features
  • Design and implement robust inter-module communication mechanisms
  • Debug and resolve complex issues related to memory leaks, race conditions, deadlocks, dependency conflicts, and performance bottlenecks
  • Ensure smooth cross-compilation and portability across embedded, cloud-native, and target-specific environments
  • Collaborate with multi-disciplinary teams across global development centers
  • Lead design/code reviews, define technical standards, and mentor junior engineers
  • Continuously improve system observability, reliability, and maintainability
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Contract Type:
Fulltime

Linux Red Hat Expert

Inetum is hiring a Linux Red Hat Expert to provide advanced support, automation,...
Location:
Portugal, Lisbon
Salary:
Not provided
Inetum
Expiration Date
Until further notice
Requirements
  • Minimum of 5 years of experience in Red Hat Linux administration
  • Bachelor’s degree in Computer Science or equivalent
  • Expert-level proficiency in RHEL system administration, both virtual and physical
  • Hands-on experience in patch and lifecycle management using Red Hat Satellite
  • Strong skills in access control, SUDO, Kerberos, LDAP, and SSSD
  • Proficient in automation and scripting with Ansible, Shell, and Python
  • Proven ability to resolve complex issues and lead root cause investigations
  • Strong communication skills to work effectively with cross-functional teams
  • Fluent in English (B2–C1 level)
Job Responsibility
  • Provide L3 support and resolve complex incidents in Red Hat Enterprise Linux (RHEL) across VMware, bare-metal, and IBM Power (LE) systems
  • Administer and troubleshoot RHEL platforms, ensuring performance, stability, and reliability
  • Manage Red Hat Satellite Server for patching, provisioning, and subscription lifecycle
  • Operate and secure Red Hat Identity Management (IdM) for centralized authentication and host-based access control
  • Conduct performance tuning, log analysis, and implement system optimizations
  • Maintain and engineer SUDO authorizations and access control policies
  • Perform root cause analysis and implement corrective and preventive actions
  • Collaborate with security teams to integrate RHEL systems with Active Directory and IdM
  • Build and maintain Ansible playbooks for automated patching, provisioning, and configuration management
  • Plan and execute kernel upgrades, OS updates, and manage package lifecycles
Contract Type:
Fulltime

Principal Software Engineer – CXI Drivers & Kernel Networking

Principal Software Engineer – CXI Drivers & Kernel Networking. This role is part...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 10+ years of systems software experience with deep expertise in Linux kernel development
  • Strong experience with: PCIe, DMA, interrupts, memory management
  • Strong experience with: Linux networking stack (netdev, IP, sockets, RDMA/RXE)
  • Hands-on experience with Switch or NIC Software Stacks, especially in the low-level kernel and user space
  • Proven ability to debug complex kernel + hardware interactions
  • Excellent C programming and kernel debugging skills
Job Responsibility
  • Architect, develop, and maintain Linux kernel drivers for the CXI interconnect, including: CXI Core Driver (shared hardware abstraction and resource management), CXI User Driver (user-space access, queue management, protection domains), CXI Ethernet Driver (IP, RXE, sockets integration)
  • Lead 800G CXI driver development: resource partitioning; interaction with IOMMU, PCIe, and virtualization stacks
  • Own kernel interfaces used by: Lustre/LNet (kCXI, kfabric provider), Verbs / RXE paths, user-space libraries (libcxi, libfabric providers)
  • Drive performance, scalability, and reliability improvements: low-latency paths, queueing models, retry/timeout handling; error reporting, recovery, and fault isolation
  • Collaborate closely with ASIC, firmware, and validation teams to deliver Chip-to-Ship outcomes
  • Act as a technical leader: design reviews, code reviews, mentoring senior engineers; influence long-term driver architecture and roadmap
What we offer
  • Health & Wellbeing: comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Personal & Professional Development: specific programs catered to helping you reach any career goals
  • Unconditional Inclusion: unconditionally inclusive in the way we work and celebrate individual uniqueness