Supercomputing Engineer

Etched

Location:
United States, San Jose

Contract Type:
Not provided

Salary:

200000.00 - 275000.00 USD / Year

Job Description:

Etched is building at-scale AI systems that will unlock faster, more efficient inference for billions of people, and the Supercomputing team is critical to this mission. We are seeking a highly skilled and motivated engineer to join the Supercomputing team and help build the foundational software that powers our cluster-scale AI compute deployments. This core-team role involves developing, integrating, and debugging critical system components, including control-plane software, system bring-up, telemetry, orchestration primitives, and performance tuning at the hardware–software boundary.

Job Responsibility:

  • Architect and implement low-level control-plane software responsible for system bring-up, configuration, and management of cluster-scale AI compute deployments
  • Build system services that interact directly with hardware, firmware, and the operating system
  • Develop telemetry, logging, and tracing infrastructure for diagnosing failures and driving performance improvements
  • Implement orchestration primitives for managing devices, nodes, and racks
  • Profile and tune performance across PCIe, memory, networking, kernel, and runtime layers
  • Collaborate closely with hardware, firmware, kernel, and runtime teams to co-design system interfaces and behavior

Requirements:

  • Strong proficiency in C/C++ or Rust for low-level systems programming
  • Deep understanding of Linux internals, kernel/user-space boundaries, and system-level debugging
  • Experience working close to hardware: drivers, DMA, interrupts, memory management, or device control paths
  • Strong debugging skills using logs, tracing, and low-level observability tools
  • Strong communication skills and comfort collaborating across hardware and software teams

Nice to have:

  • Experience with data center orchestration technologies such as Kubernetes and Docker
  • Experience with kernel development, device drivers, or firmware-adjacent software
  • Familiarity with PCIe, NUMA, networking, or high-speed interconnects
  • Experience with tracing and profiling tools such as perf, eBPF, ftrace, or custom instrumentation
  • Experience taking complex systems from early bring-up through stable operation
  • Background in HPC, AI infrastructure, or large-scale compute systems
  • Experience designing system test harnesses and failure-injection frameworks
  • Familiarity with Kubernetes or cluster orchestration at the node or control-plane level

What we offer:

  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Supercomputing Engineer

Supercomputing Software Engineer

We are seeking a highly skilled and motivated Supercomputing Software Engineer t...
Location:
Taiwan, Taipei
Salary:
Not provided
Etched
Expiration Date
Until further notice
Requirements:
  • Proficiency in C/C++ or Python
  • Strong understanding of BIOS and BMC firmware architectures
  • Experience with server boot processes
  • Knowledge of root-of-trust and security principles
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures
  • Experience with advanced system logging and diagnostic tools
  • Ability to analyze complex technical problems and provide effective solutions
  • Excellent communication and collaboration skills
  • Experience with version control systems (e.g., Git)
  • Experience with reading and interpreting hardware logs
Job Responsibility:
  • Integrate and maintain BIOS and BMC firmware, ensuring robust and efficient server boot processes
  • Measure and Tune System Performance Configuration: Analyze DRAM timings, PCIe configurations, power state transitions, etc., to ensure high performance and maximal reliability
  • Root of Trust and Security: Validate security features, including root-of-trust mechanisms, to protect system integrity and data security
  • Advanced System Logging and Diagnostics: Design and implement advanced system logging and diagnostic capabilities to facilitate efficient troubleshooting and performance analysis
  • Data Center Orchestration Integration: Integrate and optimize node-level data center orchestration technologies, such as Kubernetes and Docker, into the system software stack
  • System Validation and Testing: Develop and execute comprehensive test plans to validate system software functionality, stability, and performance
  • Collaboration and Troubleshooting: Collaborate with hardware and software teams to diagnose and resolve complex system-level issues
What we offer:
  • Competitive compensation packages including generous equity packages
  • Comprehensive insurance coverage and other top-of-market benefits
Employment Type:
Full-time

Strategic Finance Compute Lead

Compute is a key lever for OpenAI and AI progress. We are seeking a Strategic Fi...
Location:
United States, San Francisco
Salary:
185000.00 - 260000.00 USD / Year
OpenAI
Expiration Date
Until further notice
Requirements:
  • 5+ years of experience across strategic finance, private / growth equity, investment banking, strategy & operations, and/or business development with 3+ years of finance operating experience at a high-growth technology company
  • Experience partnering with engineering and product teams to provide financial analysis and insights to critical strategic decisions
  • Good understanding of cloud technology and compute infrastructure
  • Exceptionally strong analytical, financial modeling, and written and oral communication skills
  • Demonstrated track record of thoughtful investment decisions
  • Experience driving operational outcomes under ambitious deadlines
  • Exceptionally strong relationship building, business judgment, and communication skills
  • Bachelor’s degree or equivalent practical experience
Job Responsibility:
  • Own and develop financial models across different elements of compute (GPUs, CPUs, storage and networking)
  • Lead strategic financial analysis for long-term capacity initiatives, working closely with scaling and supercomputing engineering teams
  • Maintain deep expertise on compute contract terms, pricing structures and optimization opportunities
  • Serve as a partner to FP&A and strategic finance teams, aligning compute and infrastructure with broader financial and business strategies
  • Create high-quality Exec and Board-facing presentations
  • Stay abreast of market trends and competitive dynamics to inform and improve our infrastructure strategy
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time

Supercomputing Engineer (Test)

We are seeking highly motivated and detail-oriented Supercomputing Engineer (Tes...
Location:
United States, San Jose
Salary:
150000.00 - 275000.00 USD / Year
Etched
Expiration Date
Until further notice
Requirements:
  • Proficiency in at least one scripting language (e.g., Python, Bash, Go)
  • Experience with software testing methodologies and tools
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures
  • Ability to analyze complex technical problems and provide effective solutions
  • Excellent communication and collaboration skills
  • Ability to work independently and as part of a team
  • Experience with version control systems (e.g., Git)
  • Experience with reading and interpreting hardware logs
Job Responsibility:
  • Test Development: Design, develop, and implement automated burn-in test suites using common scripting languages (Python, Go, Bash) and test frameworks across all aspects of System Operation including: boot sequences, root-of-trust, system management, workload deployment and performance
  • Test Execution: Execute burn-in tests on server hardware, monitor system performance and health, and analyze test results
  • Failure Analysis: Investigate and debug hardware and software failures identified during testing, providing detailed reports and mitigation plans
  • Collaboration: Collaborate with internal and external hardware and software engineering teams to identify root causes of failures and implement corrective actions
  • Test Infrastructure: Contribute to the development and maintenance of the burn-in testing infrastructure, including portable test environments and automation tools runnable in any environment
  • Documentation: Create and maintain comprehensive documentation for test plans, test cases, and test results
  • Performance Analysis: Analyze system performance metrics to identify potential bottlenecks and areas for optimization
  • Continuous Improvement: Participate in continuous improvement efforts to enhance the efficiency and effectiveness of the burn-in testing process
What we offer:
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
Employment Type:
Full-time

Supercomputing Engineer (Network)

We are seeking highly motivated and skilled Supercomputing Engineers (Network) t...
Location:
United States, San Jose
Salary:
150000.00 - 275000.00 USD / Year
Etched
Expiration Date
Until further notice
Requirements:
  • Proficiency in C/C++
  • Proficiency in at least one scripting language (e.g., Python, Bash, Go)
  • Strong experience with device-to-device networking technologies (RDMA, GPUDirect, etc.), including RoCE
  • Experience with zero-copy networking, RDMA verbs and memory registration
  • Familiarity with queue pairs, completion queues, and transport types
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures
  • Ability to analyze complex technical problems and provide effective solutions
  • Excellent communication and collaboration skills
  • Ability to work independently and as part of a team
  • Experience with version control systems (e.g., Git)
Job Responsibility:
  • Design, develop, and implement RDMA-based network peering, supporting high-bandwidth, low-latency communication across PCIe nodes within and across racks
  • Develop tests that qualify host processors (x86), NICs, TORs and device network interfaces for high performance
  • Furnish burn-in teams with tests that represent both real-world use cases and workloads for device to device networking, and extreme-load stress testing
  • Define the key metrics that system software must collect to maintain high availability and performance under extreme communications workloads
What we offer:
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
Employment Type:
Full-time

Supercomputing Test Software Engineer

We are seeking highly motivated and detail-oriented Software Engineers to join o...
Location:
Taiwan, Taipei
Salary:
Not provided
Etched
Expiration Date
Until further notice
Requirements:
  • Proficiency in at least one scripting language (e.g., Python, Bash, Go)
  • Experience with software testing methodologies and tools
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures
  • Ability to analyze complex technical problems and provide effective solutions
  • Excellent communication and collaboration skills
  • Ability to work independently and as part of a team
  • Experience with version control systems (e.g., Git)
  • Experience with reading and interpreting hardware logs
Job Responsibility:
  • Design, develop, and implement automated supercomputing test suites using common scripting languages (Python, Go, Bash) and test frameworks across all aspects of System Operation including: boot sequences, root-of-trust, system management, workload deployment and performance
  • Execute tests on server hardware, monitor system performance and health, and analyze test results
  • Investigate and debug hardware and software failures identified during testing, providing detailed reports and mitigation plans
  • Collaborate with internal and external hardware and software engineering teams to identify root causes of failures and implement corrective actions
  • Contribute to the development and maintenance of the supercomputing testing infrastructure, including portable test environments and automation tools runnable in any environment
  • Create and maintain comprehensive documentation for test plans, test cases, and test results
  • Analyze system performance metrics to identify potential bottlenecks and areas for optimization
  • Participate in continuous improvement efforts to enhance the efficiency and effectiveness of the testing process
What we offer:
  • Competitive compensation packages including generous equity packages
  • Comprehensive insurance coverage and other top-of-market benefits
Employment Type:
Full-time

Talent Sourcer

As we scale, we’re looking for a Talent Sourcer (Supercomputing/ML) to build and...
Location:
United States, San Jose
Salary:
100000.00 - 220000.00 USD / Year
Etched
Expiration Date
Until further notice
Requirements:
  • 3+ years of experience sourcing technical talent in highly competitive markets
  • Deep experience sourcing software, systems, infrastructure, or hardware engineers
  • Highly resourceful, with a love of finding exceptional candidates beyond obvious platforms
  • Thrives in ambiguity and enjoys building sourcing engines from scratch
  • Detail-oriented, organized, and operationally strong
  • Cares deeply about candidate experience and employer brand
  • Loves working in high-velocity environments with extremely high hiring bars
Job Responsibility:
  • Own top-of-funnel sourcing strategy across priority engineering roles in supercomputing, ML systems, firmware, networking, and distributed systems
  • Build and maintain high-quality talent pipelines through outbound sourcing, referrals, events, research, and creative outreach
  • Partner closely with recruiters and hiring managers to deeply understand role requirements, ideal profiles, and search strategy
  • Develop market maps for niche technical domains and continuously expand our talent network
  • Run high-volume, high-signal outbound campaigns with thoughtful personalization
  • Track sourcing performance, conversion rates, and funnel health
  • Continuously experiment with new sourcing channels, tools, and techniques
  • Deliver a best-in-class candidate experience from first touch onward
What we offer:
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
Employment Type:
Full-time

Software Engineer, Frontier Systems - Power Management

As a Software Engineer on the Frontier Systems team focused on power management,...
Location:
United States, San Francisco
Salary:
295000.00 - 445000.00 USD / Year
OpenAI
Expiration Date
Until further notice
Requirements:
  • 7+ years of software engineering experience with a focus on solving large-scale, system-level challenges
  • Strong proficiency in Python and familiarity with automation and scripting tools (e.g., shell scripting)
  • Experience with distributed systems to efficiently aggregate and analyze streaming data
  • Knowledge of electrical engineering concepts including digital signal processing, power systems, Fast Fourier Transforms, or related areas
  • Experience in system-level investigations and development of automated solutions to address power management, fault detection, and remediation
  • Strong analytical skills and the ability to dig into noisy data (experience with SQL, PromQL, Pandas, etc.)
  • Comfort working with both hardware and software teams to solve multidisciplinary problems
Job Responsibility:
  • Develop and implement system-level and software-level solutions to optimize power usage in large-scale supercomputers, ensuring efficient and reliable operations
  • Build automation to monitor power consumption patterns during training workloads and design algorithms to stabilize these fluctuations, preventing issues with grid reliability
  • Work with researchers and engineers to design tools for real-time monitoring, detection, and remediation of power-related hardware and system faults
  • Collaborate cross-functionally to translate complex electrical system requirements into code, while driving continuous improvements in power management solutions
  • Drive the development of power throttling mechanisms at the IT system level to dynamically adjust power usage based on workload demands and infrastructure limitations
  • Collaborate with hardware design teams to integrate system-level power control requirements into IT hardware design, ensuring seamless coordination between software-driven power management and hardware capabilities
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time

Software Engineer, Data Visualization

The Data Visualization team at OpenAI is responsible for building and maintainin...
Location:
United States, San Francisco
Salary:
230000.00 - 385000.00 USD / Year
OpenAI
Expiration Date
Until further notice
Requirements:
  • Strong experience in full-stack software development, with a focus on building scientific or infrastructure visualization tools
  • Proficiency in both front-end and back-end programming languages such as Python, JavaScript, SQL, or similar
  • Familiarity with front-end technologies like React, back-end technologies like Node.js, and databases like Snowflake
  • Experience with visualization libraries and frameworks (e.g., Plotly, Grafana)
  • Strong understanding of full-stack architecture, design principles, and best practices
  • Excellent problem-solving skills and attention to detail
  • Strong communication skills and the ability to work collaboratively in a team environment
Job Responsibility:
  • Develop and maintain full-stack visualization tools for hardware and software analysis
  • Design intuitive front-end interfaces and robust back-end systems for monitoring the performance and health of supercomputer systems
  • Collaborate with researchers and engineers to understand their needs and deliver effective full-stack visualization solutions
  • Ensure high performance, reliability, and scalability of visualization tools across both front-end and back-end systems
  • Continuously improve existing tools and develop new features to meet evolving requirements
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time