Supercomputing Test Software Engineer Job at Etched (Taipei)

Supercomputing Engineer (Test)

We are seeking highly motivated and detail-oriented Supercomputing Engineer (Tes...

Location

United States, San Jose

Salary:

150000.00 - 275000.00 USD / Year

Etched

Expiration Date

Until further notice

Requirements

Proficiency in at least one scripting language (e.g., Python, Bash, Go)
Experience with software testing methodologies and tools
Strong understanding of operating systems (Linux preferred) and server hardware architectures
Ability to analyze complex technical problems and provide effective solutions
Excellent communication and collaboration skills
Ability to work independently and as part of a team
Experience with version control systems (e.g., Git)
Experience with reading and interpreting hardware logs

Job Responsibility

Test Development: Design, develop, and implement automated burn-in test suites using common scripting languages (Python, Go, Bash) and test frameworks across all aspects of System Operation including: boot sequences, root-of-trust, system management, workload deployment and performance
Test Execution: Execute burn-in tests on server hardware, monitor system performance and health, and analyze test results
Failure Analysis: Investigate and debug hardware and software failures identified during testing, providing detailed reports and mitigation plans
Collaboration: Collaborate with internal and external hardware and software engineering teams to identify root causes of failures and implement corrective actions
Test Infrastructure: Contribute to the development and maintenance of the burn-in testing infrastructure, including portable test environments and automation tools runnable in any environment
Documentation: Create and maintain comprehensive documentation for test plans, test cases, and test results
Performance Analysis: Analyze system performance metrics to identify potential bottlenecks and areas for optimization
Continuous Improvement: Participate in continuous improvement efforts to enhance the efficiency and effectiveness of the burn-in testing process

What we offer

Medical, dental, and vision packages with generous premium coverage
$500 per month credit for waiving medical benefits
Housing subsidy of $2k per month for those living within walking distance of the office
Relocation support for those moving to San Jose (Santana Row)
Various wellness benefits covering fitness, mental health, and more
Daily lunch + dinner in our office

Fulltime

Supercomputing Software Engineer

We are seeking a highly skilled and motivated Supercomputing Software Engineer t...

Location

Taiwan, Taipei

Salary:

Not provided

Etched

Expiration Date

Until further notice

Requirements

Proficiency in C/C++ or Python
Strong understanding of BIOS and BMC firmware architectures
Experience with server boot processes
Knowledge of root-of-trust and security principles
Strong understanding of operating systems (Linux preferred) and server hardware architectures
Experience with advanced system logging and diagnostic tools
Ability to analyze complex technical problems and provide effective solutions
Excellent communication and collaboration skills
Experience with version control systems (e.g., Git)
Experience with reading and interpreting hardware logs

Job Responsibility

Integrate and maintain BIOS and BMC firmware, ensuring robust and efficient server boot processes
Measure and Tune System Performance Configuration: Analyze DRAM timings, PCIe configurations, power state transitions, etc. to ensure high performance and maximal reliability
Root of Trust and Security: Validate security features, including root-of-trust mechanisms, to protect system integrity and data security
Advanced System Logging and Diagnostics: Design and implement advanced system logging and diagnostic capabilities to facilitate efficient troubleshooting and performance analysis
Data Center Orchestration Integration: Integrate and optimize node-level data center orchestration technologies, such as Kubernetes and Docker, into the system software stack
System Validation and Testing: Develop and execute comprehensive test plans to validate system software functionality, stability, and performance
Collaboration and Troubleshooting: Collaborate with hardware and software teams to diagnose and resolve complex system-level issues

What we offer

Competitive compensation packages including generous equity packages
Comprehensive insurance coverage and other top-of-market benefits

Fulltime

HPC Senior Technical Writer

In this position you will collaborate with knowledge management project leads an...

Location

United States of America, Chippewa Falls

Salary:

81500.00 - 187500.00 USD / Year

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

Bachelor's degree in Technical Communications, Computer Science, or related technical/communications field with 4-6 years related experience
Advanced University degree and 2-4 years' experience or equivalent
Understands concepts and develops in-depth working knowledge of products, applications, and systems in assigned area of responsibility
Ability to deliver on multiple project technical requirements, schedules, and information formats
Codes in HTML, DHTML, XML, JavaScript or similar as required
Applies developed subject matter knowledge to solve common and complex business issues and recommends appropriate alternatives
Works on problems of diverse complexity and scope
May act as a team or project leader providing direction to team activities and facilitates information validation and team decision making process
Exercises independent judgment to identify and select a solution
Knowledge of HPC system software and hardware components, including operating systems, programming languages, system monitoring applications, HPC storage, chassis, servers, compute nodes, blades, coolant systems, power supplies, high-speed network switches and cabling, and more

Job Responsibility

Create technical product documentation for software products and hardware
Analyze customer information requirements and product specifications to define scope of work and documentation plan
Identify and address the needs of all user groups, including end users, system administrators, internal support engineers, product developers, integration test teams, and training developers
Test documentation for install or administrative tasks to improve information deliverables and provide feedback on ease of use and user interfaces to product development
Manage workload in Jira and source management tools, including SDL, Oxygen, Git, and GitHub, to manage changes in the shared work environment
Create, revise, and manage content in Oxygen Author (DITA), Markdown, and other content tools
Work with developers, testers, product managers, technical support, and training to identify new features and content that needs to be reworked

What we offer

Health & Wellbeing
Personal & Professional Development
Unconditional Inclusion

Fulltime

Software Engineer II

Microsoft Azure Artificial Intelligence/High Performance Computing (AI/HPC) team...

Location

United States, Multiple Locations

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements
Microsoft Cloud Background Check

Job Responsibility

Be proactive and innovative about adding new metrics for monitoring the health of the supercomputers
Collaborate with team members and stakeholders to understand requirements and produce detailed, data-driven, collaborative design for assigned features
Independently use appropriate artificial intelligence tools and practices across the software development lifecycle to develop, test, debug, and maintain code for supercomputer health monitoring systems
Remain current in skills by investing time and effort into staying abreast of current developments that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale
Act as a Designated Responsible Individual (DRI) working on-call to monitor system/product feature/service for degradation, downtime, or interruptions and gain approval to restore system/product/service for simple problems

Fulltime

Principal Software Engineer

Microsoft Azure High Performance Computing & AI Engineering (HPC & AI Eng) team ...

Location

United States, Multiple Locations

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python - OR equivalent experience
5+ years of hands-on experience designing and developing high-volume, low-latency pipelines using products such as AzPubSub, Event Hubs, Azure Stream Analytics, Kafka, Grafana, Prometheus, or equivalent products
3+ years of experience with one of AI/HPC system management OR High-Speed Networks OR HPC Storage OR managing Cloud Infrastructure
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter

Job Responsibility

Architect, design, and develop high-volume, low-latency, end-to-end event pipelines that provide first-to-know insights into events that cause job interrupts and affect job reliability
Conduct analysis of existing event pipelines to evaluate fidelity, granularity and latency of critical events
Contribute to improving key metrics such as Job Mean Time to Interrupt, Nodes in Service, and Mean Time to Resolve on flagship supercomputers by enabling data scientists and domain experts to use the telemetry to identify events and issues at the intersection of datacenter and hardware, develop hypotheses, conduct A/B tests, and synthesize results
Partner with cross-organizational teams to evaluate available telemetry and latency, and drive architecture, design, development, and deployment of end-to-end solutions to manage core infrastructure, including current and next-generation datacenter, IT hardware, power, and cooling technologies
Drive engineering and operational excellence based on issues and learnings from strategic customers on their usage scenarios to improve product features and capabilities
Partner with teams on continuous learning and continuous improvement programs by leading the resolution of complex incidents, driving root cause analyses and championing initiatives to minimize future customer impact

Fulltime

Supercomputing Engineer (Network)

We are seeking highly motivated and skilled Supercomputing Engineers (Network) t...

Location

United States, San Jose

Salary:

150000.00 - 275000.00 USD / Year

Etched

Expiration Date

Until further notice

Requirements

Proficiency in C/C++
Proficiency in at least one scripting language (e.g., Python, Bash, Go)
Strong experience with device-to-device networking technologies (RDMA, GPUDirect, etc.), including RoCE
Experience with zero-copy networking, RDMA verbs and memory registration
Familiarity with queue pairs, completion queues, and transport types
Strong understanding of operating systems (Linux preferred) and server hardware architectures
Ability to analyze complex technical problems and provide effective solutions
Excellent communication and collaboration skills
Ability to work independently and as part of a team
Experience with version control systems (e.g., Git)

Job Responsibility

Design, develop, and implement RDMA-based network peering, supporting high-bandwidth, low-latency communication across PCIe nodes within and across racks
Develop tests that qualify host processors (x86), NICs, TORs and device network interfaces for high performance
Furnish burn-in teams with tests that represent both real-world use cases and workloads for device to device networking, and extreme-load stress testing
Define the key metrics that system software must collect to maintain high availability and performance under extreme communications workloads

What we offer

Medical, dental, and vision packages with generous premium coverage
$500 per month credit for waiving medical benefits
Housing subsidy of $2k per month for those living within walking distance of the office
Relocation support for those moving to San Jose (Santana Row)
Various wellness benefits covering fitness, mental health, and more
Daily lunch + dinner in our office

Fulltime

Software Engineer, Hardware

As a software engineer on the Scaling team, you’ll help build and optimize the l...

Location

United States, San Francisco

Salary:

266000.00 - 455000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

Proficient in systems programming (e.g., Rust, C++) and scripting languages like Python
Experience in one or more of the following areas: compiler development, kernel authoring, accelerator programming, runtime systems, distributed systems, or high-performance simulation
Deep curiosity about how large-scale systems work, and enjoyment in making them faster, simpler, and more reliable
Excited to work in a fast-paced, highly collaborative environment with evolving hardware and ML system demands
Value engineering excellence, technical leadership, and thoughtful system design

Job Responsibility

Design and build APIs and runtime components to orchestrate computation and data movement across heterogeneous ML workloads
Contribute to compiler infrastructure, including the development of optimizations and compiler passes to support evolving hardware
Engineer and optimize compute and data kernels, ensuring correctness, high performance, and portability across simulation and production environments
Profile and optimize system bottlenecks, especially around I/O, memory hierarchy, and interconnects, at both local and distributed scales
Develop simulation infrastructure to validate runtime behaviors, test training stack changes, and support early-stage hardware and system development
Rapidly deploy runtime and compiler updates to new supercomputing builds in close collaboration with hardware and research teams
Work across a diverse stack, primarily using Rust and Python, with opportunities to influence architecture decisions across the training framework

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Electrical Engineer - Systems

The Scaling team works on the design of our AI supercomputers, doing everything ...

Location

United States, San Francisco

Salary:

225000.00 - 445000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

At least 10 years of industry experience, including experience designing hardware systems for data center applications
Experience in EE circuit design, CPU/GPU/TPU hardware system design, board bring-up, system design, integration, and system bring-up
Master's degree in Electrical Engineering, Computer Engineering, Physics, a related field, or equivalent practical experience
Have a strong bias toward action, and won’t take no for an answer
Have experience and good knowledge of system design in the mechanical and product design areas, from xPUs and boards through rack level to data center level
Have a strong intrinsic desire to learn and fill in missing skills, and an equally strong talent for sharing that information clearly and concisely with others
Are comfortable with ambiguity and rapidly changing conditions

Job Responsibility

Work on Machine Learning/AI hardware systems projects to craft the solutions for current and future data center deployments
Work with the hardware team on test vehicles, bring-up board design, and evaluation of end-to-end system design trade-offs
Lead EE circuit-level design and work with power, thermal, and mechanical teams to drive AI hardware system design
Work with product teams to ensure that systems meet their goals, and with ASIC/FPGA, Software, and Verification teams to ensure proper verification of features
Work with the manufacturing teams to ensure that designs are manufacturable and ready for volume production, and with the field teams to support systems that are deployed in the data center
Gather system requirements, define architecture, execute hardware design, and perform product validation
Lead the system bring up, validation, NPI, deployment, and sustaining of hardware solutions
Work cross-functionally with Hardware, Software, Mechanical, Thermal, Validation, Manufacturing, and external vendors
Drive system development from concept through production
Lead debug and root cause analysis of deployed systems

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Supercomputing Test Software Engineer

Etched

Location:
Taiwan, Taipei

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:
February 18, 2026

Similar Jobs for Supercomputing Test Software Engineer

Supercomputing Engineer (Test)

Supercomputing Software Engineer

HPC Senior Technical Writer

Software Engineer II

Principal Software Engineer

Supercomputing Engineer (Network)

Software Engineer, Hardware

Electrical Engineer - Systems
