Senior Systems Engineer HPC

Rackspace

Location:
India, Gurgaon

Contract Type:
Not provided

Salary:

Not provided

Job Responsibility:

  • System Administration & Maintenance: Install, configure, and maintain HPC clusters (hardware, software, operating systems), perform regular updates/patching, manage user accounts and permissions, and troubleshoot/resolve hardware or software issues
  • Performance & Optimization: Monitor and analyse system and application performance, identify bottlenecks, implement tuning solutions, and profile workloads to improve efficiency
  • Cluster & Resource Management: Manage and optimize job scheduling, resource allocation, and cluster operations using tools such as Slurm, LSF, Bright Cluster Manager / Base Command Manager, OpenHPC, and Warewulf
  • Networking & Interconnects: Configure, manage, and tune Linux networking (TCP/IP, DNS, routing) and high-speed HPC interconnects (InfiniBand, Ethernet) to ensure low-latency, high-bandwidth communication
  • Storage & Data Management: Implement and maintain large-scale storage and parallel file systems (Lustre, Ceph, GPFS), ensure data integrity, manage backups, and support disaster recovery
  • Security & Authentication: Implement security controls, ensure compliance with policies, and manage authentication and directory services such as LDAP and Active Directory
  • DevOps & Automation: Use configuration management and DevOps practices (Ansible, Terraform, Jenkins, Git) to automate deployments, application packaging (RPM/DEB), and system configurations
  • User Support & Collaboration: Provide technical support, documentation, and training to researchers; collaborate with scientists, HPC architects, and engineers to align infrastructure with research needs
  • Planning & Innovation: Contribute to the design and planning of HPC infrastructure upgrades, evaluate and recommend hardware/software solutions, and explore cloud-based HPC solutions where applicable

Requirements:

  • Bachelor’s degree in Computer Science, Engineering, or a related field (equivalent experience may substitute for degree)
  • Minimum of 10 years of systems experience, including at least 5 years working specifically with HPC
  • Strong knowledge of Linux operating systems (e.g., Rocky Linux, Ubuntu) with a fundamental understanding of Linux internals, system administration, and performance tuning
  • Experience building and managing RPM and DEB packages
  • Experience with cluster management tools such as Bright Cluster Manager, OpenHPC stack, or Warewulf
  • Proficiency with job schedulers and resource managers such as Slurm and LSF
  • Strong understanding of Linux networking (e.g., TCP/IP, DNS, routing) and HPC interconnects (e.g., InfiniBand, Ethernet) including performance tuning
  • Knowledge of parallel file systems such as Lustre, Ceph, or GPFS
  • Working knowledge of Linux authentication and directory services such as LDAP and Active Directory
  • Strong experience with DevOps and configuration management tools, including Ansible, Terraform, Jenkins, and Git
  • Strong knowledge of Linux security, compliance standards, and data protection best practices
  • Excellent communication, interpersonal, and problem-solving skills

Nice to have:

  • Proficiency in scripting languages (e.g., Python, Bash, R) and familiarity with MPI libraries for parallel and distributed computing
  • Knowledge of HPC in cloud environments (e.g., AWS, Azure, GCP HPC offerings)

Additional Information:

Job Posted:
January 05, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Senior Systems Engineer HPC

Senior HPC Deployment Engineer

As a High Performance Computer (HPC) Solution Installation and Deployment Engine...
Location:
Australia, Melbourne
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Proven experience in installing, configuring, and deploying HPC systems
  • Strong knowledge of HPC architectures, parallel computing, and cluster management
  • Proficiency in Linux/Unix operating systems
  • Experience with HPC software tools and libraries (e.g., MPI, OpenMP, SLURM, Torque)
  • Familiarity with high-speed networking technologies (e.g., InfiniBand, Ethernet)
  • Excellent problem-solving skills and attention to detail
  • Strong communication and interpersonal skills
  • Ability to work independently and as part of a team
  • Certifications in relevant technologies (e.g., Red Hat Certified Engineer, Certified HPC Professional)
  • Experience with cloud-based HPC solutions
Job Responsibility:
  • Install and configure HPC hardware and software components, including servers, storage, and networking equipment
  • Set up and manage high-speed interconnects (e.g., InfiniBand, Ethernet)
  • Deploy operating systems, cluster management software, and parallel file systems
  • Coordinate with clients and project managers to understand deployment requirements and timelines
  • Implement and document HPC deployment processes and best practices
  • Perform system testing and validation to ensure optimal performance and reliability
  • Provide technical support to clients during the installation and deployment phases
  • Conduct training sessions for clients on HPC system usage and maintenance
  • Develop and maintain user documentation and guides
  • Monitor and analyze system performance to identify and resolve bottlenecks
What we offer:
  • Comprehensive suite of benefits supporting physical, financial, and emotional wellbeing
  • Specific programs for personal and professional development
  • Inclusion and flexibility to manage work and personal needs
Employment Type:
Full-time

Senior Linux System Administrator - Support Engineer

Senior Linux System Administrator/System Support Engineer with expertise support...
Location:
Australia, Canberra
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Information Technology, or related field, or equivalent work experience
  • At least 5 years of hands-on experience managing Linux systems in production environments, including HPC systems
  • Expertise in Linux/Unix operating systems, parallel file systems (Lustre, GPFS), and networking technologies
  • Proficiency in scripting/programming languages (Bash, Python, Perl, C++)
  • Experience with automation/configuration management tools (Ansible, Puppet, Chef, Terraform)
  • Strong understanding of networking concepts (TCP/IP, DNS, DHCP, firewalls, VPNs)
  • Familiarity with monitoring/logging tools (Nagios, Grafana, ELK Stack)
  • Experience with containerization technologies (Docker, Kubernetes)
  • Excellent problem-solving, analytical, and communication skills
  • Demonstrated ability to work independently in multi-technology environments and collaborate across teams
Job Responsibility:
  • Deploy, configure, maintain, and troubleshoot Linux servers (Red Hat, CentOS, Ubuntu, or others) across physical, virtual, and cloud environments
  • Support, maintain, and optimize HPC systems, including installation, servicing, and advanced technical troubleshooting of hardware/software and parallel file systems
  • Monitor system performance, availability, and security using industry-standard tools and practices
  • Plan and execute upgrades, patches, enhancements, and migrations to ensure systems are current, secure, and optimized
  • Automate system administration tasks using scripting languages and configuration management tools
  • Implement and maintain backup/recovery strategies, disaster recovery plans, and system documentation
  • Collaborate with development, network, and security teams to support application deployments and troubleshoot issues
  • Provide technical consulting, mentoring, and guidance to junior team members
  • Ensure compliance with strict security protocols in sensitive environments
  • Participate in on-call rotation and respond to system incidents and outages
What we offer:
  • Competitive salary and performance-based bonuses
  • Comprehensive health, dental, and vision insurance
  • Retirement plan options
  • Paid time off and holidays
  • Professional development opportunities
  • Flexible work arrangements
Employment Type:
Full-time

Senior System Support Engineer – High Performance Computing

The HPC Senior System Support Engineer provides highly visible on-site technical...
Location:
Australia, Canberra
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • A minimum TSPV Government Security clearance is mandatory for the role
  • Expertise in Linux/Unix operating systems, parallel file systems (e.g., Lustre, GPFS) and networking technologies is essential
  • Proficient in programming and scripting languages such as Python and C++
  • Ability to develop solutions that enhance the availability, performance, maintainability and agility of HPC solutions
  • Has contributed to the design and application of new tools
  • Possesses an understanding, at a detailed level, of architectural dependencies of technologies in use in the customer's IT environment
  • Frequently uses product and application knowledge along with internals or architectural knowledge to develop solutions
  • Able to communicate confidently with internal and external senior management and demonstrate professionalism
  • Ability to work in a multi-technology environment and to diagnose complex technical problems to their root cause
  • In addition to troubleshooting skills and consulting skills, has ability to summarise prognosis and impact at practice lead level
Job Responsibility:
  • Responsible for verifying and implementing the detailed technical design solution to the problem as identified by the Project/Technical Manager
  • Provides detailed technical design, analyses and develops enterprise solutions
  • Regularly leads technical assessment and delivery solutions to the customer
  • Coordinates implementation of new installations, designs, and migrations for HPC solutions
  • Provides advanced technical consulting and advice to others on proposal efforts, solution design, system management, tuning and modification of solutions
  • Provides input to the company strategy moving forward
  • Collects and determines data from appropriate sources to assist in determining customer needs and requirements
  • Responds to requests for technical information from customers
  • Engages in technical problem solving across multiple technologies; often needs to develop new methods to apply to the situation
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type:
Full-time

Senior Research Engineer

The HPE HPC & AI EMEA Research Lab (ERL) is characterized by a unique blend of i...
Location:
Germany, Munich, Berlin
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Development experience in compiled languages such as C, C++ or Fortran and experience with interpreted environments such as Python
  • At least a B.Sc. equivalent in a Science, Technology, Engineering or Mathematical discipline
  • Parallel programming experience, with programming models such as OpenMP, MPI, CUDA, OpenACC, HIP, PGAS languages, etc.
  • An understanding of AI/ML frameworks, experience with frameworks such as TensorFlow or PyTorch is highly desirable
  • An interest in system- and data center monitoring and operational data analysis
  • Professional language skills in English and German
Job Responsibility:
  • Perform world-class research while also shaping products of the future
  • Work with the most esteemed research partners across Europe
  • Enable high performance research software on pre-Exascale and Exascale supercomputers
  • Provide new environments/abstractions to support application developers to build, deploy, and run applications taking advantage of leading-edge hardware at scale
  • Make and operate HPC/AI systems and datacenters in a sustainable way
  • Manage modern data-intensive workloads in high performance environments
What we offer:
  • Competitive salary and extensive benefits package (pension scheme, insurances, bike and car leasing, and other fringe benefits)
  • Work-life balance (flexible working time and hybrid workplace model, 30 vacation days, four HPE Wellness-Fridays, up to six months paid parental leave)
  • Support for education, training, and career development
  • Diverse and dynamic work environment

Senior Software Engineer

Senior Software Engineer responsible for delivering integrated product solutions...
Location:
United States, St. Louis
Salary:
Not provided
Sovereign Technologies
Expiration Date:
Until further notice
Requirements:
  • Advanced ability to translate business needs and problems into systems design and technical solutions
  • Proven experience with structured and object-oriented programming, design patterns, relational database design, operating systems, networking concepts, and systems integration
  • Demonstrated ability to evaluate project objectives and scope for feasibility, understanding, and scheduling, and to ensure projects meet budget and plan criteria
  • Complex analytical and problem-solving skills
  • Ability to multi-task and work well within a team environment
  • Advanced interpersonal skills, demonstrating an ability to apply leadership when required
  • Advanced oral and written communication skills
  • Agile
  • Master’s degree in Computer Science
  • Certification in Microsoft C#.NET software development
Job Responsibility:
  • Provide IT solution design, delivery, and support expertise in Prophet, C#, Web, JavaScript, Oracle, and SQL Server technologies
  • Apply leadership and ownership through full solution development lifecycle while providing estimates, deliverables, and results
  • Meet regularly with Project Management and Technical Leads to manage status, milestones, risks, and issues in an Agile SDLC
  • Engage in customer planning sessions and demonstrate ability to drive out requirements
  • Analyze requirements, develop technical specifications, and perform solution gap analysis via Agile/Scrum methodology
  • Provide technical and/or business application consultation to customers and team members regarding functionality, architecture, operating systems, and databases for complex product systems
  • Prepare and present application and programming design solutions to fulfill business requirements
  • Engage technical analysts and business users to provide input on test cases, test scenarios, and test plans
  • Engage teams outside of immediate group as required (e.g. product integration points, infrastructure, help desk, security, and vendors)
  • Evaluate and balance application change risk with business need for timely product enhancements

Senior Solution Architect AI & HPC

AI is a high-growth market for HPE, and we believe we are uniquely suited to bri...
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Engineering, Computer Science, or similar quantitative focus preferred
  • Ability to quickly prototype functionality into scripts for demos, integrations, troubleshooting, etc.
  • Expertise in cloud architectures, specifically with public cloud platforms such as AWS, Azure, or Google Cloud
  • Strong understanding of AI technologies, including machine learning, deep learning, and neural networks
  • Experience participating in solution configurations and the creation of PoCs to meet customer requirements
  • Solid knowledge of infrastructure components, including servers, storage, networking, and virtualization
  • Experience with high-performance computing (HPC) and GPU-accelerated systems is advantageous
  • Demonstrates expert technical skills in assigned area of specialization
  • Expert knowledge of the company offerings, strategic initiatives, current trends, competitor products and strategies within area of responsibility
  • Expert level written and verbal communication skills and mastery over English and local language
Job Responsibility:
  • Collaborate with sales teams to understand customer requirements and develop tailored solutions for their AI infrastructure needs
  • Engage in pre-sales activities, including technical presentations, demonstrations, and proof-of-concepts
  • Act as a trusted advisor to customers, addressing their questions, concerns, and technical challenges effectively
  • Stay up-to-date with the latest advancements in AI technologies, cloud architectures, and infrastructure trends
  • Lead Proof-of-Concepts (PoC) for HPE customers expanding into Deep Learning or Machine Learning use cases
  • Architect reusable end-to-end AI solutions for HPE customers and prospects
  • Lead technical discussions with customers and partners to propose HPE and partner Integrated solutions
  • Identify solutions, define action plans, and help coordinate and deliver optimal solutions and enhancements
  • Recommend configurations and settings for different types of hardware and interconnect fabrics
  • Assist in any product or technical issue towards an initial sale or renewal of a customer
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
Employment Type:
Full-time

Customer Support Engineer

As a Customer Support Engineer at a pioneering AI company, you'll be the first l...
Location:
India
Salary:
Not provided
Together AI
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in a customer-facing technical role with at least 1 year in a support function in AI
  • Strong technical background, with knowledge of AI, ML, GPU technologies and their integration into high-performance computing (HPC) environments
  • Familiarity with infrastructure services (e.g., Kubernetes, SLURM), infrastructure as code solutions (e.g., Ansible), high-performance network fabrics, NFS-based storage management, container infrastructure, and scripting and programming languages
  • Familiarity with operating storage systems in HPC environments such as Vast and Weka
  • Familiarity with inspecting and resolving network-related errors
  • Strong knowledge of Python, TypeScript, and/or JavaScript with testing/debugging experience using curl and Postman-like tools
  • Foundational understanding in the installation, configuration, administration, troubleshooting, and securing of compute clusters
  • Complex technical problem solving and troubleshooting, with a proactive approach to issue resolution
  • Ability to work cross-functionally with teams such as Sales, Engineering, Support, Product and Research to drive customer success
  • Strong sense of ownership and willingness to learn new skills to ensure both team and customer success
Job Responsibility:
  • Engage directly with customers to tackle and resolve complex technical challenges involving our cutting-edge GPU clusters and our inference and fine-tuning services; ensure swift and effective solutions every time
  • Become a product expert in all of our Gen AI solutions, serving as the last line of technical defense before issues are escalated to Engineering and Product teams
  • Collaborate seamlessly across Engineering, Research, and Product teams to address customer concerns; collaborate with senior leaders both internally and externally to ensure the highest levels of customer satisfaction
  • Transform customer insights into action by identifying patterns in support cases and working with Engineering and Go-To-Market teams to drive Together’s roadmap (e.g., future models to support)
  • Maintain detailed documentation of system configurations, procedures, troubleshooting guides, and FAQs to facilitate knowledge sharing with team and customers
  • Be flexible in providing support coverage during holidays, nights and weekends as required by business needs to ensure consistent and reliable service for our customers
What we offer:
  • Competitive compensation
  • Startup equity
  • Health insurance
  • Flexibility in terms of remote work for the respective hiring region

HPC Principal Federal Technical Consultant

Principal Consultant to join our High-Performance Computing (HPC) team. In this ...
Location:
United States
Salary:
115500.00 - 266000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 8+ years of professional experience, with at least 3+ in HPC architecture, systems engineering, or large-scale infrastructure design
  • Advanced degree in Computer Science, Engineering, Physics, or related technical field (or equivalent experience)
  • Proven ability to design and deliver complex, multi-vendor HPC solutions at scale
  • Demonstrated ability to independently complete solution implementations and application design deliverables
  • Must be United States Citizen due to the responsibilities and requirements of the role as this will be supporting a Federal site
  • Top Secret Clearance, TS/SCI with Full Scope Polygraph (FSP)
  • Must be willing to travel as the business dictates
  • Expertise in one or more of the following: parallel computing, MPI/OpenMP, GPU acceleration, workload schedulers (Slurm, Altair PBS Pro, Torque/MOAB, etc.), or large-scale data storage systems (Lustre, GPFS, Ceph)
  • Experience with network boot technologies (PXE, gPXE/Etherboot, etc.)
  • Storage specific knowledge: LVM, RAID, iSCSI, Disk partitioning (GPT, MBR)
Job Responsibility:
  • Lead the technical implementation design and delivery of world class scale HPC solutions, from requirements gathering to implementation
  • Provide architectural guidance on compute, storage, networking, and workload management tailored to customer use cases
  • Configure, deploy, and maintain Linux-based HPC clusters, associated storage, and network infrastructure
  • Work in close collaboration with customers on finalizing and deploying HPC software applications, hosting platforms, and management systems that enable customer research and production workloads
  • Provide technical support and troubleshooting for HPC implementation in secure locations
  • Work on both operational support and strategic HPC projects; actively participate in customer user group environments
  • Evaluate and implement new tools, middleware, and methodologies to improve operations and service delivery
  • Ensure compliance with enterprise IT security and technology controls
  • Act as principal consultant in customer engagements, often leading cross-functional project teams (including customer staff)
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
Employment Type:
Full-time