Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft’s cloud growth? Are you seeking a unique career opportunity that combines technical capabilities and cross-team collaboration with business insight and strategy? If you’re interested in hardware/software co-design and in evaluating AI systems architecture concepts to improve datacenter performance, efficiency, and reliability, this Principal Systems SW Engineer role could be a strong fit.
Join the Strategic Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft’s expanding cloud infrastructure and responsible for powering Microsoft’s “Intelligent Cloud” mission. Microsoft delivers more than 200 online services to more than one billion individuals worldwide. We deliver the core infrastructure and foundational technologies for Microsoft's cloud businesses, including Microsoft Azure, Bing, MSN, Office 365, OneDrive, Skype, Teams and Xbox Live.
As part of the SPARC group, you will help with pathfinding and architecture for future AI systems and related technologies that create advantages for Azure and Microsoft. You will collaborate across the Azure organization to evaluate next-generation datacenter technologies and influence Azure product roadmaps for both Microsoft and third-party silicon and systems.
Job Responsibilities:
Spearhead system architecture exploration and definition for Microsoft’s custom AI systems
Identify system-level co-design opportunities working across GPU, host, network, storage and memory vectors
Conduct comprehensive architecture analysis for next-generation ML model architectures, with a deep understanding of Azure AI use cases
Run simulations to evaluate solutions and build end-to-end hardware and software prototypes
Collaborate with cross-functional teams to develop full-stack technology across hardware and software and to mature concepts from PoC to productization
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Nice to have:
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Deep expertise in AI scale-up and scale-out networking/interconnect architectures, along with a good understanding of memory/storage technologies
Deep understanding of AI inference systems and associated software, and emerging approaches to orchestrate tiered memory and storage capabilities for distributed serving and KV caching for agentic systems
Understanding of GPU compute and systems in the cloud, including CPU, memory, networking and storage technologies
Understanding of the system software, storage and communication library integration into AI frameworks
Intellectual curiosity and passion for learning and deploying new technologies
Problem-solving skills, analytical capabilities, and attention to detail
Ability to manage through ambiguity, bringing clarity and results orientation to engage and energize collaborators and stakeholders
Experience leading and driving complex projects with respect and integrity, including those with multiple workstreams spanning different business and technical disciplines
Skilled in partnering with and influencing architects, hardware engineers, and software leads
Collaboration skills, teamwork, and a strong sense of responsibility
Verbal and written communication skills, and ability to articulate and engage with both technical and non-technical stakeholders at all levels