The M365 Copilot App Platform team provides the platform APIs, infrastructure, and backend web server for the Microsoft 365 Copilot app. All partner teams have built their AI-enabled experiences on our platform and depend on us for their success. We own everything from the application code itself to the platform APIs to the deployment pipelines and infrastructure, including the backend web server and middle-tier service that support the application on the web, Windows, and Mac. This role is central to enabling the M365 Copilot app, one of Microsoft’s key strategic products in the competitive AI landscape.
Job Responsibilities:
Leverage expertise in distributed systems, cloud technology layers, platform APIs, and infrastructure components to improve availability, reliability, performance, observability, and security of the middle-tier services
Identify opportunities to enhance service quality by analyzing production telemetry and applying insights to propose and implement engineering changes
Participate in on‑call rotations and incident responses, engaging with product engineering teams throughout the product lifecycle
Independently create, test, and deploy changes through safe deployment processes (SDP) to improve operability and code quality
Collaborate with engineers and architects to diagnose and resolve production issues and prevent recurrence
Develop and maintain the middle-tier service, platform APIs, deployment pipelines, and infrastructure supporting the M365 Copilot app
Work closely with partner teams to enable new capabilities and ensure the platform meets reliability and performance requirements
Contribute to the continuous evolution of infrastructure and tooling to support services at scale
Collaborate with cross-functional teams to enable the M365 Copilot app and drive innovation
Work closely with partner teams to build new capabilities into our application
Requirements:
Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements
This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Nice to have:
Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience or experience as a Site Reliability Engineer in building and shipping production software or services with code in languages including, but not limited to, C#, JavaScript, or TypeScript OR equivalent experience
Experience in distributed systems and/or cloud platforms (Azure, Kubernetes, Docker, containers ecosystem)
Proven ability to modify componentized, well-architected infrastructure software and collaborate across teams
1+ years of experience with incident management and reliability engineering in cloud or AI environments
Proficiency in scripting (PowerShell, shell script, etc.) and expertise in Linux
Technical experience working with large-scale cloud or distributed systems
Experience running highly available, mission-critical, large-scale distributed systems, including domain expertise in areas such as scalable and fault-tolerant system design, observability and monitoring, safe change management, automation, and reliability and security risk reduction
Motivated and self-driven
Strong cross-team communication and partnership skills
Creativity, insightfulness, and sensitivity for a dynamic approach to problem solving