The Fabric Data Engineering Experience & Infrastructure team is hiring a Principal Software Engineering Manager to lead a team building LLM-powered data engineering experiences and supporting infrastructure for Fabric Data Engineering, based on Apache Spark. This role spans people leadership and technical leadership: you will grow and coach engineers while guiding design and delivery of agentic workflows and scalable LLM-backed data features (e.g., AI-assisted notebook experiences, evaluation/telemetry, production-grade orchestration patterns) that help Data Engineers achieve more through Microsoft Fabric.
Job Responsibilities:
Lead and grow a team: Hire, onboard, coach, and develop engineers; set clear expectations; create an inclusive culture of accountability, learning, and collaboration.
Drive execution and delivery: Guide team planning and prioritization across multiple workstreams
Shape requirements with partners: Partner with Product Management, Design, Research, and dependent engineering teams to translate ambiguous customer needs into crisp scenario plans and measurable outcomes.
Guide architecture and technical strategy: Lead identification of dependencies and development of design documents; guide architectural decisions for distributed, cloud-scale systems (Spark/PySpark + Python services) with explicit tradeoffs across performance, reliability, cost, security, privacy, and operability.
Raise the engineering quality bar: Establish and reinforce engineering standards (design reviews, coding patterns, test strategy, performance practices, operational readiness); ensure code and designs meet quality and scale expectations.
Operational excellence and accountability: Own service health for your area—live-site readiness, on-call excellence, incident response, postmortems, and sustained improvements. Hold accountability for outcomes when services do not meet performance or reliability expectations.
AI Engineering at production scale: Guide the team to build and operationalize LLM-powered experiences using robust orchestration, grounding, evaluation/quality gates, telemetry, and iterative improvements aligned to customer value and Responsible AI principles.
Cross-team influence: Build partner relationships across organizations and geographies; align on shared goals, interfaces, and SLAs; unblock execution and drive decisions when tradeoffs arise.
Requirements:
Bachelor's Degree in Computer Science or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Nice to have:
Modern LLM / AI Engineering: Solid understanding of LLM systems and applied AI Engineering (prompting, grounding/RAG, tool/function calling, agent orchestration, evaluation). Ability to define quality bars and drive adoption of repeatable patterns across teams.
Operationalizing AI/ML at scale: Experience establishing monitoring/telemetry, experimentation (A/B), rollout strategies, and cost/latency optimization—driving predictable operations and continuous improvement across services.
Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
People leadership: Experience leading and developing engineering teams (hiring, coaching, performance management, career growth), building inclusive culture, and improving team effectiveness.
Technical depth in distributed systems: Proven ability to guide design and delivery of scalable distributed systems and production services, including reliability, diagnosability, and operational excellence.
Spark + data platform expertise: Hands-on understanding of Apache Spark/PySpark and data engineering patterns for large-scale structured/semi-structured/unstructured workloads; ability to guide platform-level improvements (performance, cost, operability).
Cloud + security/compliance rigor: Cloud-native engineering experience (Azure compute/storage/networking) and ability to ensure solutions meet security, privacy, and compliance expectations.
Cross-team partner leadership: Demonstrated ability to align with multiple partner teams, manage dependencies, and deliver high-impact customer outcomes through influence and collaboration.