When AI traffic scales, network operations must scale faster, and that's exactly what we're building. You'll join the agentic AI team for Azure Networking Edge, where you'll create agents that take on the operational work required to keep the Edge healthy and reliable. You'll help modernize live-site operations from manual triage to agent-led execution, building on proven work: our agent already automates major NOC workflows and handles a significant portion of the operational load.
As a Software Engineer on this team, you will design, build, and ship AI agents that turn manual network operations into reliable, end-to-end automated workflows. You'll partner across engineering and operations to identify high-impact operational pain points and translate them into agentic experiences that improve speed, quality, and scalability for a rapidly growing network.
Job Responsibilities:
Works with appropriate stakeholders (network engineers, NOC/DRIs, and partner teams) to determine requirements for agentic scenarios that replace manual network operations and reduce operational bottlenecks at scale.
Contributes to identifying dependencies (tools, APIs, data sources, and operational processes) and, with little oversight, develops design documents for agent-driven workflows and services, ensuring they can be deployed safely in production.
Creates and implements code for production services and agent workflows, reusing code as applicable, to automate repetitive and routine DRI workloads and modernize execution paths that are currently manual or vendor-dependent.
Breaks down larger agent scenarios into smaller deliverables (capabilities, workflow steps, tooling integrations) and provides estimates, enabling iterative shipping of new operational coverage over time.
Acts as a Designated Responsible Individual (DRI) and participates in on-call to monitor services and agent-driven features for degradation or downtime, following playbooks and initiating mitigation actions to restore service health for supported scenarios.
Improves reliability, observability, and operational efficiency, staying current with developments that strengthen monitoring and operations at scale, specifically to support the shift from human-only execution to agentic execution as network growth accelerates.
Partners with cross-functional teams to drive adoption and execution quality, helping ensure the agent meaningfully reduces manual effort and scales operations sustainably as the network grows with AI workloads.
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Nice to have:
Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR Master's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
1+ years of experience building, deploying, or operating AI‑powered automation or agents, including use of LLMs, rule‑based systems, or hybrid approaches to automate operational workflows.
1+ years of experience designing or operating systems with strong reliability, security, or safety guarantees, such as validation layers, guardrails, failure handling, or observability in production environments.