Principal AI Network Architect

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Not provided

Salary:

139900.00 - 274800.00 USD / Year

Job Description:

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding cloud infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for more than 200 Microsoft online businesses, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, globally through our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate engineers to help achieve that mission.

As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality, and at the lowest cost is of paramount importance. To achieve this goal, the Cloud Hardware Systems Engineering (CHSE) team is instrumental in defining and delivering operational measures of success for hardware manufacturing and in improving the planning process, quality, delivery, scale, and sustainability related to Microsoft cloud hardware. We are looking for seasoned engineers with a passion for customer-focused solutions and the insight and industry knowledge to envision and implement future technical solutions that will manage and optimize the cloud infrastructure.

We are looking for a Principal AI Network Architect to join the team.

Job Responsibility:

  • Spearhead architectural definition and innovation for next-generation GPU and AI accelerator platforms, with a focus on ultra-high bandwidth, low-latency backend networks
  • Drive system-level integration across compute, storage, and interconnect domains to support scalable AI training workloads
  • Partner with silicon, firmware, and datacenter engineering teams to co-design infrastructure that meets performance, reliability, and deployment goals
  • Influence platform decisions across rack, chassis, and pod-level implementations
  • Cultivate deep technical relationships with silicon vendors, optics suppliers, and switch fabric providers to co-develop differentiated solutions
  • Represent Microsoft in joint architecture forums and technical workshops
  • Evaluate and articulate tradeoffs across electrical, mechanical, thermal, and signal integrity domains
  • Frame decisions in terms of TCO, performance, scalability, and deployment risk
  • Lead design reviews and contribute to PRDs and system specifications
  • Shape the direction of hyperscale AI infrastructure by engaging with standards bodies (e.g., IEEE 802.3), influencing component roadmaps, and driving adoption of novel interconnect protocols and topologies

Requirements:

  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 7+ years technical engineering experience
  • OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 8+ years technical engineering experience
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 5+ years of experience in designing AI backend networks and integrating them into large-scale GPU systems
  • Proven expertise in system architecture across compute, networking, and accelerator domains
  • Deep understanding of RDMA protocols (RoCE, InfiniBand), congestion control (DCQCN), and Layer 2/3 routing
  • Experience with optical interconnects (e.g., PSM, WDM), link budget analysis, and transceiver integration
  • Familiarity with signal integrity modeling, link training, and physical layer optimization
  • Experience architecting backend networks for AI training and inference workloads, including Hamiltonian cycle traffic and collective operations (e.g., all-reduce, all-gather)
  • Hands-on design of high-radix switches (≥400Gbps per port), orthogonal chassis, and cabled backplanes
  • Knowledge of chip-to-chip and chip-to-module interfaces, including error correction and equalization techniques
  • Experience with custom NIC IPs and transport layers for secure, reliable packet delivery
  • Familiarity with AI model execution pipelines and their impact on pod-level network design and latency SLAs
  • Prior contributions to hyperscale deployments or cloud-scale AI infrastructure programs

Additional Information:

Job Posted:
February 14, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Principal AI Network Architect

Principal AI Network Architect

Do you want to be at the forefront of innovating the latest hardware designs to ...
Location:
United States, Redmond
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Master’s or Doctoral degree in Electrical Engineering, Computer Engineering, or related fields and 10+ years of technical experience in the domain
  • Deep expertise with Ethernet networking, RDMA (RoCE, InfiniBand), congestion control, and Layer 2/3 switching
  • Experience architecting scale-out/backend networks for AI GPU clusters
  • Familiarity with scale-up networks such as NVLink and UALink
  • Experience with high-radix Ethernet switches
  • Familiarity with AI model execution pipelines, with the ability to analyze communication flows and their impact on model performance
  • Prior contributions to standards committees and experience with hyperscale network deployments would be an added benefit
  • Skilled in partnering and influencing architects, hardware engineers, and software leads
Job Responsibility
  • Leadership: Spearhead architecture definition and evaluation of AI accelerator platforms, with a focus on high-bandwidth, low-latency networks. Drive end-to-end optimization of the stack, from hardware to the software kernels
  • Cross-functional collaboration: Partner with silicon and platform design teams to co-design infrastructure that meets performance, reliability, and deployment goals. Frame decisions in terms of TCO, performance, flexibility, and scalability
  • Prototyping: Work with a state-of-the-art networking lab to prototype new network architectures
  • Industry influence: Participate in industry consortiums to shape standards and influence vendor roadmaps
  • Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Fulltime

Senior Principal Technical Program Manager - ML Platform

Salary:
231300.00 - 301975.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • 8+ years of experience on software teams as Development Manager, Technical Product Manager or TPM leading technical platforms areas
  • Deep domain experience in AI and/or Search. Example: Model Inference, Model Evaluation, Model Training, LLM Ops, Semantic Search, Search Relevance, etc.
  • Partner with Engineering in defining direction, strategy and execution at Platform level
  • Strategic thinking and ability to understand business objectives to translate them into technical problems and programs.
  • Technical understanding of systems involved. Willingness to develop domain expertise in the area they operate - storage, networking, authentication, capacity management, service deployments, etc.
  • TPMs are not expected to write or read code, but are expected to understand system flows, block architectures, APIs and such.
  • Experience defining and running end-to-end complex technical programs
  • Strong leadership, organizational, and communication skills
Job Responsibility
  • Understand and stay up-to-date on latest innovations in AI and Search. Partner closely with engineering teams to translate these into practical platform evolution for Atlassian bringing value to our customers.
  • Analyze business objectives, customer needs, product adoption inhibitors and opportunities, industry trends, and based on these, in close collaboration with your stakeholders, define a long-term strategy and roadmap for your platform and product components.
  • Understand business objectives and translate them into technical systems problems that need to be prioritized and solved in the current business environment.
  • Define specific systems programs and create a plan of action for realizing those programs. Such programs could be around capacity planning, migration efforts, high availability, network architecture, performance optimization, reliability improvements and more.
  • Use your technical understanding of Atlassian and related systems to partner with and influence engineers and architects in making progress on these problems.
  • Responsible for taking a systematic approach to engineering problems. This includes: prioritizing tasks, scoping out the project, defining objectives, and making consistent progress against each of these.
  • Be accountable for the success of these technical programs by managing the entire lifecycle from initiation to forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
What we offer
  • health and wellbeing resources
  • paid volunteer days

Principal Machine Learning Engineer

Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements
  • Fluency in at least one modern object-oriented programming language (preferably Java/Kotlin and Python)
  • Understanding of Machine Learning project lifecycle/tools along with prompt engineering
  • Experience in architecting and implementing high-performance RESTful microservices
  • Experience building and operating large-scale distributed systems using Amazon Web Services (S3, Kinesis, CloudFormation, EKS, AWS Security and Networking)
  • Experience with leveraging LLMs effectively and optimizing model usage on GPUs
  • Experience with Databricks or Apache Spark
  • Experience with Continuous Delivery and Continuous Integration
Job Responsibility
  • Regularly tackle the largest and most complex problems in the team, from technical design to launch
  • Work closely with Product, Engineering and Design leads in Jira AI, and translate their requirements into solid engineering deliverables, delegating work to the teams
  • Deliver solutions that are used by other teams and products
  • Follow a Product Engineer mindset by building features that are data-driven and customer-centric, fostering that culture within the Jira AI group
  • Exceptional problem solving ability using ML, AI and core software engineering
  • Routinely tackle complex architecture challenges and define architectural standards
  • Actively contribute to the code delivery through leading code reviews & documentation, direct contribution and fixing complex bugs in high-risk surface areas
  • Expertise in data analysis, statistical methods, and logical reasoning to inform data-driven decision-making
  • Partner across engineering teams to take on company-wide initiatives spanning multiple projects
  • Mentor junior members on the team
What we offer
  • Atlassians can choose where they work – whether in an office, from home, or a combination of the two
  • Atlassians have more control over supporting their family, personal goals, and other priorities

Principal Network Architect

Architect to help define the future of high-performance networking for HPC and A...
Location:
United States, Ft. Collins
Salary:
142000.00 - 310500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
April 27, 2026
Requirements
  • Bachelor's or Master's degree in Electrical Engineering
  • Typically 10+ years experience
  • Deep understanding of network architecture and system-level design principles
  • Proven experience in evaluating architectural trade-offs and implementing optimization strategies
  • Strong ability to work effectively within cross-functional teams
  • Ability to effectively communicate product architectures, design proposals and negotiate options at business unit and executive levels
Job Responsibility
  • Define and document ASIC-level network architecture
  • Research and assess new networking technologies
  • Develop and document system-level network designs
  • Collaborate with network architects, ASIC designers, and software engineers to align architecture with system goals
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Data & AI Architect Principal

The Data and AI Architect Principal is central to BT International's ability to ...
Location:
Hungary, Budapest
Salary:
Not provided
Plusnet
Expiration Date
Until further notice
Requirements
  • Strategic Architecture Leadership – Proven ability to define data and AI platform vision with track record driving large-scale data transformation programs
  • Data Platform Architecture – Deep expertise in modern data platform design including data lakes, data warehouses, streaming architectures and data mesh patterns
  • AI/ML Architecture – Strong understanding of machine learning systems including feature engineering, model training, serving, monitoring and MLOps platforms
  • Agentic AI – Emerging expertise in agentic AI systems including LLM-based agents, tool use and reasoning frameworks for autonomous task execution
  • Technical Depth – Hands-on background with coding capability in data languages (Python, SQL, Spark), enabling credibility with engineering teams and participation in data work
  • Data Engineering Patterns – Strong understanding of ETL/ELT patterns, data quality frameworks, schema evolution and data lineage tracking
  • Cloud-Native Data – Experience with cloud data services and platforms across multi-vendor environments including data warehouses, data lakes and ML platforms
  • Platform Engineering Mindset – Ability to treat data and AI capabilities as platform products with clear service contracts and developer experience focus
Job Responsibility
  • Define and lead data and AI platform architecture strategy, establishing patterns that balance functional requirements with non-functional requirements including scalability, data quality, privacy and cost optimization
  • Work hand in hand with product engineering squads to establish data generation patterns, working directly with engineers to build data-informed products and AI capabilities
  • Drive architectural strategy for making data available across the organization through self-service data products, APIs and governed data access patterns
  • Lead the technical vision for AI capabilities distinguishing between machine learning for pattern recognition and agentic AI for autonomous task execution, establishing platform foundations for both
  • Champion data mesh principles where appropriate, enabling domain teams to own their data as products while maintaining consistency through federated governance
  • Establish MLOps practices and platforms that enable product teams to train, deploy and monitor machine learning models with the same velocity as application code
  • Collaborate with IT Systems and NaaS architects to ensure telemetry, network data and business systems generate high-quality data with proper lineage and governance
  • Work across multi-vendor environments including cloud data platforms, on-premise data systems and SaaS analytics tools to establish cohesive data architecture
  • Drive architectural governance for data and AI work ensuring solutions follow platform patterns for data quality, model governance and AI safety
  • Provide technical thought leadership on emerging data and AI technologies, evaluating applicability and translating possibilities into roadmaps aligned with BT International's platform strategy
What we offer
  • Cafeteria package - HUF 600,000/ year
  • Performance-based bonus
  • Comprehensive private health care package for all the employees, which can be extended to family members
  • Nursery support for mothers returning from maternity leave
  • Extended paternity leave: 10+10 fully paid days
  • Commuting allowance
  • Home office allowance
  • Employee discount opportunities
  • Highly affordable mobile packages for the family as well
  • New high-class offices both in Budapest and Debrecen
  • Fulltime

Principal Software Engineer

Are you passionate about architecting distributed systems, building high-perform...
Location:
United States, Multiple Locations
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
  • Expertise in distributed consensus, partitioning, replication, and cloud-native networking
  • Proficiency in C, C++, Rust, Golang, or similar systems programming languages
  • Linux networking expertise: kernel networking stack, packet processing (DPDK/eBPF/XDP), NIC offloads, TCP/UDP performance tuning, and observability tools applied to high‑throughput, low‑latency data paths
  • Experience with DNS protocol, large-scale web applications, or cloud infrastructure is a plus
  • Experience applying AI/Machine Learning (ML) techniques for operational excellence, such as predictive analytics, automated incident detection, or self-healing infrastructure
  • 6+ years of experience designing and building distributed systems or networking data paths at scale
Job Responsibility
  • Architect and implement distributed systems and networking data paths for cloud-scale Networking services, focusing on reliability, performance, security, and operational excellence
  • Lead innovation in data plane engineering, including traffic routing, failover and self-healing mechanisms
  • Drive adoption of advanced distributed algorithms, networking protocols, and AI-driven solutions to optimize scalability and resilience
  • Mentor and guide engineers in best practices for distributed systems, networking, security, and cloud infrastructure, providing technical leadership through rigorous code and design reviews
  • Collaborate cross-functionally to deliver end-to-end solutions, from design through deployment and operations
  • Champion operational excellence by developing robust monitoring, observability, and automated recovery solutions, including AI-powered incident detection and predictive scaling
  • Embody our Culture and Values
  • Fulltime

Business Systems Architect Principal

The Business Systems Architect Principal is central to BT International’s transf...
Location:
Hungary, Budapest
Salary:
Not provided
Plusnet
Expiration Date
Until further notice
Requirements
  • Strategic Architecture Leadership – Proven ability to define and communicate architectural vision for complex business systems landscapes, with track record driving large-scale transformation programmes in regulated telco environments
  • Business Systems Domain Expertise – Deep understanding of BAU systems (sales, service, enterprise applications) and service operations platforms (service desk, AI Ops, lead-to-cash, billing, inventory) with knowledge of how these capabilities support business operations and cost optimisation
  • Systems Integration – Extensive experience linking together various IT systems, services and software across BAU applications, service operations platforms and NaaS capabilities to enable functional operation and support business processes
  • Vendor and Stakeholder Management – Strong ability to work with SaaS platforms and enterprise vendors as strategic partners, negotiate technical implementations and manage complex stakeholder relationships across business and operational teams
  • Modernisation Patterns – Expert knowledge of incremental modernisation approaches including service operations automation, AI-driven process optimisation and cloud-native patterns that balance operational continuity with transformation
  • Technical Depth – Hands-on background in business systems integration and service operations with coding capability, enabling credibility with engineering teams and active participation in technical spikes when needed
  • Operational Excellence – Understanding of comprehensive observability, service resilience and operational automation approaches including metrics, logging, distributed tracing and telemetry pipelines for business systems
  • NaaS Integration Understanding – Knowledge of how BAU systems integrate with network-as-a-service models, enabling sales and service operations to leverage aaS capabilities whilst maintaining operational stability
  • Leadership and Influence – Ability to lead blended IT teams (support, maintenance, BAU systems, service operations) through transformation, build consensus across organisational boundaries and develop technical leadership capability in operational functions
  • Extensive experience leading IT systems and BSS architecture in telecommunications or complex B2B environments, with demonstrated success modernizing legacy landscapes across multiple system domains
Job Responsibility
  • Define and lead the architectural strategy for business systems across BAU applications (sales, service, enterprise) and service operations (AI Ops, Service Desk, L2C including Pricing/Design/Quoting/SRM, billing, inventory), establishing target state architecture that optimises legacy systems whilst designing future-ready capabilities
  • Own the business systems portfolio optimisation roadmap, making strategic keep/modernise/retire decisions for BAU systems and service operations platforms based on technical fitness, cost-effectiveness and alignment with asset-light, NaaS-based operating model
  • Establish systems integration approaches that link together BAU applications, service operations platforms and NaaS capabilities, enabling functional operation across sales, service and enterprise systems whilst supporting business processes
  • Lead vendor strategy for business systems platforms including SaaS applications, enterprise systems and service operations tools, negotiating strategic partnerships that simplify IT landscape whilst aligning with aaS business needs and reducing vendor dependencies
  • Champion modern architecture patterns including service operations automation, AI-driven process optimisation, billing automation, inventory accuracy improvement and self-service capabilities that reduce manual propensity and improve cost-to-serve metrics
  • Design for operational excellence by establishing comprehensive observability across business systems including metrics, logging, distributed tracing and telemetry that enable proactive issue detection and support continuous improvement in service delivery
  • Collaborate with Data and AI architects to leverage data platforms and AI capabilities for business intelligence, service automation and customer insights, ensuring business systems generate valuable data and support AI-driven process improvements
  • Drive architectural governance through design reviews and architecture conformance processes, ensuring business systems initiatives align with enterprise standards, security requirements and support transformation to asset-light operating model
  • Build and mentor Business Systems architects who work with BAU operations and service delivery teams, establishing technical leadership capability and fostering architectural thinking across support, maintenance and enterprise systems functions
  • Work with engineering leadership to establish integration patterns that connect business systems with NaaS capabilities, enabling sales and service operations to leverage network-as-a-service whilst maintaining operational stability and cost-effectiveness
What we offer
  • Cafeteria package - HUF 600,000/ year
  • Performance-based bonus
  • Comprehensive private health care package for all the employees, which can be extended to family members
  • Nursery support for mothers returning from maternity leave
  • Extended paternity leave: 10+10 fully paid days
  • Commuting allowance
  • Home office allowance
  • Employee discount opportunities
  • Highly affordable mobile packages for the family as well
  • New high-class offices both in Budapest and Debrecen
  • Fulltime