With Prisma AIRS, Palo Alto Networks is building the world's most comprehensive AI security platform. Organizations are increasingly building complex ecosystems of AI models, applications, and agents, creating dynamic new attack surfaces with risks that traditional security approaches cannot address. In response, Prisma AIRS delivers model security, posture management, AI red teaming, and runtime protection, so our customers can confidently deploy AI-driven innovation while maintaining a formidable security posture from development through runtime.

As a Senior Principal Machine Learning Engineer, you will drive research on cutting-edge areas, including AI-native security (LLMs, AI agents, the model supply chain, and runtime AI) and the broader LLM ecosystem security. You will leverage this research to identify and develop new product opportunities, and collaborate closely with engineering teams to deploy models for maximum product impact. You will also foster cross-functional collaboration and serve as an AI thought leader both within the company and in the security/LLM community. Beyond individual contribution, you will lead complex technical projects, mentor senior engineers, and set the standard for performance, scalability, and engineering excellence across the organization. Your decisions will have a profound and lasting impact on our ability to deliver cutting-edge AI security solutions at massive scale.
Job Responsibilities:
Lead the architectural design of a highly scalable, low-latency, and resilient ML inference platform capable of serving a diverse range of models for real-time security applications
Define technical approaches to less-defined product requirements, ensuring the best fit between product features and technical implementation
Explore new product opportunities by maintaining a deep understanding of LLM and Generative AI research trends
Provide technical leadership and mentorship to the team, driving best practices in MLOps, software engineering, and system design
Drive the strategy for model and system performance, guiding research and implementation of advanced optimization techniques like custom kernels, hardware acceleration, and novel serving frameworks
Establish and enforce engineering standards for automated model deployment, robust monitoring, and operational excellence for all production ML systems
Act as a key technical liaison to other principal engineers, architects, and product leaders to shape the future of the Prisma AIRS platform and ensure end-to-end system cohesion
Tackle the most ambiguous and challenging technical problems in large-scale inference, from mitigating novel security threats to achieving unprecedented performance goals
Requirements:
BS/MS or Ph.D. in Computer Science, a related technical field, or equivalent practical experience
Extensive professional experience in software engineering with a deep focus on MLOps, ML systems, or productionizing machine learning models at scale
Expert-level programming skills in Python
Deep, hands-on experience designing and building large-scale distributed systems on a major cloud platform (GCP, AWS, Azure, or OCI)
Proven track record of leading the architecture of complex ML systems and MLOps pipelines using technologies like Kubernetes and Docker
Mastery of ML frameworks (TensorFlow, PyTorch) and extensive experience with advanced inference optimization tools (ONNX, TensorRT)
Strong understanding of popular model architectures (e.g., Transformers, CNNs, GNNs)
Demonstrated expertise with modern LLM inference engines (e.g., vLLM, SGLang, TensorRT-LLM)
Nice to have:
Experience in a systems language such as Go, Java, or C++
A deeper understanding of attention mechanisms and Transformer model internals
Open-source contributions in these areas are a significant plus
Experience with low-level performance optimization, such as custom CUDA kernel development or using Triton Language, is a plus
Experience with data infrastructure technologies (e.g., Kafka, Spark, Flink) is a plus
Familiarity with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI, Tekton) is a plus