We are investing massively in developing next-generation AI tools for multimodal datasets and a wide range of applications. We are building large-scale, enterprise-grade solutions and serving these innovations to our clients and WPP agency partners. As a member of our team, you will work alongside world-class talent in an environment that fosters not only innovation but also personal growth. You will be responsible for shaping how our AI models understand and interpret complex creative content. Your work will bridge the gap between raw AI capabilities and the delivery of accurate, reliable, and valuable insights for our FTSE 100 clients. You will be at the forefront of applied AI, ensuring our products are not just powerful, but also precise and trustworthy.
Job Responsibilities:
Collaboration: Work closely with product managers, data scientists, and architects to translate business needs into technical requirements for AI evaluation and application
Prompt Engineering & LLM Application: Design, develop, and iteratively refine sophisticated prompts for Large Language Models (LLMs)
AI Output Evaluation & Governance: Design and implement evaluation frameworks to ensure our LLM-based services deliver accurate, reliable outputs. Establish metrics for prompt performance, iterative improvement processes, and drift monitoring. Evaluate and recommend best-in-class evaluation tools and methodologies to enhance our capabilities
Evaluation Dataset Engineering: Build and maintain the infrastructure for high-quality evaluation datasets representing diverse industry verticals and creative types. This involves designing data pipelines, annotation workflows, quality control systems, and version control. You'll develop intelligent sampling strategies to ensure balanced positive/negative examples across questions about creative elements that drive advertising performance
Applied AI Research & Integration: Build a production-ready evaluation system for our LLM-based services, which extract structured insights from advertising creatives. In your first 6 months, you'll develop an automated evaluation system using LLM-as-judge approaches with human-in-the-loop validation, creating a robust, scalable solution for FTSE 100 clients
Documentation & Knowledge Sharing: Document the AI evaluation process and results to track product quality, and maintain user-facing product and API documentation
Requirements:
Proficiency in Python
Hands-on experience in advanced prompt engineering for major LLMs
Proven experience in designing and implementing evaluation methodologies and quality frameworks for AI/ML model outputs
Familiarity with modern AI/LLM frameworks (e.g., LangChain, Google GenAI)
Experience working with both structured and unstructured data, particularly in a cloud environment (GCP, AWS, or Azure)
Strong analytical and problem-solving skills
Nice to have:
Experience with Computer Vision APIs or models
Experience building and deploying scalable API services (e.g., FastAPI, Flask)
Experience working in a product-driven environment and following MLOps best practice
Experience implementing and optimizing data transformations and ETL/ELT processes
What we offer:
Enhanced pension
Life assurance
Income protection
Private healthcare
Remote working
Truly flexible working hours
Generous leave: holiday plus bank holidays and enhanced family leave