A unique opportunity to join Bing Search, a global search engine powering billions of searches daily from both humans and large language models (LLMs). The Bing Metrics team is looking for passionate data scientists to work on the next generation of metrics and quality control for the Bing Grounding API. The team ensures that Bing returns high-quality, error-free, and authoritative results using a variety of approaches. We build complex pipelines, including crowd-judging and machine learning steps, to verify suspected quality issues. We now actively use LLMs such as ChatGPT as judges to evaluate the quality of search results at multiple levels (query, answer, and whole page) and to generate insights for the teams responsible for this experience.
As part of an international, distributed team, you will be responsible for retrieval-augmented generation (RAG) quality metrics within Bing Search. The role gives you the opportunity to work with multiple teams across all of Bing (more than 80 teams) and to greatly influence the relevance and result quality of the entire platform. We are an established core team in Bing with very high visibility and impact.
We are looking for a talented engineer or data scientist with a passion for working with LLMs, and RAG specifically, who can design, implement, and test complex data pipelines built on top of LLMs, create new tools for running multi-step prompts to evaluate search engine quality, and generate actionable insights for the teams.
Job Responsibilities:
Design and implement metrics for RAG with Bing Web Search and other APIs
Build pipelines and dashboards for Bing Grounding quality
Use LLMs in LLM-as-a-judge settings for data evaluation
Engineer prompts for text-only and multimodal LLMs for data processing and insight generation
Design and implement end-to-end (E2E) pipelines, from sampling anomalies in the logs, through prompt engineering, to automatically updating dashboards
Apply classical ML (feature engineering and model training, text and image embeddings) alongside LLMs to augment data analysis and processing pipelines
Help teams build new, innovative search experiences with Bing
Requirements:
Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 1+ year(s) data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 3+ years data-science experience
OR Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 5+ years data-science experience
OR equivalent experience
3 years of T-SQL
Experience with, or deep interest in, RAG and large language models
Passion for building metrics for complex multi-step systems
Passion for prompt engineering and text generation with LLMs
Interest in designing dashboards for data visualization in novel ways
Experience with Machine Learning
5+ years of C#, Python, Java or any other major programming language
3 years of SQL
Experience with Large Language Models
Ability to work independently, along with solid collaboration and communication skills
Nice to have:
Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 3+ years data-science experience
OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 5+ years data-science experience
OR Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 7+ years data-science experience