Senior Software Engineer, AI Eval

Sentry

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

240000.00 - 280000.00 USD / Year

Job Description:

As a Senior Software Engineer on Sentry’s AI/ML team, you’ll be responsible for building the evaluation infrastructure that measures the accuracy, reliability, and real-world performance of our AI systems. This role is critical to ensuring that our debugging agents and AI-powered features behave correctly, safely, and predictably as they scale. You’ll design datasets, benchmarks, and test harnesses that turn ambiguous AI behavior into measurable signals, helping the team ship AI with confidence.

Job Responsibility:

  • Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems
  • Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data
  • Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows
  • Partner closely with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria
  • Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring

Requirements:

  • 5+ years of professional experience with a Bachelor’s degree in computer science, machine learning, or a related field
  • Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)
  • Comfort writing production-quality code (we use Python and TypeScript)
  • Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines
  • Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)

Nice to have:

  • Experience evaluating LLMs, agentic systems, or AI-assisted developer tools

What we offer:
  • incentive compensation
  • equity grants
  • paid time off
  • group health insurance coverage

Additional Information:

Job Posted:
January 22, 2026

Employment Type:
Full-time
Work Type:
Hybrid work
Similar Jobs for Senior Software Engineer, AI Eval

Senior Software Engineer - Studio - Java, AI

As a Senior Software Engineer, you’ll build the backend that powers AI features ...
Clear Street
Location:
United States, New York
Salary:
175000.00 - 240000.00 USD / Year
Expiration Date:
Until further notice
Requirements:
  • 7+ years of strong proficiency in enterprise Java
  • Experience designing and deploying AI/ML or LLM-backed systems in production
  • Familiarity with LLM tooling and patterns (e.g., tool calling, RAG pipelines and knowledge bases, evals, cost/latency tradeoffs, basic red-teaming)
  • Experience in supporting and running systems in a production environment
  • Comfortable working in a dynamic environment, partnering with cross-functional teams, and moving from prototype to reliable production
Job Responsibility:
  • Design, implement, and productionize reliable AI workflows to augment the Studio trading platform
  • Build tooling to monitor, tune, and evaluate models and workflows, as well as applicable guardrails to ensure outputs meet quality and regulatory requirements
  • Collaborate with technical and non-technical teams across the firm to identify high ROI AI opportunities
  • Build rapid prototypes and translate them into production-grade systems. Utilize the latest AI-powered development tools to iterate quickly
  • Create reusable libraries, SDKs and tooling to enable AI development throughout the firm
  • Stay current on the latest in applied AI. Read papers, evaluate new models, test out new tools
  • Participate in code review and architecture design, manage deployments, and support and contribute to the success of the overall Studio platform
What we offer:
  • Competitive compensation, benefits, and perks
  • Company equity
  • 401k matching
  • Gender neutral parental leave
  • Full medical, dental and vision insurance
  • Lunch stipends
  • Fully stocked kitchens
  • Happy hours
Employment Type:
Full-time

Senior Platform Engineer, AI Evaluation

We’re looking for an AI Platform Engineer to evolve and extend our internal eval...
Khan Academy
Location:
United States, Mountain View
Salary:
137871.00 - 172339.00 USD / Year
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
  • 5 years of Software Engineering experience with 2+ of those years working on the evaluation of generative AI systems
  • Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect)
  • Familiarity with the architecture of large language models and their industry-standard APIs
Job Responsibility:
  • Evolve and extend our internal evaluation framework for assessing the quality of our AI-driven experiences
  • Work closely with ML data engineers and platform developers to help internal teams adopt an eval-driven development process incorporating offline benchmark tests and online experiments
  • Gather internal requirements, get buy-in for changes, and then develop documentation and training materials
What we offer:
  • Competitive salaries
  • Ample paid time off as needed
  • 8 pre-scheduled Wellness Days in 2026
  • Remote-first culture
  • Generous parental leave
  • 401(k) + 4% matching
  • Comprehensive insurance, including medical, dental, vision, and life
Employment Type:
Full-time

Senior AI Frontend Engineer (Developer Productivity)

We're seeking a Senior Frontend Engineer with a strong React/TypeScript backgrou...
Citi
Location:
United Kingdom, Belfast
Salary:
Not provided
Expiration Date:
Until further notice
Requirements:
  • Strong expertise (5–10+ years) building modern frontend applications with React and TypeScript
  • Proficiency in JavaScript, React (or another UI framework), and TypeScript
  • Experience with state management libraries (Redux, Context API, Zustand) for building well-structured applications
  • Experience with Storybook or componentised development
  • Proficiency in implementing streaming and real-time experiences (e.g., word/token streaming, live updates, progress/status indicators)
  • Strong understanding of frontend architectures, state management, performance optimisation, and responsive design
  • Hands-on experience with tools such as LangChain, LangGraph, Vercel AI SDK, or Google ADK (Agent Development Kit)
  • Familiarity with CI/CD tools (e.g., Jenkins, Tekton, ArgoCD, Harness)
Job Responsibility:
  • Own the user-facing layer of our next-generation Developer Productivity platform at Citi, transforming complex AI capabilities - from chat interfaces to rich data visualizations - into intuitive, trustworthy experiences
  • Collaborate closely with other AI, Software Engineers and the Product team to leverage bleeding-edge Generative AI
  • Challenge, change, modernise & enhance the experience of our 50,000 engineers globally throughout Citi's SDLC (Software Development Life Cycle)
  • Release to production a small new or enhanced AI-first user interface that positively impacts the lives of thousands of Software Engineers and Business Analysts working in Software Requirements Engineering
  • Start raising the bar in our React.js codebase by introducing better componentisation, testing, and Storybook
  • Establish a network of UI engineers across the organisation to contribute to and learn about best practice
  • Get buy-in from the team on architectural principles, ways of working and system requirements
  • Own and champion the implementation of best practices for interaction design within the team, establishing clear guidelines for AI-specific UX patterns
  • Mentor junior engineers on best practices for designing and implementing AI-driven user interfaces
  • Design & implement production-grade features for AI solutions
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Employment Type:
Full-time

Lead Software Engineer - full stack (java/.net) + AI/GenAI/Agentic AI

Wells Fargo is seeking a Lead Software Engineer.
Wells Fargo
Location:
India, Bengaluru
Salary:
Not provided
Expiration Date:
February 27, 2026
Requirements:
  • 5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 4+ years of hands-on experience in full stack Software (Java or .Net) Enterprise Application component design and development.
  • 2+ years of AI experience in test or eval driven development including data and error analysis ensuring robust and scalable AI software.
  • Experience architecting and implementing agentic frameworks for autonomous multi-step reasoning and planning
  • Solid grasp of parsing, chunking, indexing and re-ranking of multiple file formats
  • Experience with Generative AI Operations, and enterprise-scale AI adoption strategies.
  • Experience in enterprise AI model lifecycle management, AI compliance, and risk mitigation strategies.
  • Strong understanding of human centered AI design for workplace applications.
  • Experience working with globally distributed teams in Agile scrums
Job Responsibility:
  • Lead complex technology initiatives including those that are companywide with broad impact
  • Act as a key participant in developing standards and companywide best practices for engineering complex and large scale technology solutions for technology engineering disciplines
  • Design, code, test, debug, and document for projects and programs
  • Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors
  • Make decisions in developing standard and companywide best practices for engineering and technology solutions requiring understanding of industry best practices and new technologies, influencing and leading technology team to meet deliverables and drive new initiatives
  • Collaborate and consult with key technical experts, senior technology team, and external industry groups to resolve complex technical issues and achieve goals
  • Lead projects, teams, or serve as a peer mentor
  • Must be able to lead a fast-track team to turn around advanced automation and AI/ML solutions
  • Should be able to navigate the enterprise and rally the teams to achieve deliverables
  • Should be able to co-ordinate with peers and stakeholders to identify forward looking solutions.
Employment Type:
Full-time

Senior Software Engineer - Copilot Security

Copilot Security is at the core of Microsoft’s mission to deliver trusted, human...
Microsoft Corporation
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • 3+ years in technical engineering roles building large-scale services.
  • Hands-on experience designing and operating security-critical or AI-powered systems at scale, including agentic AI, secure orchestration, or advanced threat defenses.
  • Proven ability to design, build, and ship agentic AI features or frameworks.
  • Ability to clearly explain complex systems and security concepts to technical and non-technical stakeholders and influence cross-org roadmaps.
  • Agentic AI Development & Orchestration: Experience building production agent systems using frameworks such as LangGraph, Amazon Strands SDK, or similar platforms; familiarity with agentic design patterns including tool calling, multi-agent coordination, and secure delegation patterns.
  • Hands-on experience with distributed training frameworks (Ray, Slurm, HPC), containerization and orchestration technologies (Docker, Kubernetes) for ML model deployment, and ML lifecycle management in production environments.
  • Experience designing evaluation frameworks for LLM-based applications and implementing observability for agent systems using tools such as Phoenix, MLflow, Langfuse, or custom eval harnesses; understanding of AI safety evaluation methodologies including adversarial testing and red-teaming.
Job Responsibility:
  • Develop and ship agentic AI-powered security features that protect users from threats such as prompt injection, adversarial manipulation, and abuse of agentic workflows.
  • Implement secure orchestration frameworks that enable Copilot to safely delegate, coordinate, and execute actions across devices, services, and platforms.
  • Invent and apply new intelligent agents that leverage information flow analysis and apply common sense and judgement guardrails for security and privacy.
  • Collaborate with product, engineering, security, privacy, and AI teams to adopt agentic security patterns and best practices across Copilot and MAI.
  • Monitor key metrics for agentic AI security and innovation, using data-driven insights to improve defenses and enablement.
  • Document secure agentic AI patterns, ensuring they address novel risks, support safe delegation, and enable responsible orchestration of actions.
Employment Type:
Full-time

Senior AI Engineer

Microsoft’s Azure Data engineering team is leading the transformation of analyti...
Microsoft Corporation
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • The ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Build and evolve the Real-Time Intelligence evaluations platform: implement offline and online eval pipelines, including golden datasets, human review workflows, and LLM-as-judge / auto-raters for agents, anomaly detectors, and decisioning systems
  • Instrument agentic solutions for observability by wiring up telemetry, tracing, structured logging, and dashboards so quality, safety, latency, and cost are easy to monitor and debug
  • Integrate evals into the development lifecycle by connecting pipelines to CI/CD, canary and A/B experiments, and phased rollouts, making it simple for partner teams to run and interpret evaluations
  • Collaborate and mentor across product, research, and engineering teams, sharing best practices on eval design, LLM-as-judge usage, and Responsible AI, and providing code reviews and guidance that raise the bar for the AI features
Employment Type:
Full-time

Senior Research Engineer

As a Senior Research Engineer at Microsoft, you will advance Microsoft’s mission...
Microsoft Corporation
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience deploying Fine Tuned LLMs or multimodal models in live production environments
  • Experience shipping and maintaining production AI systems
  • The ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Bringing State-of-the-Art Research to Products
  • Design and implement AI systems using foundation models, prompt engineering, retrieval-augmented generation, multi-agent architectures, and classic ML
  • Fine-tune large language models on domain-specific data and evaluate via offline and online methods such as A/B testing, telemetry, and shadow deployments
  • Build and harden prototypes into production-ready services using robust software engineering and MLOps practices
  • Drive original research and thought leadership (whitepapers, internal notes, patents); convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work; identify high-potential methods and adapt them to Microsoft problem spaces
  • End-to-End System Development
  • ML Design & Architecture: Own end-to-end pipeline from data prep, training, evaluation, deployment, and feedback loops
Employment Type:
Full-time

Senior AI Engineer - Teams Messaging AI

Are you interested in joining one of the most exciting teams and working on the ...
Microsoft Corporation
Location:
United States, Mountain View
Salary:
119800.00 - 234700.00 USD / Year
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • The ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Design, implementation, and shipping of multiple new messaging and large language model (LLM) agentic features
  • Building end-to-end user experiences that work across multiple devices and browsers
  • Writing and maintaining unit tests, large language model (LLM) evals, and automated integration or end-to-end tests
  • Building web and AI applications in enterprise and/or consumer markets
  • Collaborating with partner teams to meet engineering goals
  • Managing individual projects or feature priorities, deadlines, and deliverables
Employment Type:
Full-time