QA LLM Engineer

Talentica

Location:

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are looking for a QA Automation Engineer with strong experience in LLMs and GenAI who can ensure the accuracy, stability, and performance of AI-driven applications.

Job Responsibility:

  • Design and execute QA strategies for LLM-based and search-driven products
  • Validate data pipelines involving indexing, chunking, embeddings, cosine similarity and keyword search
  • Evaluate retrieval-augmented generation (RAG) and recommendation system quality using precision, recall, and relevance metrics
  • Develop prompt test suites to measure accuracy, consistency, and bias
  • Monitor LLM observability metrics such as latency, token usage, hallucination rate and cost performance
  • Automate end-to-end test scenarios using Playwright and integrate with CI/CD pipelines
  • Collaborate with ML engineers and developers to improve model responses and user experience
  • Contribute to test frameworks and datasets for LLM regression and benchmark testing

Requirements:

  • BE/BTech in Computer Science, Data Engineering, or a related field from a top institute (like IIT, NIT, BITS, etc.)
  • 3.5 to 5.5 years of experience in QA engineering
  • At least 1 year of experience in GenAI or LLM-based systems
  • Strong understanding of indexing, chunking, embeddings, similarity search, and retrieval workflows
  • Experience with prompt engineering, LLM evaluation, and output validation techniques
  • Proficiency with Playwright, API automation, and modern QA frameworks
  • Knowledge of observability tools for LLMs
  • Solid scripting experience in Python
  • Knowledge of different LLM providers (OpenAI, Gemini, Anthropic, Mistral, etc.)
  • Exposure to RAG pipelines, recommendation systems, or model performance benchmarking
  • Strong analytical and debugging skills, with a detail-oriented mindset

What we offer:

  • A culture of innovation
  • Endless learning opportunities
  • Talented peers
  • Work-life balance
  • Flexible schedules
  • Remote work options
  • A great culture
  • Recognition & rewards

Additional Information:

Job Posted:
January 02, 2026

Work Type:
Remote work

Similar Jobs for QA LLM Engineer

Full Stack Developer

Our Software Development team is looking for a Full Stack Engineer to ensure qua...
Location:
Serbia, Belgrade; Nis
Salary:
Not provided
AMD
Expiration Date:
Until further notice
Requirements:
  • 4+ years of combined experience in full-stack development, quality engineering or related fields
  • Experience with Python-based testing (PyTest, Playwright, etc.)
  • Experience with GitHub Actions and CI/CD
  • Familiarity with LLM-based applications and performance monitoring tools
  • Experience with containerization and orchestration (Docker, Kubernetes)
  • Excellent communication skills and customer empathy, with a focus on improving user satisfaction and trust
  • Bachelor’s degree in Computer Science, QA Engineering, Computer Engineering, or a related field
Job Responsibility:
  • Act as the primary support contact for internal users, triaging issues and gathering feedback for engineering teams
  • Develop and maintain automated testing frameworks to validate new code, ensure uptime, and prevent regressions in AI-generated outputs
  • Create and execute evaluation pipelines to monitor response quality and relevance from LLM-based systems
  • Partner with developers to build robust CI/CD and monitoring tools for production stability
  • Help document and continuously improve incident response and release validation processes

Senior Software Developer

Our client is looking for a Senior Software Developer for a 5 month contract in ...
Location:
Canada, North York
Salary:
Not provided
Randstad
Expiration Date:
January 29, 2026
Requirements:
  • 7+ years hands-on Java development in an enterprise environment, including Spring Boot, REST API design, integration patterns, and production support / incident management
  • Strong SQL and data handling expertise: capable of analyzing schemas, building optimized queries, integrating APIs with data stores, and enforcing data quality in service logic
  • Proven experience supporting applications in production: triaging defects, analyzing incident root cause, applying hotfixes, improving resiliency and performance
  • Ability to consume and operationalize AI services: call LLM endpoints, handle prompt/response patterns, enforce guardrails, and log usage safely
  • Practical understanding of core ML / LLM concepts (supervised vs unsupervised learning, prompt engineering, retrieval, drift) sufficient to collaborate with data/AI teams and ship AI-enabled features
  • Comfort working in a secure, governed environment (privacy, PII protection, access control, auditability)
  • Strong Java and Spring Boot experience building enterprise services at scale (API design, dependency management, error handling, observability, performance tuning)
  • Advanced SQL fluency (Oracle, MySQL, PostgreSQL) — complex joins, window functions, data validation, and query optimization
  • Working knowledge of data modeling, ETL/ELT pipelines, and API-driven data integration
  • Hands-on experience with Git, automated testing, secure coding practices, code reviews, and CI/CD pipelines
Job Responsibility:
  • Design, build, and maintain secure, scalable Java services and APIs using Spring Boot
  • Translate technical requirements into production-grade application code, integration logic, and robust data access layers
  • Write clean, testable Java (unit, integration, regression), contribute to CI/CD pipelines, and support automated deployments
  • Design, build, and optimize data workflows – including SQL queries, ETL logic, and caching for reliability, integrity, and performance in production
  • Collaborate with data engineers and analysts to ensure service-layer alignment with enterprise data models and reporting needs
  • Diagnose and resolve production issues (performance, defects, incidents); participate in on-call / support rotations as needed
  • Review code, enforce engineering standards, document solutions, and mentor intermediate developers
  • Collaborate with architects, QA, product owners, and business SMEs in an iterative / Agile delivery model to plan, scope, and land increments
  • Apply AI/ML capabilities (LLMs, retrieval-augmented generation, classic ML models) to enhance existing Java services where appropriate
What we offer:
  • Earn a competitive rate within the industry
  • Potential for extension
Job Type:
Full-time

Senior SEO Manager

WeRoad is entering a new phase of ambitious international growth, and we’re look...
Location:
Italy, Milan
Salary:
50000.00 - 60000.00 EUR / Year
WeRoad
Expiration Date:
Until further notice
Requirements:
  • 5+ years in SEO with strong technical + strategic chops; proven delivery in international/internationalization contexts
  • Strong technical SEO with hands-on track record across: rendering & crawlability, schema/structured data, hreflang, migrations, logs; confident with testing frameworks and QA
  • Content architecture chops (pillar/cluster) and comfort distributing to LLM-magnet platforms (e.g., Reddit, Quora, etc.)
  • Project/program management rigor; crisp communication with engineers, PMs and stakeholders
  • Data literacy: GSC/GA4/Looker; ability to define KPIs beyond “rankings”
Job Responsibility:
  • Build and execute a market-by-market SEO plan (site + blogs) aligned to seasonality, intent, and commercial goals
  • Lead GEO/AEO experiments: structured summaries, comparison tables, answer-ready snippets, entity-rich pages; validate with prompt-set testing
  • Operationalize LLM seeding: repurpose content to LLM-friendly formats and trusted 3rd-party sources; drive Reddit/Quora participation guidelines
  • Own technical backlog with Engineering (such as schema, rendering, CWVs, redirects, migrations from subdomains)
  • Partner with Content Ops team and Website Content team to ensure copy + media (images/video from DAM) are AI-parsable: captions, alt-text, filenames, VideoObject markup
  • Coach stakeholders on GEO/LLM and report progress to leadership with clear business impact
What we offer:
  • A free WeRoad trip every year – Choose your dream destination with €1,200/year travel discount or travel even more as a Travel Coordinator
  • A buzzing workspace – Join our vibrant co-working space and community
  • Opportunities to learn – We invest in your growth with training and learning opportunities
  • Unlimited holiday – You are the master of your own time. We also have a tradition that everyone takes at least a half-day off on their birthday
  • Extra benefits – Meal vouchers (€8/day – when applicable), a €1,000 bonus for your wedding or civil union, and one-off bonus when you hit 5 years with us
  • New parent support – €3,600/year for 3 years after birth or adoption (or prorated if your child is under 3 when you start at WeRoad)
  • Fun – Team events, Travel Coordinator meet-ups, and unforgettable moments
  • Hybrid working – Flexible working, and we support you to work from your destination of choice up to 1 month per year
Job Type:
Full-time

Product Manager - AI/ML

We are seeking an experienced Technical Product Manager – AI/ML to lead the defi...
Location:
India, Bengaluru
Salary:
Not provided
EvoluteIQ
Expiration Date:
Until further notice
Requirements:
  • 15+ years of overall experience
  • Several years in technical/engineering roles (software development, data engineering, or AI/ML)
  • 7+ years in product management
  • Strong understanding of Predictive AI/ML (classification, regression, anomaly detection)
  • Expertise in Natural Language Processing (LLMs, embeddings, conversational AI, text analytics)
  • Experience with Time Series Modeling (forecasting, demand planning, anomaly detection)
  • Knowledge of Generative AI (LLM-based copilots, text-to-X products, prompt engineering, RAG pipelines)
  • Hands-on familiarity with Python/Java, ML frameworks (TensorFlow, PyTorch), and cloud services (AWS, Azure, GCP)
  • Proven track record in building enterprise SaaS products and leading technical product discussions
  • Excellent communication and stakeholder management skills
Job Responsibility:
  • Define and own the product roadmap for AI/ML features across predictive, NLP, time series, and generative AI domains
  • Align AI product strategy with overall platform vision, market trends, and customer needs
  • Identify opportunities for embedding AI capabilities into low-code/no-code workflows
  • Participate in technical design reviews with engineering and data science teams
  • Define API contracts, integration patterns, and deployment considerations for AI/ML features
  • Ensure product features are technically feasible, scalable, and aligned with enterprise architecture principles
  • Act as a bridge between technical and non-technical stakeholders, ensuring clarity of vision and execution
  • Collaborate with engineering, data science, and design teams to deliver scalable AI features
  • Define clear product specifications, use cases, and success metrics
  • Ensure compliance with security, data governance, and responsible AI principles
What we offer:
  • Opportunity to shape the strategy of a next-gen hyper-automation platform
  • Work with a cross-disciplinary team in a fast-growing, innovation-driven environment
  • Competitive compensation and growth opportunities
  • A culture of innovation, ownership, and continuous learning
Job Type:
Full-time

Data Science: Team Lead

The team is focused on building and evolving products related to news content pr...
Location:
Georgia, Tbilisi
Salary:
Not provided
TradingView
Expiration Date:
Until further notice
Requirements:
  • 2+ years of experience in managing technical teams with the ability to organize workflows and build effective processes
  • Deep understanding of the ML project lifecycle: from idea and prototype to production and maintenance
  • Strong knowledge of NLP/LLM technologies: text generation and classification, embeddings, RAG, and other modern techniques
  • Excellent communication skills and experience working with various teams (ML, backend, QA, product, analytics)
  • Ability to define and maintain roadmaps and make system-level engineering decisions
  • Experience in prioritization, risk assessment, and managing technical debt
  • Proficiency in Python and modern development tools (Git, CI/CD, Docker, Kubernetes)
  • Experience in operating ML systems in production (monitoring, metrics, A/B testing, incident handling)
Job Responsibility:
  • Develop and enhance projects related to news processing (sentiment analysis, NER, classification, search, etc.)
  • Perform data analysis and preprocessing, prepare datasets, and build model pipelines
  • Design monitoring systems and evaluate the performance of ML systems
  • Lead a team of ML engineers working on NLP and LLM projects (news, content generation, recommendations, search, and chat systems)
  • Set tasks, prioritize work, manage deadlines, and ensure timely delivery
  • Collaborate with product and analytics teams to align goals and approaches
  • Support the technical growth of the team through mentoring, reviewing solutions, and assisting in system design
  • Improve development and deployment processes for ML solutions in production
  • Contribute to engineering efforts as a senior developer: design and implement key components, perform code reviews, and drive technical improvements
What we offer:
  • Flexible Working Hours
  • Hybrid Work Policy
  • Relocation Package
  • Private Health Insurance
  • Performance Bonus
  • Work alongside experienced professionals and mentors offering ongoing training and growth opportunities
  • Premium TradingView Subscription
  • Annual Team Events
  • A comfortable, well-equipped workspace with exclusive perks like a gym and much more
Job Type:
Full-time

Sr. Software Development Engineer

You will safeguard the quality of our AI and GenAI features by evaluating model ...
Location:
India, Hyderabad
Salary:
Not provided
Highspot
Expiration Date:
Until further notice
Requirements:
  • 4+ years of experience as a Software Development Engineer in AI/ML systems
  • Strong coding skills in Python (evaluation pipelines, data processing, metrics computation)
  • Hands-on experience with evaluation frameworks (Ragas or equivalent)
  • Knowledge of vector embeddings, similarity search, and RAG evaluation
  • Familiarity with evaluation metrics (precision, recall, F1, relevance, hallucination detection)
  • Understanding of LLM-as-a-judge evaluation approaches
  • Strong analytical and problem-solving skills; ability to combine human judgment with automated evaluations
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or related field
  • Strong English written and verbal communication skills
Job Responsibility:
  • Evaluation Frameworks – Develop reusable, automated evaluation pipelines using frameworks such as Ragas; integrate LLM-as-a-judge methods for scalable assessments
  • Golden Datasets – Build and maintain high-quality benchmark datasets in collaboration with subject matter experts
  • AI Output Validation – Evaluate results across text, documents, audio, and video, using both automated metrics and human-in-the-loop judgment
  • Metric Evaluation – Implement and track metrics such as precision, recall, F1 score, relevance scoring, and hallucination penalties
  • RAG & Embeddings – Design and evaluate retrieval-augmented generation (RAG) pipelines, vector embedding similarity, and semantic search quality
  • Error & Bias Analysis – Investigate recurring errors, biases, and inconsistencies in model outputs; propose solutions
  • Framework & Tooling Development – Build tools that enable large-scale model evaluation across hundreds of AI agents
  • Cross-Functional Collaboration – Partner with ML engineers, product managers, and QA peers to integrate evaluation frameworks into product pipelines
Job Type:
Full-time

Staff Software Engineer, AI

GoodLeap’s Business Solutions Business Unit is redefining how installers and hom...
Location:
United States, Roseville; Austin; Lehi; Plano; San Mateo; West Palm Beach
Salary:
173000.00 - 200000.00 USD / Year
GoodLeap
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in backend development
  • At least 2 years working with AI/ML solutions or LLMs
  • Experience working with vector databases, embeddings, and semantic search
  • Familiarity with MLOps, CI/CD for AI pipelines, and AI observability tools
  • Strong experience in Node.js, TypeScript, GraphQL, and REST APIs
  • Deep familiarity with AWS architecture — especially Lambda, ECS, S3, DynamoDB, API Gateway, and Step Functions
  • Experience building and integrating LLM features (e.g., via OpenAI, Claude, Vertex AI, or similar), including prompt design, vector storage, and retrieval strategies
  • Fluency in system design principles, scalability, reliability, fault-tolerance
  • Ability to drive clarity and make architectural tradeoffs, balancing idealism with pragmatism
  • Strong communication and collaboration skills, able to work effectively across product and engineering orgs
Job Responsibility:
  • Design and lead backend architecture that supports AI/ML-powered features across mobile and API surfaces
  • Own end-to-end technical strategy for embedding LLMs, embedding stores, and personalized content delivery
  • Partner with product and design to scope features, validate feasibility, and ensure execution aligns with business impact
  • Build tools and services to help other team members experiment and ship AI-enhanced features responsibly and efficiently
  • Influence engineering standards and promote excellence in observability, performance, and security
  • Mentor engineers across teams, helping them level up in areas of backend architecture, AI integration, and delivery quality
  • Collaborate with cross-functional partners across QA, Mobile, Data Science, Product, and Marketing
What we offer:
  • Opportunities for growth and advancement within the company
  • May be eligible for a bonus
Job Type:
Full-time

AI Automation Engineer

As the AI Automation Engineer in the AI CoE, you will be instrumental in standin...
Location:
United States of America, Lincolnshire, Illinois
Salary:
120000.00 - 170000.00 USD / Year
Alight Solutions
Expiration Date:
Until further notice
Requirements:
  • Experience in test automation, software engineering, or quality assurance (AI/ML experience is a plus, but not required)
  • Proficiency in Python, JavaScript/Node.js, or similar programming languages
  • Familiarity with CI/CD, version control, and cloud-based deployment environments
  • Experience with test automation tools (promptfoo, Ragas, Postman)
  • Understanding of AI model evaluation metrics (faithfulness, answer relevancy, hallucination, refusal correctness)
  • Basic knowledge of fairness, safety, and security testing for AI systems
  • Experience with AI infrastructure, orchestration, data integrity, deployments, and model registry practices
  • Interest in Responsible AI, bias mitigation, and regulatory compliance
  • Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent practical experience
Job Responsibility:
  • Develop and maintain automated test frameworks for LLM, RAG, agentic, and classic ML systems
  • Implement golden/adversarial test cases (functional, safety, bias, prompt injection, refusal correctness, tool use)
  • Integrate evaluation tools (promptfoo, Ragas) into CI/CD pipelines and support batch quality assessments
  • Author and validate OpenAPI/JSON Schema contracts for model inputs/outputs; enforce schema checks in pipelines
  • Instrument observability metrics (latency, success rate, cost, drift) using OpenTelemetry and export to dashboards
  • Support data QA, lineage tracking, and drift monitoring (Evidently, OpenLineage)
  • Collaborate with AI developers, SDETs, and SRE to troubleshoot, optimize, and document automation processes
  • Contribute to red-team playbooks and security testbooks mapped to OWASP/ATLAS risks
What we offer:
  • Variety of health coverage options
  • Wellbeing and support programs
  • Retirement
  • Vacation and sick leave
  • Maternity, paternity & adoption leave
  • Continuing education and training
  • Several voluntary benefit options
Job Type:
Full-time