Luma is pushing the boundaries of generative AI, building tools that redefine how visual content is created. We're seeking a candidate to help shape and scale the way we understand, measure, and improve model performance. In this role, you'll partner with researchers, engineers, and technical artists to evaluate our models against real-world creative use cases, design frameworks that capture qualitative nuance, and identify actionable insights that guide development. This is not a checkbox-metrics role; it's about building evaluative systems that match the complexity of human perception, creativity, and intention.
Job Responsibilities:
Evaluate generative model performance across diverse tasks, prompts, and modalities
Identify key failure modes, regression patterns, and edge cases that affect product quality
Develop and maintain qualitative evaluation frameworks that are scalable and reusable
Collaborate closely with technical artists and engineers to align evaluations with model capabilities and target use cases
Translate high-level product goals into concrete evaluative criteria
Lead qualitative studies, side-by-side comparisons, and human-in-the-loop evaluation efforts
Provide detailed feedback that informs model fine-tuning, dataset curation, and product UX
Stay informed about emerging evaluation standards in generative AI and creative tools
Requirements:
Master's degree or higher in Cognitive Science, Human-Computer Interaction (HCI), Design Research, Psychology, Media Studies, or a related field
5+ years of experience in product evaluation, UX research, model testing, or similar roles that involve structured qualitative assessment
Deep familiarity with creative workflows and real-world use cases for generative models (e.g., animation, filmmaking, digital art, VFX)
Strong systems thinking and the ability to define abstract qualities (such as believability, identity retention, or scene coherence) in clear evaluative terms
Experience working cross-functionally with engineers, researchers, and creatives
Excellent written communication skills and the ability to synthesize nuanced judgments into clear, actionable insights
Nice to have:
Background in motion, visual effects, or storytelling pipelines
Experience evaluating AI-generated media (video, images, 3D)
Prior work on building internal tools for qualitative data collection or scoring
Familiarity with prompt engineering and reference-based input methods