Research Scientist - LTX Model Evaluation
Lightricks
About the Role
As a Research Scientist in Model Evaluation, you are the ultimate authority on model quality and utility. You will design the automated judges, reward models, evaluation datasets, and benchmarking ecosystems that determine the future of LTX. Your mission is to provide the "ground truth" for our pre-training and post-training teams. You will blend the rigor of a researcher with the intuition of a product-thinker, developing metrics that capture both the aesthetic soul of a video and the functional precision required for high-stakes professional use.
Key Responsibilities
- Steer Training & Research: Systematically evaluate model checkpoints to provide actionable insights that guide training experiments and architectural decisions.
- Design Benchmark Ecosystems: Develop and run rigorous benchmarks for release candidates against competitive models, ensuring LTX-2 remains world-class.
- Build Next-Gen Metrics: Develop robust automatic metrics and Reward Models (e.g., for RL, ITS, auto-research agents) that quantify complex attributes like temporal coherence, physical correctness, spatial accuracy, and foley synchronization.
- Diagnose & Analyze: Perform deep root-cause analysis on model failures, providing the diagnostic clarity needed for researchers to implement targeted fixes.
- Scale Evaluation: Collaborate with platform engineers to deploy evaluation frameworks across large-scale GPU clusters.
Ideal Candidate
- Technical Depth: Master’s or PhD in Computer Vision, ML, or a related field, with strong software engineering skills and comfort in complex ML training environments.
- The "Metric" Mindset: Deep expertise in evaluation methodology. You know why standard metrics often fail and how to build better ones.
- Perceptual Intuition: A sharp "eye and ear" for quality. You can articulate subtle nuances in motion or sound that automated systems might miss and use that intuition to improve our reward models.
- Data-Driven Detective: You love diving into datasets to find the "why" behind the numbers, and you take pride in curating and tailoring data for specific evaluation tasks.
- Product-Minded Scientist: You can think like an end user. You care that our models don't just "beat the benchmark" but actually work reliably in professional pipelines.
- Statistical Rigor: You understand experimental design, significance testing, and the nuances of perceptual quality assessment.