Large Scale Training Engineer

Lightricks

Jerusalem, Israel
Posted on Jun 25, 2024

Lightricks is a pioneer in innovative technology that bridges the gap between imagination and creation. We are an AI-first company with a mission to build cutting-edge tools for photo and video creation. Our photo and video editing apps (Facetune, Videoleap, and Photoleap) offer endless possibilities and inspiration, while our brand platform, Popular Pays, lets brands scale their marketing efforts by partnering with creators. Our most recent product, LTX Studio, is an all-in-one AI digital storytelling platform that gives users complete control over their creations. We aim to enable both content creators and brands to produce engaging, top-performing content, based on groundbreaking computational graphics research and generative AI features.

The Core Generative AI team at Lightricks Research is a unified group of researchers and engineers dedicated to developing the generative foundation models that power LTX Studio, our AI-based video creation platform. Our focus is on creating a controllable, cutting-edge video generation model by combining novel algorithms with exceptional engineering. This involves enhancing the machine learning components of our sophisticated internal training framework, which is crucial for developing advanced models. We specialize in both the research and the engineering that enable efficient, scalable training and inference, allowing us to deliver state-of-the-art AI video generation models.

About the Role

As a Large Scale Training Engineer, you will play a key role in increasing the training throughput of our internal framework and enabling researchers to pioneer new model concepts. This role demands excellent engineering skills for designing, implementing, and optimizing cutting-edge AI models, along with the ability to write robust machine learning code and a deep understanding of supercomputer performance. Your expertise in performance optimization, distributed systems, and debugging will be crucial, as our framework runs large-scale computations across numerous virtual machines.

This role is designed for individuals who are not only technically proficient but also deeply passionate about pushing the boundaries of AI and machine learning through innovative engineering and collaborative research.

Key Responsibilities:

  • Profile and optimize the training process to ensure efficiency and effectiveness, including optimizing multimodal data pipelines and data storage methods.
  • Develop high-performance TPU/CPU kernels and integrate advanced techniques into our training framework to maximize hardware efficiency.
  • Use knowledge of hardware features to apply aggressive optimizations and advise on hardware/software co-design.
  • Collaborate with researchers to develop model architectures that facilitate efficient training and inference.

Your skills and experience:

  • Experience with small- to large-scale ML experiments and multimodal ML pipelines.
  • Strong software engineering skills, proficient in Python and experienced with modern C++.
  • Deep understanding of GPU, CPU, TPU, or other AI accelerator architectures.
  • Enjoyment of diving deep into system implementations to improve performance and maintainability.
  • Passion for achieving high ML accuracy with low-precision formats and for optimizing compute kernels.
  • Background in JAX/Pallas, Triton, CUDA, OpenCL, or similar technologies is a plus.