LTX-2: Production-Ready AI Video & Audio Generation Model
First open-source DiT-based foundation model for synchronized 4K video and audio generation with 19B parameters
Try LTX-2 Live Demo
Experience AI-powered 4K video and audio generation in real-time
How to Use LTX-2 Demo
Text-to-Video Generation
- Enter your text prompt describing the video
- Select video duration and quality settings
- Generate high-quality 4K video output (see the sketch below)
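In code, the same flow is a single pipeline call. The sketch below is illustrative only: this page does not document LTX-2's Python API, so the ltx2 module, the TextToVideoPipeline class, and every parameter name are hypothetical placeholders, not the real interface.

# Hypothetical sketch of the text-to-video flow described above.
# The ltx2 module, TextToVideoPipeline class, and all parameter names
# are placeholders, not LTX-2's documented API.
import torch

from ltx2 import TextToVideoPipeline  # hypothetical import

pipe = TextToVideoPipeline.from_pretrained(
    "Lightricks/LTX-2",
    variant="ltx-2-19b-distilled",   # fastest variant (8 steps, CFG=1)
    torch_dtype=torch.bfloat16,
).to("cuda")

video = pipe(
    prompt="A dramatic sunset over mountains with flowing clouds",
    num_frames=121,                  # duration setting
    height=512,
    width=768,                       # base resolution before upscaling
    num_inference_steps=8,           # the distilled variant runs in 8 steps
)
video.save("sunset.mp4")             # hypothetical save helper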
Image-to-Video Animation
- Upload a static image as input
- Add motion prompts for animation
- Generate video with synchronized audio (see the sketch below)
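Image-to-video follows the same pattern, with a source image and a motion prompt as inputs. Again, the class and parameter names below are hypothetical placeholders rather than the documented API.

# Hypothetical image-to-video sketch; same placeholder API as above.
import torch
from PIL import Image

from ltx2 import ImageToVideoPipeline  # hypothetical import

pipe = ImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-2",
    variant="ltx-2-19b-dev-fp8",     # fp8 quantization for faster inference
    torch_dtype=torch.bfloat16,
).to("cuda")

result = pipe(
    image=Image.open("portrait.png"),   # the static input image
    prompt="subtle head turn, hair moving in a light breeze",  # motion prompt
    generate_audio=True,                # request a synchronized audio track
)
result.save("portrait_animated.mp4")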
Powerful Capabilities of LTX-2
Explore the advanced features that make LTX-2 a leading open-source AI video generation model
Text-to-Video Generation
Generate high-quality videos from text prompts with LTX-2's advanced DiT architecture
Image-to-Video Animation
Transform static images into dynamic videos with smooth motion and natural transitions
Synchronized Audio-Visual
Create perfectly synchronized audio and video content in a single unified model
4K High Resolution
Generate production-ready 4K videos with spatial upscaling capabilities
LoRA Fine-tuning
Customize LTX-2 for specific styles, motions, or appearances with efficient LoRA training
Multiple Performance Modes
Choose from dev, distilled, or quantized (fp8/fp4) models for optimal speed-quality balance
Advanced DiT Architecture
LTX-2 leverages cutting-edge Diffusion Transformer technology with 19B parameters
Model Specifications
LTX-2 is built on a Diffusion Transformer (DiT) architecture, the first of its kind to generate synchronized audio and video in a single unified model. With 19 billion parameters, it delivers production-ready quality for professional workflows.
Available Model Variants:
- ltx-2-19b-dev (full precision, bf16)
- ltx-2-19b-dev-fp8 (fp8 quantization)
- ltx-2-19b-dev-fp4 (nvfp4 quantization)
- ltx-2-19b-distilled (8 steps, CFG=1)
Upscaler Models:
- Spatial upscaler (x2 resolution)
- Temporal upscaler (x2 frame rate)
System Requirements:
- Python ≥ 3.12
- CUDA > 12.7
- PyTorch ~2.7
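Choosing among these variants is a quality-versus-speed trade-off. The snippet below simply restates the variant list above as a lookup table for scripting; it implies nothing beyond what this page states.

# Variant trade-offs as listed above, as a single point of reference
# when scripting generation jobs.
VARIANTS = {
    "ltx-2-19b-dev":       "full precision (bf16), highest quality",
    "ltx-2-19b-dev-fp8":   "fp8 quantization, faster inference",
    "ltx-2-19b-dev-fp4":   "nvfp4 quantization, smallest footprint",
    "ltx-2-19b-distilled": "8 steps, CFG=1, fastest",
}

def pick_variant(priority: str) -> str:
    """Map a 'quality' or 'speed' priority onto a variant name."""
    return "ltx-2-19b-dev" if priority == "quality" else "ltx-2-19b-distilled"

print(pick_variant("speed"))   # -> ltx-2-19b-distilled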
Real-World Applications of LTX-2
Discover how LTX-2 empowers creators across industries
Content Creation
Generate engaging social media videos from text descriptions with LTX-2's text-to-video capabilities
Film Production
Rapid prototyping and pre-visualization for filmmakers using LTX-2's 4K generation
Marketing & Advertising
Create promotional videos with synchronized audio using LTX-2's audio-visual synthesis
Education & Training
Produce educational content and tutorials with LTX-2's image-to-video animation
Research & Development
Experiment with AI video generation techniques using LTX-2's open-source architecture
Game Development
Generate cinematic cutscenes and trailers with LTX-2's video-to-video transformation
LTX-2 Video Examples
Explore stunning examples generated by LTX-2
Text-to-Video: Cinematic Scene
A dramatic sunset over mountains with flowing clouds
Image-to-Video: Portrait Animation
Static portrait brought to life with natural motion
Audio-Visual: Music Video
Synchronized audio and video generation
Video-to-Video: Style Transfer
Transform existing video with new artistic style
LoRA Fine-tuned: Custom Style
LTX-2 fine-tuned for specific artistic style
Upscaled: 4K Enhancement
Spatial and temporal upscaling demonstration
Get Started with LTX-2
Install and run LTX-2 locally in minutes
Installation
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2
uv sync
source .venv/bin/activate
Clone the LTX-2 repository and set up the environment using the uv package manager
System Requirements
- Python: ≥ 3.12
- CUDA: > 12.7
- PyTorch: ~2.7
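Before downloading the 19B weights, it is worth verifying that your machine meets these requirements. The check below uses only the standard library and PyTorch, with the version thresholds taken directly from the list above.

# Check the environment against the requirements listed above
# (Python >= 3.12, CUDA > 12.7, PyTorch ~2.7).
import sys

import torch

assert sys.version_info >= (3, 12), f"Python 3.12+ required, found {sys.version.split()[0]}"
assert torch.__version__.startswith("2.7"), f"PyTorch ~2.7 expected, found {torch.__version__}"
assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"

major, minor = (int(x) for x in torch.version.cuda.split(".")[:2])
assert (major, minor) > (12, 7), f"CUDA > 12.7 required, found {torch.version.cuda}"

print(f"OK: {torch.cuda.get_device_name(0)}, CUDA {torch.version.cuda}")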
Frequently Asked Questions about LTX-2
Find answers to common questions about LTX-2
What is LTX-2?
LTX-2 is a 19B-parameter, DiT-based AI foundation model for synchronized audio-video generation. It's the first open-source model of its kind, capable of generating high-quality 4K videos with synchronized audio from text prompts, images, or existing videos.
Which generation modes does LTX-2 support?
LTX-2 supports multiple generation modes: text-to-video, image-to-video, video-to-video transformation, audio-to-video, and joint audio-visual content creation. It can generate videos up to 4K resolution with synchronized audio.
What are the system requirements?
LTX-2 requires Python ≥ 3.12, CUDA > 12.7, PyTorch ~2.7, and an NVIDIA GPU with sufficient VRAM. The exact VRAM requirements depend on the model variant and generation settings you choose.
Is LTX-2 open-source?
Yes, LTX-2 is fully open-source under the Apache 2.0 license. You can freely use, modify, and distribute LTX-2 for both personal and commercial projects.
Which model variants are available?
LTX-2 offers several variants: dev (bf16 full precision), fp8 and fp4 quantized versions for faster inference, and a distilled version optimized for speed. Additionally, spatial and temporal upscaler models are available.
Can I fine-tune LTX-2?
Yes, LTX-2 supports LoRA fine-tuning for custom styles, motions, and appearances. You can train motion, style, or likeness LoRAs in under an hour in many settings.
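As a rough illustration of what LoRA fine-tuning involves (this page does not document LTX-2's training script, so using Hugging Face PEFT and the attention-projection module names below are assumptions), low-rank adapters are injected into selected layers and only those small matrices are trained:

# Illustrative LoRA setup with Hugging Face PEFT. The DiT block and the
# target module names are stand-ins, not LTX-2's actual architecture.
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class TinyDiTBlock(nn.Module):
    """Stand-in for one attention block of a DiT; real training would
    load the actual LTX-2 transformer instead."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

model = TinyDiTBlock()
config = LoraConfig(
    r=16,                                      # low-rank dimension
    lora_alpha=32,                             # scaling factor
    target_modules=["to_q", "to_k", "to_v"],   # assumed projection names
)
model = get_peft_model(model, config)
model.print_trainable_parameters()             # only the adapters train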
What resolution does LTX-2 support?
LTX-2 supports up to 4K resolution with spatial upscaling capabilities. The base model generates videos at various resolutions, and the spatial upscaler can enhance them to 4K quality.
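A typical route to 4K, consistent with the two upscaler models listed in the specifications, is to generate at a base resolution and then apply the x2 spatial upscaler. As in the earlier sketches, the class names here are hypothetical placeholders:

# Hypothetical two-stage sketch: base-resolution generation followed by
# the x2 spatial upscaler; class names are placeholders, not the real API.
from ltx2 import SpatialUpscaler, TextToVideoPipeline  # hypothetical imports

pipe = TextToVideoPipeline.from_pretrained("Lightricks/LTX-2")
upscaler = SpatialUpscaler.from_pretrained("Lightricks/LTX-2")

base = pipe(prompt="a timelapse of a city at night", height=1080, width=1920)
video_4k = upscaler(base)          # x2 resolution: 1920x1080 -> 3840x2160
video_4k.save("city_4k.mp4")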
How long does generation take?
Generation time depends on the model variant you choose. The distilled version is fastest at 8 steps, while the dev version offers the highest quality but takes longer. The quantized versions (fp8/fp4) provide a good balance.
Does LTX-2 generate synchronized audio?
Yes, LTX-2 is the first DiT model to generate synchronized audio and video in a single model. It can create perfectly matched audio-visual content for various applications.
How can I try LTX-2?
You can try the live demo on HuggingFace Spaces at huggingface.co/spaces/Lightricks/ltx-2-distilled, or install LTX-2 locally from GitHub for full control and customization.