First open-source DiT-based foundation model for synchronized 4K video and audio generation with 19B parameters
Experience AI-powered 4K video and audio generation in real-time
Explore in-depth guides and practical tutorials for LTX-2
Explore the advanced features that make LTX-2 the leading open-source AI video generation model
Generate high-quality videos from text prompts with LTX-2's advanced DiT architecture
Transform static images into dynamic videos with smooth motion and natural transitions
Create perfectly synchronized audio and video content in a single unified model
Generate production-ready 4K videos with spatial upscaling capabilities
Customize LTX-2 for specific styles, motions, or appearances with efficient LoRA training
Choose from dev, distilled, or quantized (fp8/fp4) models for optimal speed-quality balance
LTX-2 leverages cutting-edge Diffusion Transformer technology with 19B parameters
LTX-2 is built on a Diffusion Transformer (DiT) architecture, the first of its kind to generate synchronized audio and video in a single unified model. With 19 billion parameters, it delivers production-ready quality for professional workflows.
Discover how LTX-2 empowers creators across industries
Generate engaging social media videos from text descriptions with LTX-2's text-to-video capabilities
Rapid prototyping and pre-visualization for filmmakers using LTX-2's 4K generation
Create promotional videos with synchronized audio using LTX-2's audio-visual synthesis
Produce educational content and tutorials with LTX-2's image-to-video animation
Experiment with AI video generation techniques using LTX-2's open-source architecture
Generate cinematic cutscenes and trailers with LTX-2's video-to-video transformation
Explore stunning examples generated by LTX-2
A dramatic sunset over mountains with flowing clouds
Static portrait brought to life with natural motion
Synchronized audio and video generation
Transform existing video with new artistic style
LTX-2 fine-tuned for specific artistic style
Spatial and temporal upscaling demonstration
Install and run LTX-2 locally in minutes
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2
uv sync
source .venv/bin/activate
Clone the LTX-2 repository and set up the environment using uv package manager
Python Version: ≥ 3.12
CUDA Version: > 12.7
PyTorch Version: ~2.7
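The version constraints above can be encoded as a small helper so you can verify an environment before installing. This is a minimal sketch, not part of the LTX-2 repository; the function name and tuple-based interface are illustrative assumptions.

```python
def meets_requirements(python_version, cuda_version, torch_version):
    """Check version tuples against LTX-2's stated minimums:
    Python >= 3.12, CUDA > 12.7, PyTorch ~2.7 (compatible release)."""
    return (
        python_version >= (3, 12)        # Python >= 3.12
        and cuda_version > (12, 7)       # CUDA strictly newer than 12.7
        and torch_version[:2] == (2, 7)  # PyTorch 2.7.x
    )

# Example: Python 3.12.0 with CUDA 12.8 and PyTorch 2.7.1 qualifies.
print(meets_requirements((3, 12, 0), (12, 8), (2, 7, 1)))  # True
```

In practice you would fill these tuples from `sys.version_info`, `torch.version.cuda`, and `torch.__version__` on the target machine.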
Find answers to common questions about LTX-2
LTX-2 is a 19B parameter DiT-based AI foundation model for synchronized audio-video generation. It's the first open-source model of its kind, capable of generating high-quality 4K videos with synchronized audio from text prompts, images, or existing videos.
LTX-2 supports multiple generation modes: text-to-video, image-to-video, video-to-video transformation, audio-to-video, and joint audio-visual content creation. It can generate videos up to 4K resolution with synchronized audio.
LTX-2 requires Python ≥ 3.12, CUDA > 12.7, PyTorch ~2.7, and an NVIDIA GPU with sufficient VRAM. The exact VRAM requirements depend on the model variant and generation settings you choose.
Yes, LTX-2 is fully open-source under the Apache 2.0 license. You can freely use, modify, and distribute LTX-2 for both personal and commercial projects.
LTX-2 offers several variants: dev (bf16 full precision), fp8 and fp4 quantized versions for faster inference, and a distilled version optimized for speed. Additionally, spatial and temporal upscaler models are available.
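The memory/precision trade-off behind the quantized variants can be illustrated with a toy example. The sketch below simulates symmetric 8-bit quantization of a weight tensor in NumPy; it is not LTX-2's actual fp8/fp4 kernels, just the general idea that lower-precision weights cut storage (and bandwidth) at a small reconstruction error.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map floats to int8 via one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32...
assert q.nbytes * 4 == w.nbytes
# ...and the per-element error is bounded by the quantization step.
assert float(np.abs(w - w_hat).max()) <= scale
```

Real fp8/fp4 formats keep an exponent per value (or per block), so they behave better on weights with a wide dynamic range, but the size-versus-fidelity trade is the same.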
Yes, LTX-2 supports LoRA fine-tuning for custom styles, motions, and appearances. In many settings you can train a motion, style, or likeness LoRA in under an hour.
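The idea behind LoRA fine-tuning can be sketched in a few lines. Instead of updating a large frozen weight matrix W, you train a low-rank delta B·A and add it to the base path. The shapes, names, and scaling below are illustrative of the general technique, not LTX-2's actual training code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4  # rank r is much smaller than the layer size

W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero init)

def lora_forward(x, alpha=8.0):
    """y = W x + (alpha / r) * B A x: base path plus low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the LoRA path contributes nothing at first,
# so the adapted model starts out identical to the base model.
assert np.allclose(lora_forward(x), W @ x)
```

Because only A and B are trained (2·r·d parameters instead of d²), LoRA checkpoints stay small and train quickly, which is why style or likeness adapters can be produced in well under an hour.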
LTX-2 supports up to 4K resolution with spatial upscaling capabilities. The base model generates videos at various resolutions, and the spatial upscaler can enhance them to 4K quality.
Generation time depends on the model variant you choose. The distilled version is fastest with 8 steps, while the dev version offers highest quality but takes longer. Quantized versions (fp8/fp4) provide a good balance.
Yes, LTX-2 is the first DiT model to generate synchronized audio and video in a single model. It can create perfectly matched audio-visual content for various applications.
You can try the live demo on HuggingFace Spaces at huggingface.co/spaces/Lightricks/ltx-2-distilled, or install LTX-2 locally from GitHub for full control and customization.