LTX 2.3 LoRA Migration: How to Retrain for the New Latent Space

Hey guys! How's everything going? This is Dora. Last week I spent an entire afternoon troubleshooting why my character LoRA was producing visual garbage in LTX 2.3. Colors bleeding everywhere. Faces dissolving. Motion that looked like it was filmed underwater. I'd spent three weeks training that LoRA on LTX 2, and I was convinced something was wrong with my ComfyUI setup.

It wasn’t my setup. The LoRA itself was the problem — and there’s no fix except retraining from scratch.

If you're here, you've probably hit the same wall. This guide covers exactly why LTX 2 LoRAs break in 2.3, what it actually takes to retrain, and the specific settings that matter for a clean result.

Why LTX 2 LoRAs Are Incompatible with 2.3 (Latent Space Change Explained Simply)

The core issue is the VAE, the Variational Autoencoder that encodes and decodes video frames. Lightricks completely rebuilt it for LTX 2.3, training it on higher-quality data with a redesigned architecture. The result is sharper textures, cleaner edges, and better fine detail across all resolutions. But the new VAE operates in a fundamentally different mathematical space.

Here’s the simple mental model: a LoRA is a set of weight offsets that nudge the model’s behavior in a specific direction. Those offsets were computed relative to how LTX 2 “sees” video data internally — its latent space. When you apply those same offsets to LTX 2.3, the model is now operating in a different latent space. The offsets point in directions that no longer mean anything. Output is garbage.

Because the latent space changed, you cannot convert the old weight offsets. This isn't a file format issue: there's no conversion script, no compatibility layer, no workaround. Retraining is the only path forward.
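
To make the "offsets in the wrong space" intuition concrete, here is a toy numpy sketch. This is a deliberately simplified illustration, not the actual model internals: pretend the 2.3 latent space is a random rotation of the 2.0 one, and watch a direction learned in the old space stop pointing anywhere meaningful in the new one.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64  # toy latent dimension

# Pretend LTX 2.3's latent space is an arbitrary rotation of LTX 2's.
R, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

# A LoRA-style offset direction learned in the old latent space.
offset_old = rng.normal(size=dim)
offset_old /= np.linalg.norm(offset_old)

# The direction that would express the same concept in the new space.
offset_new = R @ offset_old

# Reusing the old offset unchanged points almost nowhere near the right direction.
cos = float(offset_old @ offset_new)
print(f"cosine similarity, reused vs. correct offset: {cos:.3f}")
```

In 64 dimensions the two directions are essentially uncorrelated, which is why an LTX 2 LoRA applied to 2.3 produces noise rather than a merely degraded version of the concept.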

The same logic applies to upscalers — the LTX 2 spatial and temporal upscalers don’t transfer either, for the same reason. Lightricks ships new upscalers with 2.3 that are trained for the new latent space. Download those separately from HuggingFace before doing anything else.

What You Need to Retrain (Hardware, Dataset, ltx-trainer)

Hardware minimum: an NVIDIA RTX 3090 (24GB VRAM) will get you through a basic style or character LoRA with gradient checkpointing enabled. An RTX 4090 is the practical sweet spot for creators: rank 32 training at 960×544 without constant memory management headaches. An NVIDIA H100 (80GB+ VRAM) is what Lightricks lists as the reference setup, but that's the enterprise ceiling, not the floor. Cloud GPUs (RunPod, vast.ai) on an A100 typically cost $10–20 for a 3–5 hour run and are often more practical than a long local run.

Dataset requirements:

  • Style / effect LoRA: 15–25 clips minimum
  • Character / identity LoRA: 20–35 clips, with consistent lighting and framing
  • IC-LoRA: 30–50 clips with corresponding reference frames

15–30 high-quality examples work better than 100 mediocre ones. Quality means high resolution without compression artifacts, consistent lighting and framing, and clear visibility of whatever concept you’re trying to teach.

Frame count constraint: LTX-2.3 enforces a hard shape rule: the frame count must be 8n+1 (e.g. 1, 9, 17, 25, 33, 41, 49, 65, 97, 121). Use ffmpeg to trim clips to valid frame counts before preprocessing. Clips with the wrong count will silently fail or produce corrupted latents.
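
Before reaching for ffmpeg, you can compute the trim target for each clip with a small helper (the function name here is mine, not part of ltx-trainer):

```python
def nearest_valid_frames(n: int) -> int:
    """Largest valid 8k+1 frame count that fits within an n-frame clip."""
    if n < 9:
        raise ValueError("clip too short: need at least 9 frames")
    return ((n - 1) // 8) * 8 + 1

print(nearest_valid_frames(50))   # 49 -> trim one frame
print(nearest_valid_frames(121))  # 121 -> already valid
```

Feed the result into the ffmpeg trim command shown in the troubleshooting section.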

Tool: Lightricks’ official ltx-trainer package, part of the LTX-2 monorepo. This is the primary supported training tool for both standard LoRA and IC-LoRA. The Ostris AI Toolkit and finetrainers also support LTX-2.3 if you prefer a different interface.

Step-by-Step: Retrain Your LoRA with ltx-trainer

Prepare Your Dataset

Clone the repo and set up the environment:

git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2
uv sync --frozen
source .venv/bin/activate

Organize your clips in a flat directory. Each video file needs a corresponding caption — either a .txt sidecar file with the same filename, or a dataset.json manifest. The JSON format is more reliable for large datasets:

[
  {
    "video_path": "scenes/clip_001.mp4",
    "caption": "MYTRIGGER young woman with brown hair, walking through city street at dusk, natural lighting, cinematic"
  },
  {
    "video_path": "scenes/clip_002.mp4",
    "caption": "MYTRIGGER same woman sitting at a cafe table, soft indoor lighting, shallow depth of field"
  }
]
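
Before preprocessing, it's worth sanity-checking the manifest. A minimal sketch, assuming the schema shown above (skip the trigger check if you rely on --lora-trigger instead of embedding the token yourself):

```python
import json
from pathlib import Path

TRIGGER = "MYTRIGGER"

def check_manifest(path: str) -> list[str]:
    """Return a list of problems found in a dataset.json manifest."""
    problems = []
    for i, entry in enumerate(json.loads(Path(path).read_text())):
        caption = entry.get("caption", "")
        video = entry.get("video_path", "")
        if not caption.strip():
            problems.append(f"entry {i}: empty caption")
        if TRIGGER not in caption:
            problems.append(f"entry {i}: caption lacks trigger token")
        if not Path(video).exists():
            problems.append(f"entry {i}: missing file {video}")
    return problems
```

Catching an empty caption or a broken path here is much cheaper than discovering it mid-training.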

Use a consistent trigger token (MYTRIGGER or any short unique string) in every caption; this is what activates your LoRA during inference. The manifest above embeds it directly in each caption. Alternatively, leave it out of the JSON and pass the --lora-trigger flag during preprocessing, and ltx-trainer inserts it automatically.

Preprocess your dataset to precompute latents and text embeddings (this saves significant time during training):

uv run python scripts/process_dataset.py dataset.json \
  --resolution-buckets "960x544x49" \
  --model-path /path/to/ltx-2.3-22b-dev.safetensors \
  --text-encoder-path /path/to/gemma-3-12b-it-qat-q4_0-unquantized \
  --lora-trigger "MYTRIGGER"

Add --decode after the run to VAE-decode your precomputed latents and verify they look correct before committing to a full training job. Catching a bad dataset at this stage saves hours.

Configure Training Script

The training config is a YAML file. Start from the template in configs/ and modify only what you need. Here’s a working baseline for a character LoRA:

# ltx_23_lora_character.yaml
model:
  checkpoint_path: /path/to/ltx-2.3-22b-dev.safetensors
  text_encoder_path: /path/to/gemma-3-12b-it-qat-q4_0-unquantized

dataset:
  data_root: ./scenes
  dataset_file: dataset.json
  resolution_buckets:
    - "960x544x49"

optimization:
  learning_rate: 1.0e-4
  batch_size: 1               # Required when using multiple resolution buckets
  max_train_steps: 1500
  gradient_checkpointing: true

lora:
  rank: 32
  alpha: 32

validation:
  validation_steps: 250
  validation_prompts:
    - "MYTRIGGER walking through a forest, morning light, cinematic"

output:
  output_dir: ./outputs/character_lora_v1

Run training:

uv run python scripts/train.py configs/ltx_23_lora_character.yaml

Key Hyperparameters

This table is what actually matters. Most problems with weak or overfit LoRAs come from getting these wrong first:

| Parameter | Recommended value | Notes |
| --- | --- | --- |
| rank | 32 | Default for most use cases; 64 for complex styles |
| alpha | Equal to rank | Keeps the effective learning rate stable |
| learning_rate | 1.0e-4 | Start here; lower to 5.0e-5 if training is unstable |
| max_train_steps | 1000–2000 | Check validation at 500, 750, and 1000 before going further |
| batch_size | 1 | Required with multiple resolution buckets |
| gradient_checkpointing | true | Essential for sub-80GB VRAM setups |
| caption_dropout_p | 0.05 | Keep cached text embeddings off if using dropout |
| mixed_precision | bf16 | Standard for LTX-2.3 |

Most weak LTX 2.3 LoRAs are not caused by “too few options” — they are caused by changing the wrong options first. Resist tweaking the learning rate until you’ve seen validation output. Check checkpoint 500 before assuming you need more steps. Pushing past 1500 steps on a small dataset usually produces overfit results, not better quality.
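
The alpha row above follows from the standard LoRA scaling convention, under which the trained update is applied with a factor of alpha / rank. This is the generic LoRA convention, not anything ltx-trainer-specific, but it explains why alpha should track rank:

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Effective LoRA scaling factor under the standard alpha/rank convention."""
    return alpha / rank

print(lora_scale(32, 32))  # 1.0 -> alpha equal to rank keeps the scale at 1
print(lora_scale(32, 64))  # 0.5 -> raising rank without raising alpha halves the effect
```

This is why bumping rank to 64 for a complex style usually means bumping alpha to 64 as well; otherwise you've silently cut the LoRA's effective strength in half.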

Load Your New LoRA in ComfyUI

After training, convert the output weights to ComfyUI format:

python scripts/convert_checkpoint.py outputs/character_lora_v1/lora_weights.safetensors --to-comfy

This produces lora_weights_comfy.safetensors. Copy it to your ComfyUI loras folder:

COMFYUI_ROOT/models/loras/ltx23_character_v1.safetensors

You also need the correct model assets in place. Per the official ComfyUI-LTXVideo repository:

  • LTX-2.3 checkpoint → models/checkpoints/
  • Spatial upscaler → models/latent_upscale_models/
  • Temporal upscaler → models/latent_upscale_models/
  • Distilled LoRA → models/loras/
  • Gemma text encoder → models/text_encoders/gemma-3-12b-it-qat-q4_0-unquantized/

In the workflow, use the Load LoRA node with your converted safetensors file. Set LoRA strength between 0.7–0.9 for the first test. If the trigger token isn’t activating the LoRA effect, check that your inference prompt includes exactly the trigger string you used during preprocessing.

If you’re loading an existing LTX 2 ComfyUI graph, expect a few deprecated node warnings. Ten minutes of node cleanup on the model loader and VAE nodes typically resolves compatibility issues.

Validate Results

Don’t judge a LoRA on a single generation. Use 3–5 prompts that specifically test what you trained for, plus 2–3 prompts that are deliberately off-topic to check for bleeding (where the LoRA’s style invades everything regardless of prompt).

A clean validation checklist:

  • Trigger test: Does including MYTRIGGER activate the concept reliably? Does removing it produce the base model’s output?
  • Consistency test: Generate the same prompt 3x with different seeds. Does the character/style hold across seeds?
  • Bleed test: Generate a completely unrelated scene without the trigger. Is the LoRA’s fingerprint present or absent?
  • Strength sweep: Test at 0.5, 0.75, and 1.0 strength. A well-trained LoRA degrades gracefully at low strength rather than collapsing.

If the LoRA bleeds into off-topic prompts, your dataset captions are too generic or your learning rate too high. If the trigger barely activates the concept, training may have underfit — try more steps before adjusting rank.

IC-LoRA: Is It Relevant to Your Use Case?

IC-LoRA (Image-Conditioning LoRA) is a meaningfully different tool from standard LoRA — not an upgrade, but a different application. Instead of conditioning on text prompts, IC-LoRA conditions generation on a reference video or image. You provide a source clip, and the model uses it as visual guidance for output structure, pose, depth, or motion.

Practical cases where IC-LoRA is the right choice: you want consistent character appearance locked to a reference image (not text description), you’re building product visualization where the product’s exact shape and proportion matter, or you’re doing style transfer from a reference clip.

Cases where standard LoRA is the right choice: you want a style or concept that activates on a text trigger without needing a reference frame, or you’re training motion/camera behavior patterns.

IC-LoRA training requires a reference dataset with paired source-and-reference videos, roughly 30–50 samples, and the preprocessing step is more involved — you need to generate reference latents separately using scripts/compute_reference.py. Dataset size and compute requirements are higher than standard LoRA. If you’re new to LTX-2.3 training, get a standard LoRA working first before attempting IC-LoRA.

Tips and Common Errors

Error: RuntimeError: CUDA out of memory
Enable gradient_checkpointing: true in your config and reduce the resolution bucket to 768x432x49. If it still fails on 24GB VRAM, drop to 640x360x33.

Error: AssertionError: frame count must be 8n+1
Your video clips have invalid frame counts. Use ffmpeg to trim:

ffmpeg -i input.mp4 -frames:v 49 output.mp4

Valid counts: 9, 17, 25, 33, 41, 49, 65, 97, 121.

Problem: LoRA activates inconsistently
Caption quality is the most common culprit. Every caption should describe the same subject consistently across all clips. If clip 1 says “woman with brown hair” and clip 7 says “female figure,” the model treats these as different things. Standardize your caption vocabulary before preprocessing.
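
One quick way to catch split vocabulary before preprocessing is to count how each subject phrasing is used across your captions. A rough sketch; the phrase list is something you supply for your own dataset:

```python
from collections import Counter

def subject_term_report(captions, terms):
    """Count how often each subject phrasing appears across the captions."""
    counts = Counter({t: 0 for t in terms})
    for cap in captions:
        low = cap.lower()
        for t in terms:
            if t.lower() in low:
                counts[t] += 1
    return counts

caps = [
    "MYTRIGGER woman with brown hair, walking through city street",
    "MYTRIGGER female figure sitting at a cafe table",
]
report = subject_term_report(caps, ["woman with brown hair", "female figure"])
print(report)  # each phrasing covers only half the clips: split vocabulary
```

If any phrasing covers only part of the dataset, rewrite those captions to a single canonical description before running process_dataset.py.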

Problem: Output looks overfit after 2000 steps
Check your validation checkpoints at 500 and 750; one of those is probably your best result. More steps does not mean a better LoRA. Set checkpointing_steps: 250 in the config to save intermediate checkpoints you can compare.

Problem: Old LTX 2 ComfyUI workflow throwing errors
LTX-2.3 uses a different VAE node and an updated model loader. The LTX-Video ComfyUI repository maintains example workflows for both the dev and distilled variants; use these as the starting point for a 2.3-compatible graph rather than patching an LTX 2 workflow.

FAQ

Q: Is there any way to convert an LTX 2 LoRA to work with LTX 2.3?

No. The latent space change is architectural — the weight offsets from LTX 2 LoRAs reference internal representations that no longer exist in 2.3’s VAE. There’s no mathematical transformation that maps between them. Retraining is the only path.

Q: How long does retraining take on consumer hardware?

On an RTX 4090 with a 25-clip dataset at 960×544, expect 2–3 hours for 1500 steps. On an RTX 3090 with gradient checkpointing, 3–5 hours for the same run. Cloud GPU (A100 80GB on RunPod) runs the same job in under 2 hours.

Q: Do I need to retrain IC-LoRAs separately from standard LoRAs?

Yes — IC-LoRA and standard LoRA are trained differently. Your LTX 2 IC-LoRAs are incompatible with 2.3 for the same VAE reason as standard LoRAs, and they must be retrained using the IC-LoRA pipeline with reference-video preprocessing rather than the standard training script.

