{"id":5897,"date":"2026-03-26T17:36:05","date_gmt":"2026-03-26T09:36:05","guid":{"rendered":"https:\/\/crepal.ai\/blog\/?p=5897"},"modified":"2026-03-26T17:36:07","modified_gmt":"2026-03-26T09:36:07","slug":"ltx-2-3-ic-lora-guide","status":"publish","type":"post","link":"https:\/\/crepal.ai\/blog\/aivideo\/ltx-2-3-ic-lora-guide\/","title":{"rendered":"How to Use IC-LoRA in LTX 2.3: Identity-Consistent Video"},"content":{"rendered":"\n<p>HEY fellows! I`m Dora, as usual. Lately, one problem that sent me down this rabbit hole was embarrassingly simple: I was making an episodic short-form series for a client, and every time I generated a new clip, the main character looked like a different person. Same prompt, same seed range, same style LoRA \u2014 different cheekbones, different eyes, different voice tone. Consistent identity across video clips has been the hardest unsolved problem in AI video generation for a while.<\/p>\n\n\n\n<p><a href=\"https:\/\/huggingface.co\/Lightricks\/LTX-2.3-22b-IC-LoRA-Motion-Track-Control\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">IC-LoRA in LTX 2.3<\/a> is the most practical answer I&#8217;ve found so far. It&#8217;s not magic, but it&#8217;s real, it just got proper native ComfyUI support. This guide covers what it actually does, how to train one, and what you should realistically expect from your output.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-ic-lora-is-and-why-it-matters-for-video\">What IC-LoRA Is and Why It Matters for Video<\/h2>\n\n\n\n<p>IC-LoRA stands for <strong>In-Context <\/strong><strong>LoRA<\/strong> \u2014 a training approach where the LoRA learns to condition generation on reference inputs rather than just text. Standard LoRAs teach the model &#8220;what this concept looks like&#8221; through captions and training clips. IC-LoRA teaches it something more structured: &#8220;given this reference image and this reference audio, produce a video where the subject looks and sounds like the reference.&#8221;<\/p>\n\n\n\n<p>The practical payoff for creators: you provide a single reference image (a face, a character, a product) and a short audio clip (a voice sample), and LTX 2.3 generates a video where that person is speaking, moving, and looking consistent with the reference \u2014 synchronized audio included. 
That&#8217;s genuinely different from text-prompt consistency tricks, which always drift across generations.<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"912\" height=\"571\" data-id=\"5902\" data-src=\"https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-204.png\" alt=\"\" class=\"wp-image-5902 lazyload\" data-srcset=\"https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-204.png 912w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-204-300x188.png 300w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-204-768x481.png 768w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-204-18x12.png 18w\" data-sizes=\"auto, (max-width: 912px) 100vw, 912px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 912px; --smush-placeholder-aspect-ratio: 912\/571;\" \/><\/figure>\n<\/figure>\n\n\n\n<p>This matters most for anyone building recurring characters: virtual hosts, brand mascots, indie animation, AI-assisted narration series, or product spokespersons. The goal isn&#8217;t photorealistic deepfakes \u2014 it&#8217;s controllable, repeatable identity that holds up across a content pipeline.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"ic-lora-vs-standard-lora-in-ltx-2-3\">IC-LoRA vs Standard LoRA in LTX 2.3<\/h2>\n\n\n\n<p>Understanding the difference prevents a lot of wasted training time:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\"><\/td><td class=\"has-text-align-center\" data-align=\"center\">Standard LoRA<\/td><td class=\"has-text-align-center\" data-align=\"center\">IC-LoRA<\/td><\/tr><tr><td>Activation<\/td><td>Text trigger token<\/td><td>Reference image + audio at inference<\/td><\/tr><tr><td>What it learns<\/td><td>Style, concept, motion pattern<\/td><td>Identity mapping from reference to output<\/td><\/tr><tr><td>Dataset type<\/td><td>Video clips + text captions<\/td><td>Video clips + paired reference frames\/audio<\/td><\/tr><tr><td>Training complexity<\/td><td>Lower<\/td><td>Higher \u2014 paired data required<\/td><\/tr><tr><td>Use case<\/td><td>Style transfer, character aesthetics<\/td><td>Face\/voice identity preservation<\/td><\/tr><tr><td>Inference input<\/td><td>Text prompt only<\/td><td>Text + reference image + (optionally) audio<\/td><\/tr><tr><td>Training steps<\/td><td>1000\u20132000 typical<\/td><td>4000\u20136000 for identity (ID-LoRA rank 128)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The key distinction: a standard LoRA activates on a text token. IC-LoRA activates a reference input at inference time. You don&#8217;t need a trigger word \u2014 you hand it a photo and voice clip, and the model does the rest.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"prerequisites-dataset-vram-ltx-trainer-version\">Prerequisites (Dataset, VRAM, ltx-trainer Version)<\/h2>\n\n\n\n<p><strong>Hardware:<\/strong> An H100 80GB is the officially documented reference. In practice, an RTX 4090 (24GB) works with gradient checkpointing enabled and resolution capped at 768\u00d7432. IC-LoRA at rank 128 (the ID-LoRA configuration) requires more memory than rank 32 standard LoRA \u2014 budget for slower training or cloud GPU time. 
A100 80GB on RunPod or vast.ai runs the full job in under 3 hours at reasonable cost.<\/p>\n\n\n\n<p><strong>ltx-trainer version:<\/strong> Use the version from the <a href=\"https:\/\/github.com\/Lightricks\/LTX-2\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Lightricks\/LTX-2 monorepo<\/a>, specifically the <code>ltx-trainer<\/code> package. For the identity-audio variant (ID-LoRA), use the <a href=\"https:\/\/github.com\/ID-LoRA\/ID-LoRA\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">ID-LoRA\/ID-LoRA repository which builds on top of ltx-trainer <\/a>with the <code>audio_ref_only_ic<\/code> training strategy. As of March 24, 2026, native ComfyUI support for ID-LoRA was merged upstream (PR #13111) \u2014 you no longer need a custom node fork.<\/p>\n\n\n\n<p><strong>Model assets required:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>LTX-2.3 base checkpoint (~44 GB)       \u2192 models\/checkpoints\/\nGemma text encoder (~6 GB)             \u2192 models\/text_encoders\/\nSpatial upscaler (~700 MB)             \u2192 models\/latent_upscale_models\/\nTemporal upscaler                      \u2192 models\/latent_upscale_models\/\nDistilled LoRA (~900 MB)               \u2192 models\/loras\/<\/code><\/pre>\n\n\n\n<p>Download all assets from <a href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">HuggingFace<\/a> before starting. The distilled LoRA is required for the two-stage pipeline that IC-LoRA uses at inference. The <a href=\"https:\/\/huggingface.co\/Lightricks\/LTX-2\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">official LTX-2 model card on HuggingFace<\/a> lists all required assets with direct download links and notes that IC-LoRA training in many settings takes under an hour.<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-2 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"500\" data-id=\"5901\" data-src=\"https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-203-1024x500.png\" alt=\"\" class=\"wp-image-5901 lazyload\" data-srcset=\"https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-203-1024x500.png 1024w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-203-300x147.png 300w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-203-768x375.png 768w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-203-18x9.png 18w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-203.png 1111w\" data-sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1024px; --smush-placeholder-aspect-ratio: 1024\/500;\" \/><\/figure>\n<\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"training-an-ic-lora-with-ltx-trainer\">Training an IC-LoRA with ltx-trainer<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"dataset-requirements-for-consistent-identity\">Dataset Requirements for Consistent Identity<\/h3>\n\n\n\n<p>This is where IC-LoRA training succeeds or fails. 
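<\/p>\n\n\n\n<p>One prep step worth automating early is producing the paired reference frame for every clip, since the dataset format below expects one per clip. Here&#8217;s a minimal sketch with OpenCV that grabs the first frame of each video; the <code>data\/clip_*.mp4<\/code> layout matches the manifest example further down, and you can always swap any bad automatic grab for a hand-picked portrait:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import cv2\nfrom pathlib import Path\n\nDATA = Path('data')   # expects data\/clip_001.mp4, data\/clip_002.mp4, ...\n\nfor clip in sorted(DATA.glob('clip_*.mp4')):\n    cap = cv2.VideoCapture(str(clip))\n    ok, frame = cap.read()          # first frame becomes the paired reference\n    cap.release()\n    if not ok:\n        print(f'could not read {clip.name}')\n        continue\n    ref = DATA \/ clip.name.replace('clip_', 'ref_').replace('.mp4', '.jpg')\n    cv2.imwrite(str(ref), frame)\n    print(f'wrote {ref.name} for {clip.name}')<\/code><\/pre>\n\n\n\n<p>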
The dataset requirement is more demanding than standard LoRA because you&#8217;re teaching a paired relationship \u2014 what goes in (reference) must clearly correspond to what comes out (generated video).<\/p>\n\n\n\n<p><strong>Minimum viable dataset:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>30\u201350 video clips of the subject<\/li>\n\n\n\n<li>Each clip: 3\u201310 seconds, consistent lighting, face clearly visible<\/li>\n\n\n\n<li>Paired reference frames for each clip \u2014 typically the first frame or a held portrait shot<\/li>\n\n\n\n<li>For voice identity: 5\u201310 audio reference clips of the subject speaking, each 5\u201315 seconds<\/li>\n<\/ul>\n\n\n\n<p><strong>Caption format for IC-LoRA identity training:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#091;\n  {\n    \"video_path\": \"data\/clip_001.mp4\",\n    \"reference_image\": \"data\/ref_001.jpg\",\n    \"reference_audio\": \"data\/voice_001.wav\",\n    \"caption\": \"person speaking directly to camera, natural lighting, slight smile\"\n  },\n  {\n    \"video_path\": \"data\/clip_002.mp4\",\n    \"reference_image\": \"data\/ref_002.jpg\",\n    \"reference_audio\": \"data\/voice_002.wav\",\n    \"caption\": \"person walking outdoors, casual clothing, daylight\"\n  }\n]<\/code><\/pre>\n\n\n\n<p>Keep captions descriptive but generic \u2014 don&#8217;t over-specify physical features in the caption text. The reference image carries the identity signal; the caption describes the scene and action. If your captions say &#8220;brown-haired woman&#8221; but your reference frames show a bald man, the model gets confused about what it&#8217;s supposed to learn.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"training-configuration\">Training Configuration<\/h3>\n\n\n\n<p>Clone and set up the ID-LoRA repository for LTX-2.3:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>git clone https:\/\/github.com\/ID-LoRA\/ID-LoRA.git\ncd ID-LoRA\n# Switch to LTX-2.3 workspace\n# Edit pyproject.toml: members = &#091;\"ID-LoRA-2.3\/packages\/*\"]\nuv sync --frozen\nsource .venv\/bin\/activate<\/code><\/pre>\n\n\n\n<p>The training config for identity IC-LoRA (based on the published <code>training_celebvhq.yaml<\/code>):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>model:\n  checkpoint_path: \/path\/to\/ltx-2.3-22b-dev.safetensors\n  text_encoder_path: \/path\/to\/gemma-3-12b-it-qat-q4_0-unquantized\n\ndataset:\n  dataset_file: dataset.json\n  resolution_buckets:\n    - \"768x432x49\"\n\ntraining_strategy: audio_ref_only_ic   # IC-LoRA strategy\n\noptimization:\n  learning_rate: 1.0e-4\n  batch_size: 1\n  max_train_steps: 6000\n  gradient_checkpointing: true\n\nlora:\n  rank: 128          # ID-LoRA uses rank 128 for identity fidelity\n  alpha: 128\n\nvalidation:\n  validation_steps: 500\n  reference_image: \/path\/to\/validation_ref.jpg\n  reference_audio: \/path\/to\/validation_voice.wav\n  validation_prompts:\n    - \"person speaking calmly, neutral background\"\n\noutput:\n  output_dir: .\/outputs\/identity_ic_lora_v1<\/code><\/pre>\n\n\n\n<p>Run training:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>uv run python ID-LoRA-2.3\/packages\/ltx-trainer\/scripts\/train.py \\\n  ID-LoRA-2.3\/configs\/training_celebvhq.yaml<\/code><\/pre>\n\n\n\n<p>Key difference from standard LoRA: <code>training_strategy: audio_ref_only_ic<\/code> tells ltx-trainer to use reference conditioning instead of text-only supervision. 
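<\/p>\n\n\n\n<p>Once a run finishes, it&#8217;s worth confirming the saved adapter actually carries the rank you configured before wiring it into ComfyUI. A minimal sketch with the <code>safetensors<\/code> library; the output path and tensor naming vary by trainer version, so treat both as assumptions and adjust them to whatever your run actually wrote:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from safetensors import safe_open\n\n# Illustrative path; point this at the .safetensors your training run produced.\npath = 'outputs\/identity_ic_lora_v1\/ic_lora_weights.safetensors'\n\nwith safe_open(path, framework='pt') as f:\n    for name in f.keys():\n        if 'lora' in name.lower():\n            shape = tuple(f.get_tensor(name).shape)\n            # For a rank-r adapter, one dimension of each low-rank matrix equals r (128 here).\n            print(name, shape, 'rank guess:', min(shape))<\/code><\/pre>\n\n\n\n<p>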
Rank 128 is significantly higher than standard LoRA (rank 32) \u2014 this is what gives IC-LoRA the capacity to encode detailed identity features, but it also means higher memory usage and longer training.<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-3 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"926\" height=\"442\" data-id=\"5900\" data-src=\"https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-202.png\" alt=\"\" class=\"wp-image-5900 lazyload\" data-srcset=\"https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-202.png 926w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-202-300x143.png 300w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-202-768x367.png 768w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-202-18x9.png 18w\" data-sizes=\"auto, (max-width: 926px) 100vw, 926px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 926px; --smush-placeholder-aspect-ratio: 926\/442;\" \/><\/figure>\n<\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"loading-ic-lora-in-comfyui-workflow\">Loading IC-LoRA in ComfyUI Workflow<\/h2>\n\n\n\n<p>As of the upstream ComfyUI merge (PR #13111), IC-LoRA loading no longer requires a custom node installation. The three nodes you need are now in core ComfyUI:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><code>LTXICLoRALoaderModelOnly<\/code><\/strong> \u2014 loads your IC-LoRA <code>.safetensors<\/code> and extracts the reference downscale factor<\/li>\n\n\n\n<li><strong><code>LTXAddVideoICLoRAGuide<\/code><\/strong> \u2014 attaches the reference image\/audio as a conditioning guide to the generation pipeline<\/li>\n\n\n\n<li><strong><code>LTXVReferenceAudio<\/code><\/strong> \u2014 handles reference audio for voice identity transfer<\/li>\n<\/ul>\n\n\n\n<p>Workflow setup:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#091;Load LTX-2.3 Checkpoint]\n       \u2193\n&#091;LTXICLoRALoaderModelOnly] \u2190 ic_lora_weights.safetensors\n       \u2193\n&#091;LTXAddVideoICLoRAGuide] \u2190 reference_image.jpg + reference_audio.wav\n       \u2193\n&#091;LTX Sampler \/ KSampler]\n       \u2193\n&#091;VAE Decode \u2192 Video Output]<\/code><\/pre>\n\n\n\n<p>Copy the trained LoRA to <code>COMFYUI_ROOT\/models\/loras\/<\/code> and load it via <code>LTXICLoRALoaderModelOnly<\/code>. Per the <a href=\"https:\/\/huggingface.co\/Lightricks\/LTX-2.3-22b-IC-LoRA-Motion-Track-Control\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">official IC-LoRA workflow documentation on HuggingFace<\/a>, always use the <code>LTXAddVideoICLoRAGuide<\/code> node to pass the reference \u2014 don&#8217;t use the standard LoRA loader, which bypasses the reference conditioning mechanism entirely.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"prompting-tips-for-identity-consistency\">Prompting Tips for Identity Consistency<\/h2>\n\n\n\n<p>IC-LoRA changes how you should write prompts. 
Because identity is carried by the reference input, you don&#8217;t need to describe the subject&#8217;s appearance in the text prompt \u2014 doing so can actually create conflicts.<\/p>\n\n\n\n<p><strong>Do:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\"person speaking directly to camera, warm indoor lighting, natural expression\"\n\"subject walking through a park, relaxed pace, late afternoon sunlight\"\n\"presenter gesturing while explaining, clean white background, professional\"<\/code><\/pre>\n\n\n\n<p><strong>Avoid:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\"young woman with brown curly hair and green eyes speaking...\"<\/code><\/pre>\n\n\n\n<p>Describing physical traits in the prompt competes with the reference image signal. The model has two sources telling it what the subject looks like \u2014 and they won&#8217;t agree perfectly, causing drift or feature blending.<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-4 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"680\" height=\"425\" data-id=\"5899\" data-src=\"https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-201.png\" alt=\"\" class=\"wp-image-5899 lazyload\" data-srcset=\"https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-201.png 680w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-201-300x188.png 300w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-201-18x12.png 18w\" data-sizes=\"auto, (max-width: 680px) 100vw, 680px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 680px; --smush-placeholder-aspect-ratio: 680\/425;\" \/><\/figure>\n<\/figure>\n\n\n\n<p>Keep scene and action descriptions concrete. Vague motion descriptors (&#8220;moving naturally&#8221;) produce inconsistent results. &#8220;Slowly turns head left while speaking&#8221; gives the model clearer direction. For audio identity, the reference audio clip handles the voice \u2014 your text prompt should describe the scene and tone, not vocal characteristics.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"limitations-drift-multi-subject\">Limitations (Drift, Multi-Subject)<\/h2>\n\n\n\n<p><strong>Identity drift over long clips:<\/strong> Identity consistency holds well for 5\u201320 second clips, which maps neatly to LTX 2.3&#8217;s generation range. For longer sequences, drift accumulates \u2014 the character may look subtly different at second 25 vs second 5. The practical fix is generating in segments and cutting at natural edit points. Whether longer generation windows will maintain identity is still an open research problem.<\/p>\n\n\n\n<p><strong>Multi-subject generation:<\/strong> IC-LoRA is trained on single-identity reference inputs. Two-person scenes with two distinct reference identities are not natively supported in the current ltx-trainer implementation. You can run inference with one reference identity and prompt for a second character, but the second person won&#8217;t be reference-conditioned \u2014 they&#8217;ll be generated from the text description only.<\/p>\n\n\n\n<p><strong>Extreme pose and lighting changes:<\/strong> Reference frames work best when the inference scene&#8217;s lighting and camera angle are reasonably similar to training data. 
Asking an IC-LoRA trained on frontal-face indoor clips to generate a profile-angle outdoor scene will reduce identity fidelity, though it won&#8217;t fail completely.<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-5 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"919\" height=\"626\" data-id=\"5898\" data-src=\"https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-200.png\" alt=\"\" class=\"wp-image-5898 lazyload\" data-srcset=\"https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-200.png 919w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-200-300x204.png 300w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-200-768x523.png 768w, https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-200-18x12.png 18w\" data-sizes=\"auto, (max-width: 919px) 100vw, 919px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 919px; --smush-placeholder-aspect-ratio: 919\/626;\" \/><\/figure>\n<\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"results-examples-and-what-to-expect\">Results: Examples and What to Expect<\/h2>\n\n\n\n<p>Based on my testing through this month, a well-trained identity IC-LoRA at rank 128, 5000\u20136000 steps, on a clean 40-clip dataset produces:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Face consistency:<\/strong> Strong across scene changes, lighting variations, and different actions \u2014 the most reliable output dimension<\/li>\n\n\n\n<li><strong>Voice consistency:<\/strong> Requires clean reference audio (minimal background noise, clear speech) \u2014 when the reference audio is good, voice identity transfer is noticeably accurate<\/li>\n\n\n\n<li><strong>Expression naturalness:<\/strong> Better than standard LoRA character training; the reference conditioning prevents the &#8220;frozen expression&#8221; issue that often appears in text-only character LoRAs<\/li>\n\n\n\n<li><strong>Artifacts:<\/strong> Occasional texture flickering on hair and fine fabric detail in motion-heavy scenes \u2014 consistent with LTX 2.3 base model behavior, not specific to IC-LoRA<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"faq\">FAQ<\/h2>\n\n\n\n<p><strong>Q: Can I use an existing face photo I find online as a reference image?<\/strong><\/p>\n\n\n\n<p>Technically the tool will accept it. But you should not do this for anyone without their consent \u2014 the result is a video that makes someone appear to be saying and doing things they didn&#8217;t. Beyond the ethical problem, this likely violates platform policies and, depending on jurisdiction, applicable law. Use reference material you own and have consent to use.<\/p>\n\n\n\n<p><strong>Q: Why does my IC-LoRA look fine in validation but drift in full inference\uff1f<\/strong><\/p>\n\n\n\n<p>Validation prompts are short and generated at low resolution \u2014 they don&#8217;t reveal drift behavior that appears in longer, higher-resolution inference. Check your results at full inference resolution (960\u00d7544 or higher) and at the clip length you actually intend to use before declaring the LoRA good.<\/p>\n\n\n\n<p><strong>Q: How many training steps are enough?<\/strong><\/p>\n\n\n\n<p>The published ID-LoRA configuration uses 6000 steps. 
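<\/p>\n\n\n\n<p>Rather than trusting a fixed number, compare intermediate checkpoints on the same validation prompt. Here&#8217;s a small sketch that pulls the same frame out of each checkpoint&#8217;s validation clip and stacks them into one strip so you can eyeball where identity stops improving; the file names and layout are assumptions, so match them to wherever your trainer writes its validation renders:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import cv2\n\nSTEPS = (2000, 3500, 5000, 6000)\nFRAME_INDEX = 24   # same frame position from every checkpoint's validation clip\n\npanels = list()\nfor step in STEPS:\n    # Assumed naming; adjust to your trainer's actual validation output files.\n    cap = cv2.VideoCapture(f'outputs\/identity_ic_lora_v1\/validation_step_{step}.mp4')\n    cap.set(cv2.CAP_PROP_POS_FRAMES, FRAME_INDEX)\n    ok, frame = cap.read()\n    cap.release()\n    if ok:\n        panels.append(frame)\n\n# One side-by-side strip per frame index makes the quality plateau easy to spot.\nif panels:\n    cv2.imwrite('checkpoint_comparison.jpg', cv2.hconcat(panels))<\/code><\/pre>\n\n\n\n<p>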
Check validation at steps 2000, 3500, and 5000 before completing the run \u2014 identity quality often plateaus before 6000, and the optimal checkpoint varies by dataset quality. A clean 35-clip dataset may converge at 4000; a messier 60-clip dataset may need the full 6000 or more.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p>Previous Posts:<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-crepal-content-center wp-block-embed-crepal-content-center\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"7ivRll9pVU\"><a href=\"https:\/\/crepal.ai\/blog\/aivideo\/how-to-install-ltx-2-3-comfyui\/\">How to Install LTX 2.3 in ComfyUI: Step-by-Step Guide<\/a><\/blockquote><iframe class=\"wp-embedded-content lazyload\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"\u300a How to Install LTX 2.3 in ComfyUI: Step-by-Step Guide \u300b\u2014CrePal Content Center\" data-src=\"https:\/\/crepal.ai\/blog\/aivideo\/how-to-install-ltx-2-3-comfyui\/embed\/#?secret=hAkl96pn1J#?secret=7ivRll9pVU\" data-secret=\"7ivRll9pVU\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" data-load-mode=\"1\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-crepal-content-center wp-block-embed-crepal-content-center\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"VwOpw9YMRt\"><a href=\"https:\/\/crepal.ai\/blog\/aivideo\/what-is-ltx-2-3\/\">What Is LTX 2.3: The 22B Open-Source Video Model Explained<\/a><\/blockquote><iframe class=\"wp-embedded-content lazyload\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"\u300a What Is LTX 2.3: The 22B Open-Source Video Model Explained \u300b\u2014CrePal Content Center\" data-src=\"https:\/\/crepal.ai\/blog\/aivideo\/what-is-ltx-2-3\/embed\/#?secret=7d0bCb0nEk#?secret=VwOpw9YMRt\" data-secret=\"VwOpw9YMRt\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" data-load-mode=\"1\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-crepal-content-center wp-block-embed-crepal-content-center\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"y6ociLEZJW\"><a href=\"https:\/\/crepal.ai\/blog\/aivideo\/ltx-2-3-vs-ltx-2-upgrade-guide\/\">LTX 2.3 vs LTX 2: What Changed and Should You Upgrade?<\/a><\/blockquote><iframe class=\"wp-embedded-content lazyload\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"\u300a LTX 2.3 vs LTX 2: What Changed and Should You Upgrade? 
\u300b\u2014CrePal Content Center\" data-src=\"https:\/\/crepal.ai\/blog\/aivideo\/ltx-2-3-vs-ltx-2-upgrade-guide\/embed\/#?secret=T8o3LiuSn9#?secret=y6ociLEZJW\" data-secret=\"y6ociLEZJW\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" data-load-mode=\"1\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-crepal-content-center wp-block-embed-crepal-content-center\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"gMSnjw3lws\"><a href=\"https:\/\/crepal.ai\/blog\/aivideo\/ltx-2-3-vs-wan-2-2\/\">LTX 2.3 vs WAN 2.2: Best Open-Source Video Model in 2026?<\/a><\/blockquote><iframe class=\"wp-embedded-content lazyload\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"\u300a LTX 2.3 vs WAN 2.2: Best Open-Source Video Model in 2026? \u300b\u2014CrePal Content Center\" data-src=\"https:\/\/crepal.ai\/blog\/aivideo\/ltx-2-3-vs-wan-2-2\/embed\/#?secret=QaVm0sirCp#?secret=gMSnjw3lws\" data-secret=\"gMSnjw3lws\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" data-load-mode=\"1\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-crepal-content-center wp-block-embed-crepal-content-center\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"9fzlR0iafv\"><a href=\"https:\/\/crepal.ai\/blog\/aivideo\/blog-seedance-2-0-character-consistency\/\">Seedance 2.0 Character Consistency: How to Stop Identity Drift Across Scenes<\/a><\/blockquote><iframe class=\"wp-embedded-content lazyload\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"\u300a Seedance 2.0 Character Consistency: How to Stop Identity Drift Across Scenes \u300b\u2014CrePal Content Center\" data-src=\"https:\/\/crepal.ai\/blog\/aivideo\/blog-seedance-2-0-character-consistency\/embed\/#?secret=iZRaf0xnJa#?secret=9fzlR0iafv\" data-secret=\"9fzlR0iafv\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" data-load-mode=\"1\"><\/iframe>\n<\/div><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>HEY fellows! I`m Dora, as usual. Lately, one problem that sent me down this rabbit hole was embarrassingly simple: I was making an episodic short-form series for a client, and every time I generated a new clip, the main character looked like a different person. 
Same prompt, same seed range, same style LoRA \u2014 different [&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":5903,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_gspb_post_css":"","_uag_custom_page_level_css":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-5897","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-aivideo"],"blocksy_meta":[],"uagb_featured_image_src":{"full":["https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-205.png",1376,768,false],"thumbnail":["https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-205-150x150.png",150,150,true],"medium":["https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-205-300x167.png",300,167,true],"medium_large":["https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-205-768x429.png",768,429,true],"large":["https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-205-1024x572.png",1024,572,true],"1536x1536":["https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-205.png",1376,768,false],"2048x2048":["https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-205.png",1376,768,false],"trp-custom-language-flag":["https:\/\/crepal.ai\/blog\/wp-content\/uploads\/2026\/03\/image-205-18x10.png",18,10,true]},"uagb_author_info":{"display_name":"Dora","author_link":"https:\/\/crepal.ai\/blog\/author\/dora\/"},"uagb_comment_info":1,"uagb_excerpt":"HEY fellows! I`m Dora, as usual. Lately, one problem that sent me down this rabbit hole was embarrassingly simple: I was making an episodic short-form series for a client, and every time I generated a new clip, the main character looked like a different person. Same prompt, same seed range, same style LoRA \u2014 different&hellip;","_links":{"self":[{"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/posts\/5897","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/comments?post=5897"}],"version-history":[{"count":1,"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/posts\/5897\/revisions"}],"predecessor-version":[{"id":5904,"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/posts\/5897\/revisions\/5904"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/media\/5903"}],"wp:attachment":[{"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/media?parent=5897"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/categories?post=5897"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/crepal.ai\/blog\/wp-json\/wp\/v2\/tags?post=5897"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}