LTX 2.3 Foley V2A ComfyUI Workflow

This repository contains ready-to-test ComfyUI workflows for the FuzzPuppy/LTX-2.3-Foley-LoRA weights. The LoRA adds Foley sound effects to a silent input video using LTX-2.3: given a video and a prompt describing the visible action, the workflow generates matching non-speech, non-music sound effects and saves a new MP4.

There are two workflows provided:

  1. foley-sliding-window.json: long-video workflow with overlapping audio windows and stitching.
  2. ltx_23_foley_v2a.json: original short-clip workflow.

If you want to run a quick short test, use ltx_23_foley_v2a.json. Otherwise, use foley-sliding-window.json so you can generate longer audio while keeping memory under control.

Tutorial

Watch the tutorial on YouTube: using the LTX-2.3 Foley LoRA in ComfyUI

What Is Included

  • foley-sliding-window.json: long-video workflow with overlapping audio windows and stitching.
  • ltx_23_foley_v2a.json: original short-clip workflow.
  • setup_runpod_ltx_foley.sh: one-command RunPod setup script.
  • ltx_foley_v2a: small helper-node package.
  • tennis-no-sound.mp4: default silent test video for RunPod setup.

Both workflows require the ltx_foley_v2a helper-node package. If ComfyUI shows missing nodes named LTXFoleyForLoopOpen, LTXFoleyWindowSelect, LTXFoleyVideoToAudioLatent, or LTXFoleyAudioVAEDecode, the workflow JSON was loaded before these helper nodes were installed into ComfyUI/custom_nodes.

The helper-node package handles the workflow-specific pieces that stock ComfyUI does not currently cover cleanly:

  • plans the window count from the uploaded video
  • provides a small local ComfyUI for-loop so no external loop-node pack is needed
  • splits longer videos into overlapping windows
  • freezes each source window as LTX video latents while leaving matching audio latents empty for Foley generation
  • decodes each audio window into the Comfy audio tensor layout expected by current video saving nodes
  • writes each raw decoded window as a WAV before stitching so artifacts can be checked before the final crossfade
  • crossfades and stitches generated audio windows into one final track
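The last step in the list above, crossfading neighboring windows into one track, can be illustrated with a minimal sketch. It assumes mono audio held as plain Python lists of float samples and a linear fade; the function name `crossfade_stitch` is hypothetical, and the real helper node may use a different fade curve or tensor layout.

```python
def crossfade_stitch(windows, overlap):
    """Join mono sample lists, linearly crossfading `overlap` samples at each seam.

    Illustrative only: the linear fade and list-of-floats layout are assumptions,
    not the helper node's actual implementation.
    """
    if not windows:
        return []
    out = list(windows[0])
    for nxt in windows[1:]:
        # Never fade over more samples than either side actually has.
        n = min(overlap, len(out), len(nxt))
        for i in range(n):
            w = (i + 1) / (n + 1)      # weight of the incoming window, rising toward 1
            j = len(out) - n + i       # matching position in the tail of the track so far
            out[j] = out[j] * (1.0 - w) + nxt[i] * w
        out.extend(nxt[n:])            # append the part of the new window after the seam
    return out
```

Here `overlap` is given in samples, so a 1.0 second overlap at a given sample rate corresponds to `overlap = sample_rate`.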

Prompt text, model loading, LoRA loading, video creation, and MP4 saving use normal ComfyUI/LTXVideo nodes.

Fastest RunPod Test

Use the official RunPod ComfyUI - CUDA 12.8 template:

https://console.runpod.io/deploy?template=cw3nka7d08&ref=k7b1cgii

  1. In RunPod, under "Additional Filters", filter CUDA versions to CUDA 12.8.
  2. Select a 48 GB GPU: A40, RTX A6000, L40/L40S, or A100.
  3. Make sure the ComfyUI - CUDA 12.8 template is selected.
  4. The template's default volume disk is 50 GB, which is enough for the core workflow files, but tight once caches and reruns accumulate. Change the volume disk to 100 GB if you want more breathing room.
  5. Start the pod and open a terminal.
  6. Run:
cd /workspace
curl -L https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-Workflow/resolve/main/setup_runpod_ltx_foley.sh -o setup_runpod_ltx_foley.sh
bash setup_runpod_ltx_foley.sh

The setup script installs the nodes and models, downloads the tennis test video as input.mp4, restarts ComfyUI without stopping the pod (with --cache-classic, see the Manual ComfyUI Install notes), and waits until the UI responds on port 8188.
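If you restart ComfyUI yourself and want the same kind of readiness check, a rough Python equivalent of "wait until the UI responds on port 8188" might look like the sketch below. The function names, the URL path, and the retry counts are assumptions for illustration; the setup script's own check may differ.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if the URL answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def wait_until_ready(url="http://127.0.0.1:8188/", attempts=60, delay=2.0,
                     probe=http_ok, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `attempts` run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay)
    return False
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.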

By default the script installs ComfyUI v0.27.0. To test another ComfyUI release, set COMFYUI_CORE_REF:

COMFYUI_CORE_REF=v0.19.0 bash setup_runpod_ltx_foley.sh

To install workflow files from a different Hugging Face branch, set WORKFLOW_REVISION:

WORKFLOW_REVISION=windows bash setup_runpod_ltx_foley.sh

After the script finishes:

  1. Open ComfyUI from the RunPod web UI.
  2. Under workflows, select foley-sliding-window.json.
  3. Hit Run.

The default input video and prompt are already set:

Two men are playing tennis. No speech is present. No music is present.

What The Script Installs

The script assumes the official CUDA 12.8 template layout from runpod-workers/comfyui-base:

  • ComfyUI: /workspace/runpod-slim/ComfyUI
  • Python environment: /workspace/runpod-slim/ComfyUI/.venv-cu128
  • ComfyUI port: 8188

It installs or refreshes:

  • Lightricks/ComfyUI-LTXVideo
  • ltx_foley_v2a helper nodes
  • foley-sliding-window.json
  • ltx_23_foley_v2a.json
  • tennis-no-sound.mp4

The script also applies a small compatibility patch to the installed ComfyUI-LTXVideo/pyramid_blending.py file so current Kornia builds can import the node pack on fresh ComfyUI installs.

It downloads these model files:

Large model downloads are SHA-256 verified. Completed files are skipped on rerun, interrupted downloads resume from *.part files, and corrupt partials are retried once from scratch.
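The skip-on-rerun behaviour can be pictured with a short sketch: hash the file on disk and only download again when the digest does not match. This is an illustration in Python, not the setup script's code; `sha256_of` and `needs_download` are hypothetical names, and the expected digest would come from wherever the script stores its checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte models do not have to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_sha256):
    """True when the file is missing or its SHA-256 differs from the expected value."""
    target = Path(path)
    if not target.is_file():
        return True
    return sha256_of(target) != expected_sha256.lower()
```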

Manual ComfyUI Install

If you are not using the RunPod script:

  1. Install or update ComfyUI.

  2. Install the official LTXVideo custom nodes: https://github.com/Lightricks/ComfyUI-LTXVideo

  3. Install the Foley helper nodes by placing the workflow repo's ltx_foley_v2a folder into: ComfyUI/custom_nodes/

  4. Copy either foley-sliding-window.json or ltx_23_foley_v2a.json into your ComfyUI user workflows folder. In a standard ComfyUI install this is: ComfyUI/user/default/workflows.

  5. Put the model files in:

  6. Restart ComfyUI, starting it with the --cache-classic flag:

    python main.py --cache-classic
    

    On newer ComfyUI versions (v0.27.0+) the default caching mode is RAM-pressure caching, which can evict node outputs in the middle of a run while the large LTX models load. For foley-sliding-window.json that forces the window plan, video decode, and model loaders to re-execute between windows, making long runs much slower. --cache-classic keeps those outputs cached for the whole run. The flag also exists on older releases such as v0.19.0, where it is harmless.

  7. Under workflows, select foley-sliding-window.json or ltx_23_foley_v2a.json.

  8. Hit Run.

Workflow Defaults

  • Input video: input.mp4
  • Prompt: Two men are playing tennis. No speech is present. No music is present.
  • Negative prompt: anti-music/anti-vocal prompt
  • Conditioning size: 576x576
  • Frame window: 89 frames
  • Window overlap: 1.0 second
  • Maximum windows: 16
  • Random ID: 42
  • Sampling steps: 30
  • Guidance: 4.0
  • Save window audio: true
  • Window audio prefix: ltx_foley_window
  • LoRA strength: 1.0

Advanced sampler/STG settings are visible nodes in the loop body: sampler euler_ancestral_cfg_pp, STG scale 1.0, rescale 0.7, STG blocks 14, 19, max shift 2.05, base shift 0.95, terminal 0.1.

The foley-sliding-window.json workflow uses the full uploaded video. Videos longer than the selected window are processed as overlapping windows and stitched into one generated audio track. Shorter videos are padded internally by repeating the last frame. The saved MP4 uses the source frames plus the stitched generated audio. Raw generated window WAVs are saved under ComfyUI's output directory in ltx_foley_windows/ and their paths are listed in the manifest output.
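As a rough picture of how a video is split into overlapping windows, the sketch below computes window start frames from the defaults listed above. It assumes the stride is the window length minus the overlap converted to frames, and that a final window running past the end of the video is padded rather than shifted back; `plan_windows` is a hypothetical name and the window-plan node may compute its starts differently.

```python
def plan_windows(total_frames, fps, window_frames=89,
                 overlap_seconds=1.0, max_windows=16):
    """Return start frames for overlapping windows that cover the whole video.

    Assumed scheme: stride = window_frames - round(overlap_seconds * fps).
    """
    overlap_frames = int(round(overlap_seconds * fps))
    stride = window_frames - overlap_frames
    if stride <= 0:
        raise ValueError("overlap must be shorter than the window")
    starts = [0]
    # Keep adding windows until the last one reaches the end of the video.
    while starts[-1] + window_frames < total_frames:
        starts.append(starts[-1] + stride)
    if len(starts) > max_windows:
        raise ValueError(
            f"video needs {len(starts)} windows, more than max_windows={max_windows}"
        )
    return starts
```

Under these assumptions a 200-frame clip at 24 fps gives starts of 0, 65, and 130, with the last window extending 19 frames past the end of the video, which is where padding with the last frame would apply.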

VRAM Notes

Sampling is the VRAM peak. If you need to reduce memory use, try these changes in order:

  • reduce frames from 89 to 57, 41, or 25
  • reduce conditioning size from 576x576 to 448x448 or 384x384
  • reduce sampling steps from 30 to 20

Frame counts should stay one more than a multiple of 8:

9, 17, 25, 33, 41, 49, 57, ..., 89, ..., 257
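A quick way to check a candidate frame count against this rule, or to round an arbitrary number down to the nearest valid one, is sketched below. The function names are hypothetical, and the minimum of 9 is taken from the sequence above rather than from any documented limit.

```python
def is_valid_frame_count(frames):
    """True when frames is one more than a multiple of 8 (and at least 9)."""
    return frames >= 9 and (frames - 1) % 8 == 0


def snap_frame_count(frames):
    """Round down to the nearest valid frame count, never going below 9."""
    return max(9, ((frames - 1) // 8) * 8 + 1)
```

For example, `snap_frame_count(60)` gives 57, which is one of the reduced window sizes suggested above.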

For foley-sliding-window.json, the default max_windows is 16 so accidental very long inputs fail clearly instead of running for hours. Increase it only when you expect the extra runtime.

Troubleshooting

Missing Nodes

If ComfyUI reports missing LTXFoley... nodes after manual setup, verify that these files exist and then restart ComfyUI:

ComfyUI/custom_nodes/ltx_foley_v2a/__init__.py
ComfyUI/custom_nodes/ltx_foley_v2a/nodes.py
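If you prefer to check this from a script, a minimal sketch is shown below. `missing_helper_files` is a hypothetical helper, and the ComfyUI root path you pass in depends on your install.

```python
from pathlib import Path

# The two files named above, relative to custom_nodes/ltx_foley_v2a.
REQUIRED_FILES = ("__init__.py", "nodes.py")


def missing_helper_files(comfyui_root):
    """Return the helper-node files that are absent under the given ComfyUI root."""
    package_dir = Path(comfyui_root) / "custom_nodes" / "ltx_foley_v2a"
    return [
        str(package_dir / name)
        for name in REQUIRED_FILES
        if not (package_dir / name).is_file()
    ]
```

An empty list means both files are present; restart ComfyUI afterwards so the nodes are registered.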

Models Reload Or Nodes Re-Execute Between Windows

If the log shows planned N windows repeating, or the checkpoint/text-encoder reloading before every window of foley-sliding-window.json, ComfyUI is running with its default RAM-pressure caching and is evicting node outputs mid-run. Start ComfyUI with --cache-classic (the RunPod script already does this). The generated audio is still correct either way; the re-execution only costs time.

Duplicate Sounds At Window Boundaries

In foley-sliding-window.json, neighboring windows overlap (default 1.0 second) and each window generates its audio independently. If a distinct sound event (a door close, a footstep) falls inside an overlap region, both windows may render it slightly out of alignment, and you can hear the event twice around a window boundary. The run log's planned N windows starts=[...] line shows where the boundaries are (start_frame / fps seconds).

If you hear this, reduce the Window overlap (overlap_seconds on the window-plan node), for example from 1.0 to 0.5. A smaller overlap makes it less likely an event lands in the shared region, at the cost of a shorter crossfade between windows. Avoid large overlaps: the bigger the overlap, the more of the video is generated twice, which increases the chance of doubled sounds.
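To locate the regions where a doubled sound can occur, you can convert the start frames from the log into seconds. The sketch below assumes each window spans `window_frames` frames from its start and that you copy the `starts` list and the video's fps by hand; `overlap_regions_seconds` is a hypothetical name.

```python
def overlap_regions_seconds(starts, fps, window_frames=89):
    """Return (begin, end) times in seconds where neighboring windows overlap."""
    regions = []
    for prev_start, next_start in zip(starts, starts[1:]):
        prev_end = prev_start + window_frames
        if next_start < prev_end:
            # The shared span runs from the next window's start to the previous window's end.
            regions.append((next_start / fps, prev_end / fps))
    return regions
```

If a doubled sound falls inside one of the returned intervals, the overlap is the likely cause.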

Audio Artifacts On Some ComfyUI Versions

The workflows have been tested on ComfyUI v0.27.0 and run successfully there. However, on v0.27.0 and newer ComfyUI versions generally, we have noticed that LTX-2.3 video-to-audio can produce a high-pitched squeak or audio artifacts in some generated audio.

If you notice the audio artifacts on a generation, roll back to ComfyUI v0.19.0.

If you are using the RunPod setup, you can roll back by running:

cd /workspace
COMFYUI_CORE_REF=v0.19.0 bash setup_runpod_ltx_foley.sh

Then reload foley-sliding-window.json and run it again.

RunPod Setup

Restarting/Rerun

To rerun setup after a workflow or node update:

cd /workspace && bash setup_runpod_ltx_foley.sh

The script will skip verified model files, refresh the workflow/helper nodes, and restart ComfyUI.

Model Downloads

If model downloads fail with authorization errors, accept the relevant Hugging Face model terms and rerun with HF_TOKEN set.

Logs

Logs from the script-managed ComfyUI restart are written to:

/workspace/runpod-slim/comfyui-restart.log

License Scope

The files in this workflow repository are released under the Apache-2.0 license. That applies to the workflow JSON, setup script, helper-node code, README/model card text, and bundled test assets in this repository.

This workflow downloads and uses third-party model files that are governed by their own licenses and terms, including LTX-2.3, the Gemma text encoder, and the FuzzPuppy/LTX-2.3-Foley-LoRA weights.
