LTX 2.3 Foley V2A ComfyUI Workflow
This repository contains ready-to-test ComfyUI workflows for the
FuzzPuppy/LTX-2.3-Foley-LoRA
LoRA. The LoRA adds Foley sound effects to a silent input video using LTX-2.3:
given a video and a prompt describing the visible action, the loop workflow
generates matching non-speech, non-music sound effects and saves a new MP4.
There are two workflows provided:
- foley-sliding-window.json: long-video workflow with overlapping audio windows and stitching.
- ltx_23_foley_v2a.json: original short-clip workflow.
If you want to run a quick test on a short clip, use ltx_23_foley_v2a.json. Otherwise, use foley-sliding-window.json so you can generate longer audio while keeping memory under control.
What Is Included
- foley-sliding-window.json: long-video workflow with overlapping audio windows and stitching.
- ltx_23_foley_v2a.json: original short-clip workflow.
- setup_runpod_ltx_foley.sh: one-command RunPod setup script.
- ltx_foley_v2a: small helper-node package.
- tennis-no-sound.mp4: default silent test video for RunPod setup.
Both workflows require the ltx_foley_v2a helper-node package. If ComfyUI shows
missing nodes named LTXFoleyForLoopOpen, LTXFoleyWindowSelect,
LTXFoleyVideoToAudioLatent, or LTXFoleyAudioVAEDecode, the workflow JSON was
loaded before these helper nodes were installed into ComfyUI/custom_nodes.
The helper-node package handles the workflow-specific pieces that stock ComfyUI does not currently cover cleanly:
- plans the window count from the uploaded video
- provides a small local ComfyUI for-loop so no external loop-node pack is needed
- splits longer videos into overlapping windows
- freezes each source window as LTX video latents while leaving matching audio latents empty for Foley generation
- decodes each audio window into the Comfy audio tensor layout expected by current video saving nodes
- writes each raw decoded window as a WAV before stitching so artifacts can be checked before the final crossfade
- crossfades and stitches generated audio windows into one final track
Prompt text, model loading, LoRA loading, video creation, and MP4 saving use normal ComfyUI/LTXVideo nodes.
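The crossfade-and-stitch step listed above can be pictured with a small sketch. This is not the helper node's code: the function name crossfade_stitch, the linear fade curve, and the use of plain Python lists of mono samples are all assumptions made for illustration.

```python
def crossfade_stitch(windows, overlap):
    """Join mono sample lists, blending `overlap` samples between neighbors.

    Hypothetical sketch: a linear fade is assumed; the real helper node may
    use a different curve and operates on audio tensors, not lists.
    """
    if not windows:
        return []
    out = list(windows[0])
    for win in windows[1:]:
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(win))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the incoming window
            j = len(out) - n + i   # matching position in the stitched track
            out[j] = out[j] * (1.0 - w) + win[i] * w
        # Everything after the shared region comes from the new window only.
        out.extend(win[n:])
    return out
```

With two 4-sample windows and an overlap of 2, the result has 6 samples: the shared region is counted once, which is why the stitched track keeps the source video's duration.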
Fastest RunPod Test
Use the official RunPod ComfyUI - CUDA 12.8 template:
https://console.runpod.io/deploy?template=cw3nka7d08&ref=k7b1cgii
- In RunPod, under "Additional Filters" filter CUDA versions to CUDA 12.8.
- Select a 48 GB GPU: A40, RTX A6000, L40/L40S, or A100.
- Make sure the ComfyUI - CUDA 12.8 template is selected.
- The template's default volume disk is 50 GB, which is enough for the core workflow files but tight once caches and reruns accumulate. Change the volume disk to 100 GB if you want more breathing room.
- Start the pod and open a terminal.
- Run:
cd /workspace
curl -L https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-Workflow/resolve/main/setup_runpod_ltx_foley.sh -o setup_runpod_ltx_foley.sh
bash setup_runpod_ltx_foley.sh
The setup script installs the nodes and models, downloads the tennis test video as input.mp4, restarts ComfyUI without stopping the pod (with --cache-classic, see the Manual ComfyUI Install notes), and waits until the UI responds on port 8188.
By default the script installs ComfyUI v0.27.0. To test another ComfyUI release, set COMFYUI_CORE_REF:
COMFYUI_CORE_REF=v0.19.0 bash setup_runpod_ltx_foley.sh
To install workflow files from a different Hugging Face branch, set
WORKFLOW_REVISION:
WORKFLOW_REVISION=windows bash setup_runpod_ltx_foley.sh
After the script finishes:
- Open ComfyUI from the RunPod web UI.
- Under workflows, select foley-sliding-window.json.
- Hit Run.
The default input video and prompt are already set:
Two men are playing tennis. No speech is present. No music is present.
What The Script Installs
The script assumes the official CUDA 12.8 template layout from
runpod-workers/comfyui-base:
- ComfyUI: /workspace/runpod-slim/ComfyUI
- Python environment: /workspace/runpod-slim/ComfyUI/.venv-cu128
- ComfyUI port: 8188
It installs or refreshes:
- Lightricks/ComfyUI-LTXVideo
- ltx_foley_v2a helper nodes
- foley-sliding-window.json
- ltx_23_foley_v2a.json
- tennis-no-sound.mp4
The script also applies a small compatibility patch to the installed
ComfyUI-LTXVideo/pyramid_blending.py file so current Kornia builds can import
the node pack on fresh ComfyUI installs.
It downloads these model files:
- Base checkpoint: Lightricks/LTX-2.3-fp8/ltx-2.3-22b-dev-fp8.safetensors
- Text encoder: Comfy-Org/ltx-2/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors
- Foley LoRA: FuzzPuppy/LTX-2.3-Foley-LoRA/ltx-2.3-foley-400-steps.safetensors
Large model downloads are SHA-256 verified. Completed files are skipped on
rerun, interrupted downloads resume from *.part files, and corrupt partials
are retried once from scratch.
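If you download the models by hand and want a similar "skip when already verified" check, a minimal sketch follows. The function names sha256_of and needs_download are hypothetical, and the expected digest is something you supply yourself; this does not reproduce the setup script's own logic or its *.part resume handling.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB checkpoints never sit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_sha256):
    """Return True when the file is missing or its digest does not match.

    `expected_sha256` is an assumption: a hex digest you obtained from the
    model's hosting page, not a value taken from this README.
    """
    p = Path(path)
    return not (p.is_file() and sha256_of(p) == expected_sha256.lower())
```

A caller would only fetch the file when needs_download(...) returns True, which mirrors the "completed files are skipped on rerun" behavior described above.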
Manual ComfyUI Install
If you are not using the RunPod script:
- Install or update ComfyUI.
- Install the official LTXVideo custom nodes: https://github.com/Lightricks/ComfyUI-LTXVideo
- Install the Foley helper nodes by placing the workflow repo's ltx_foley_v2a folder into ComfyUI/custom_nodes/.
- Copy either foley-sliding-window.json or ltx_23_foley_v2a.json into your ComfyUI user workflows folder. In a standard ComfyUI install this is ComfyUI/user/default/workflows.
- Put the model files in:
  - checkpoint: ltx-2.3-22b-dev-fp8.safetensors in ComfyUI/models/checkpoints
  - text encoder: gemma_3_12B_it_fp8_scaled.safetensors in ComfyUI/models/text_encoders
  - Foley LoRA: ltx-2.3-foley-400-steps.safetensors in ComfyUI/models/loras
- Restart ComfyUI, starting it with the --cache-classic flag:
  python main.py --cache-classic
  On newer ComfyUI versions (v0.27.0+) the default caching mode is RAM-pressure caching, which can evict node outputs in the middle of a run while the large LTX models load. For foley-sliding-window.json that forces the window plan, video decode, and model loaders to re-execute between windows, making long runs much slower. --cache-classic keeps those outputs cached for the whole run. The flag also exists on older releases such as v0.19.0, where it is harmless.
- Under workflows, select foley-sliding-window.json or ltx_23_foley_v2a.json.
- Hit Run.
Workflow Defaults
- Input video: input.mp4
- Prompt: Two men are playing tennis. No speech is present. No music is present.
- Negative prompt: anti-music/anti-vocal prompt
- Conditioning size: 576x576
- Frame window: 89 frames
- Window overlap: 1.0 second
- Maximum windows: 16
- Random ID: 42
- Sampling steps: 30
- Guidance: 4.0
- Save window audio: true
- Window audio prefix: ltx_foley_window
- LoRA strength: 1.0
Advanced sampler/STG settings are exposed as visible nodes in the loop body:
- Sampler: euler_ancestral_cfg_pp
- STG scale: 1.0
- Rescale: 0.7
- STG blocks: 14, 19
- Max shift: 2.05
- Base shift: 0.95
- Terminal: 0.1
The foley-sliding-window.json workflow uses the full uploaded video. Videos longer than the selected
window are processed as overlapping windows and stitched into one generated
audio track. Shorter videos are padded internally by repeating the last frame.
The saved MP4 uses the source frames plus the stitched generated audio.
Raw generated window WAVs are saved under ComfyUI's output directory in
ltx_foley_windows/ and their paths are listed in the manifest output.
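To get a feel for how many windows a given video produces, here is a rough planning sketch. It is an assumption about how such a plan could be computed, not the helper node's actual algorithm: the function name plan_windows, the rounding of the overlap to whole frames, and the fixed stride are all hypothetical.

```python
import math


def plan_windows(total_frames, fps, window_frames=89,
                 overlap_seconds=1.0, max_windows=16):
    """Return hypothetical start frames for overlapping windows.

    Assumed behavior: consecutive windows advance by a fixed stride of
    (window_frames - overlap_frames), and the last window may run past the
    end of the video, where the workflow pads by repeating the last frame.
    """
    overlap_frames = int(round(overlap_seconds * fps))
    stride = window_frames - overlap_frames
    if stride <= 0:
        raise ValueError("overlap must be shorter than the window")
    if total_frames <= window_frames:
        return [0]  # short clip: a single (padded) window
    count = math.ceil((total_frames - window_frames) / stride) + 1
    if count > max_windows:
        raise ValueError(f"{count} windows needed, max_windows is {max_windows}")
    return [i * stride for i in range(count)]
```

Under these assumptions, a 240-frame clip at 24 fps with the defaults gives a 65-frame stride and four windows starting at frames 0, 65, 130, and 195; the last one extends past frame 240 and would rely on padding.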
VRAM Notes
Sampling is the VRAM peak. If you need to reduce memory use, try these changes in order:
- reduce frames from 89 to 57, 41, or 25
- reduce conditioning size from 576x576 to 448x448 or 384x384
- reduce sampling steps from 30 to 20
Frame counts should stay one more than a multiple of 8:
9, 17, 25, 33, 41, 49, 57, ..., 89, ..., 257
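The "one more than a multiple of 8" rule is easy to check before editing the workflow. The helper names below are hypothetical, and the 9 to 257 bounds are simply taken from the list above rather than from any documented limit.

```python
def is_valid_frame_count(frames):
    """True when frames has the form 8*k + 1 with k >= 1 (9, 17, 25, ...)."""
    return frames >= 9 and (frames - 1) % 8 == 0


def nearest_valid_frame_count(frames, minimum=9, maximum=257):
    """Round down to the closest 8*k + 1 value, clamped to [minimum, maximum].

    The default bounds are assumptions based on the range listed above.
    """
    k = max(1, (frames - 1) // 8)
    return min(max(8 * k + 1, minimum), maximum)
```

For example, a requested window of 60 frames rounds down to 57, which is one of the reduced sizes suggested in the VRAM list.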
For foley-sliding-window.json, the default max_windows is 16, so accidentally loaded very long inputs
fail clearly instead of running for hours. Increase it only when you expect the
extra runtime.
Troubleshooting
Missing Nodes
If ComfyUI reports missing LTXFoley... nodes after manual setup, verify that
these files exist and then restart ComfyUI:
ComfyUI/custom_nodes/ltx_foley_v2a/__init__.py
ComfyUI/custom_nodes/ltx_foley_v2a/nodes.py
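The same check can be scripted. This is a small sketch with a hypothetical function name, missing_helper_files; it only tests for the two files listed above and assumes you pass the path to your own ComfyUI root.

```python
from pathlib import Path

# The two files this README says must exist for the helper nodes to load.
REQUIRED_FILES = ("__init__.py", "nodes.py")


def missing_helper_files(comfyui_root):
    """Return the helper-node files that are absent under custom_nodes."""
    pkg = Path(comfyui_root) / "custom_nodes" / "ltx_foley_v2a"
    return [str(pkg / name) for name in REQUIRED_FILES
            if not (pkg / name).is_file()]
```

An empty list means both files are in place; restart ComfyUI afterwards so the nodes are registered.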
Models Reload Or Nodes Re-Execute Between Windows
If the log shows planned N windows repeating, or the checkpoint/text-encoder
reloading before every window of foley-sliding-window.json, ComfyUI is running
with its default RAM-pressure caching and is evicting node outputs mid-run.
Start ComfyUI with --cache-classic (the RunPod script already does this). The
generated audio is still correct either way — the re-execution only costs time.
Duplicate Sounds At Window Boundaries
In foley-sliding-window.json, neighboring windows overlap (default 1.0
second) and each window generates its audio independently. If a distinct sound
event (a door close, a footstep) falls inside an overlap region, both windows
may render it slightly out of alignment, and you can hear the event twice
around a window boundary. The run log's planned N windows starts=[...] line
shows where the boundaries are (start_frame / fps seconds).
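To turn the logged start frames into timestamps you can listen for, a short sketch of the start_frame / fps conversion is shown here. The function name overlap_regions is hypothetical, and it assumes every window spans the same number of frames (89 by default).

```python
def overlap_regions(starts, fps, window_frames=89):
    """Return (begin_s, end_s) for the region shared by each window pair.

    Assumes each window covers [start, start + window_frames) and that
    `starts` is the list printed in the "planned N windows" log line.
    """
    regions = []
    for prev, nxt in zip(starts, starts[1:]):
        begin = nxt / fps                    # next window starts here
        end = (prev + window_frames) / fps   # previous window ends here
        regions.append((begin, end))
    return regions
```

A doubled sound should fall inside one of the returned intervals; if it does not, the cause is probably something other than window overlap.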
If you hear this, reduce the Window overlap (overlap_seconds on the
window-plan node), for example from 1.0 to 0.5. A smaller overlap makes it
less likely an event lands in the shared region, at the cost of a shorter
crossfade between windows. Avoid large overlaps: the bigger the overlap, the
more of the video is generated twice, which increases the chance of doubled
sounds.
Audio Artifacts On Some ComfyUI Versions
The workflows have been tested on ComfyUI v0.27.0 and run
successfully there. However, on v0.27.0 and newer ComfyUI versions, we have noticed that LTX-2.3 video-to-audio can produce a high-pitched squeak or other artifacts in some generated audio.
If you notice these artifacts in a generation, roll back to ComfyUI v0.19.0.
If you are using the RunPod setup, you can roll back by running:
cd /workspace
COMFYUI_CORE_REF=v0.19.0 bash setup_runpod_ltx_foley.sh
Then reload foley-sliding-window.json and run it again.
RunPod Setup
Restarting/Rerun
If you rerun setup after a workflow or node update:
cd /workspace && bash setup_runpod_ltx_foley.sh
The script will skip verified model files, refresh the workflow/helper nodes, and restart ComfyUI.
Model Downloads
If model downloads fail with authorization errors, accept the relevant Hugging
Face model terms and rerun with HF_TOKEN set.
Logs
Logs from the script-managed ComfyUI restart are written to:
/workspace/runpod-slim/comfyui-restart.log
License Scope
The files in this workflow repository are released under the Apache-2.0 license. That applies to the workflow JSON, setup script, helper-node code, README/model card text, and bundled test assets in this repository.
This workflow downloads and uses third-party model files that are governed by
their own licenses and terms, including LTX-2.3, the Gemma text encoder, and the
FuzzPuppy/LTX-2.3-Foley-LoRA weights.
