
DiffSynth-Studio


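Introduction

Welcome to the magical world of Diffusion models! DiffSynth-Studio is an open-source Diffusion model engine developed and maintained by the ModelScope community team. We aim to foster technical innovation by building a framework, bring together the strength of the open-source community, and explore the boundaries of generative model technology!

DiffSynth currently includes two open-source projects:

DiffSynth-Studio: focuses on aggressive technical exploration, targets academia, and supports more cutting-edge model capabilities.
DiffSynth-Engine: focuses on stable model deployment, targets industry, and provides higher computational performance and more stable features.

DiffSynth-Studio and DiffSynth-Engine are the core engines of the ModelScope AIGC zone. You are welcome to try the productized features we have carefully built:

ModelScope AIGC zone (for users in China): https://modelscope.cn/aigc/home
ModelScope Civision (for global users): https://modelscope.ai/civision/home

DiffSynth-Studio documentation: Chinese version, English version

We believe that a well-developed open-source framework lowers the barrier to technical exploration, and we have built quite a few interesting techniques on top of this codebase. Perhaps you also have many imaginative ideas; with DiffSynth-Studio, you can implement them quickly. To this end, we have prepared detailed documentation for developers. We hope it helps you understand the principles of Diffusion models, and we look forward to pushing the boundaries of the technology together with you.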

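Update History

DiffSynth-Studio has gone through a major version update, and some older features are no longer maintained. If you need them, please switch to the last release before the major update.

The project currently has a small development team, and most of the work is done by Artiprocher. As a result, new features are developed slowly and responses to issues are limited. We sincerely apologize and appreciate your understanding.

December 9, 2025: We trained a crazy model based on DiffSynth-Studio 2.0: Qwen-Image-i2L (Image to LoRA). This model takes an image as input and outputs a LoRA. Although this version still has plenty of room for improvement in generalization and detail preservation, we are open-sourcing these models to inspire more innovative research.
December 4, 2025: DiffSynth-Studio 2.0 is released, with many new features:
- Documentation is online; we are still continuously improving and updating it.
- The VRAM management module has been upgraded to support layer-level disk offload, freeing both RAM and VRAM.
- New model support
  - Z-Image Turbo: model, documentation, code
  - FLUX.2-dev: model, documentation, code
- Training framework upgrades
  - Split training: automatically splits training into two stages, data processing and training (even when training a ControlNet or any other model). The data processing stage performs computations that do not need backpropagation, such as text encoding and VAE encoding, while the training stage handles the rest. This is faster and needs less VRAM.
  - Differential LoRA training: a training technique we previously used in ArtAug, now available for LoRA training of any model.
  - FP8 training: FP8 can be applied during training to any model that is not being trained, i.e. models whose gradients are disabled or whose gradients only affect LoRA weights.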

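November 4, 2025: Added support for the ByteDance/Video-As-Prompt-Wan2.1-14B model. Trained on Wan 2.1, it generates motion that follows a reference video.
October 30, 2025: Added support for the meituan-longcat/LongCat-Video model, which supports text-to-video, image-to-video, and video continuation. In this project, the model reuses the Wan framework for inference and training.
October 27, 2025: Added support for the krea/krea-realtime-video model, another addition to the Wan model ecosystem.
September 23, 2025: DiffSynth-Studio/Qwen-Image-EliGen-Poster is released! This model was jointly developed and open-sourced by us and the Taotian Experience Design team. Built on Qwen-Image, it is designed specifically for e-commerce posters and supports precise regional layout control. Please refer to our example code.
September 9, 2025: Our training framework now supports multiple training modes and has been adapted for Qwen-Image. In addition to standard SFT training, Direct Distill is now supported; please refer to our example code. This feature is experimental, and we will keep improving it to support more comprehensive model training.
August 28, 2025: We added support for Wan2.2-S2V, an audio-driven cinematic video generation model. See ./examples/wanvideo/.
August 21, 2025: DiffSynth-Studio/Qwen-Image-EliGen-V2 is released! Compared with V1, the training dataset is now Qwen-Image-Self-Generated-Dataset, so the generated images better match Qwen-Image's own image distribution and style. Please refer to our example code.
August 21, 2025: We open-sourced DiffSynth-Studio/Qwen-Image-In-Context-Control-Union, a structural control LoRA model. It follows an In-Context approach and supports multiple types of structural control conditions, including canny, depth, lineart, softedge, normal, and openpose. Please refer to our example code.
August 20, 2025: We open-sourced the DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix model, which improves Qwen-Image-Edit's results when editing low-resolution input images. Please refer to our example code.
August 19, 2025: 🔥 Qwen-Image-Edit is open-sourced, welcoming a new member to the family of image editing models!
August 18, 2025: We trained and open-sourced DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint, an inpainting ControlNet model for Qwen-Image with a lightweight architecture. Please refer to our example code.
August 15, 2025: We open-sourced the Qwen-Image-Self-Generated-Dataset, a dataset of 160,000 images at 1024 x 1024 generated with the Qwen-Image model. It includes general, English text rendering, and Chinese text rendering subsets. Each image is annotated with a caption, entities, and structural control images. Developers can use this dataset to train models such as ControlNet and EliGen for Qwen-Image. We aim to advance the technology through open source!
August 13, 2025: We trained and open-sourced DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth, a ControlNet model for Qwen-Image with a lightweight architecture. Please refer to our example code.
August 12, 2025: We trained and open-sourced DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny, a ControlNet model for Qwen-Image with a lightweight architecture. Please refer to our example code.
August 11, 2025: We open-sourced DiffSynth-Studio/Qwen-Image-Distill-LoRA, a distilled acceleration model for Qwen-Image. It follows the same training process as DiffSynth-Studio/Qwen-Image-Distill-Full, but the model structure is changed to a LoRA, which makes it more compatible with other models in the open-source ecosystem.
August 7, 2025: We open-sourced DiffSynth-Studio/Qwen-Image-EliGen, an entity control LoRA model for Qwen-Image. Qwen-Image-EliGen enables entity-level controllable text-to-image generation. For technical details, see the paper. Training dataset: EliGenTrainSet.
August 5, 2025: We open-sourced DiffSynth-Studio/Qwen-Image-Distill-Full, a distilled acceleration model for Qwen-Image that achieves roughly a 5x speedup.
August 4, 2025: 🔥 Qwen-Image is open-sourced, welcoming a new member to the image generation model family!
August 1, 2025: FLUX.1-Krea-dev is open-sourced, a text-to-image model focused on aesthetic photography. We provided comprehensive support right away, including low-VRAM layer-by-layer offload, LoRA training, and full training. For details, see ./examples/flux/.
July 28, 2025: Wan 2.2 is open-sourced. We provided comprehensive support right away, including low-VRAM layer-by-layer offload, FP8 quantization, sequence parallelism, LoRA training, and full training. For details, see ./examples/wanvideo/.
July 11, 2025: We propose Nexus-Gen, a unified framework that combines the language reasoning capabilities of large language models (LLMs) with the image generation capabilities of diffusion models. The framework supports seamless image understanding, generation, and editing.
- Paper: Nexus-Gen: Unified Image Understanding, Generation, and Editing via Prefilled Autoregression in Shared Embedding Space
- GitHub repository: https://github.com/modelscope/Nexus-Gen
- Models: ModelScope, HuggingFace
- Training dataset: ModelScope Dataset
- Online demo: ModelScope Nexus-Gen Studio
June 15, 2025: EvalScope, ModelScope's official evaluation framework, now supports text-to-image generation evaluation. Please follow the best practice guide to try it.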
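March 31, 2025: We support InfiniteYou, a facial identity preservation method for FLUX. For more details, see ./examples/InfiniteYou/.
March 25, 2025: Our new project DiffSynth-Engine is now open source! It focuses on stable model deployment for industry, offering better engineering support, higher computational performance, and more stable features.
March 13, 2025: We support HunyuanVideo-I2V, the image-to-video version of Tencent's open-source HunyuanVideo. For more details, see ./examples/HunyuanVideo/.
February 25, 2025: We support Wan-Video, a series of state-of-the-art video synthesis models open-sourced by Alibaba. See ./examples/wanvideo/.
February 17, 2025: We support StepVideo, an advanced video synthesis model! See ./examples/stepvideo.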
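December 31, 2024: We propose EliGen, a new framework for text-to-image generation with precise entity-level control, complemented by an inpainting fusion pipeline that extends its capabilities to image inpainting. EliGen integrates seamlessly with existing community models such as IP-Adapter and In-Context LoRA, making it more versatile. For more details, see ./examples/EntityControl.
- Paper: EliGen: Entity-Level Controlled Image Generation with Regional Attention
- Models: ModelScope, HuggingFace
- Online demo: ModelScope EliGen Studio
- Training dataset: EliGen Train Set
December 19, 2024: We implemented advanced VRAM management for HunyuanVideo, making it possible to generate videos at 129x720x1280 with 24 GB of VRAM, or at 129x512x384 with only 6 GB of VRAM. For more details, see ./examples/HunyuanVideo/.
December 18, 2024: We propose ArtAug, a method for improving text-to-image models through synthesis-understanding interaction. We trained an ArtAug enhancement module for FLUX.1-dev in LoRA format. This model brings the aesthetic understanding of Qwen2-VL-72B into FLUX.1-dev, improving the quality of generated images.
- Paper: https://arxiv.org/abs/2412.12888
- Examples: https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/ArtAug
- Models: ModelScope, HuggingFace
- Demo: ModelScope, HuggingFace (coming soon)
October 25, 2024: We provide extensive FLUX ControlNet support. This project supports many different ControlNet models, which can be freely combined even when their structures differ. The ControlNet models are also compatible with high-resolution optimization and regional control, enabling very powerful controllable image generation. See ./examples/ControlNet/.
October 8, 2024: We released an extended LoRA based on CogVideoX-5B and ExVideo. You can download this model from ModelScope or HuggingFace.
August 22, 2024: This project now supports CogVideoX-5B. See here. We provide several interesting features for this text-to-video model, including:
- Text-to-video
- Video editing
- Self-upscaling
- Video interpolation
August 22, 2024: We implemented an interesting painter feature that works with all text-to-image models. Now you can create stunning images with a brush, assisted by AI!
- Use it in our WebUI.
August 21, 2024: DiffSynth-Studio now supports FLUX.
- Enable CFG and high-resolution fix to improve visual quality. See here.
- LoRA, ControlNet, and other add-on models will be supported soon.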
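June 21, 2024: We propose ExVideo, a post-tuning technique aimed at enhancing the capabilities of video generation models. We extended Stable Video Diffusion to generate long videos of up to 128 frames.
- Project page
- The source code is released in this repository. See examples/ExVideo.
- Models are released on HuggingFace and ModelScope.
- The technical report is available on arXiv.
- You can try ExVideo in this demo!
June 13, 2024: DiffSynth Studio has moved to ModelScope. The developers have changed from "I" to "we". Of course, I will continue to take part in future development and maintenance.
January 29, 2024: We propose Diffutoon, a fantastic solution for toon shading.
- Project page
- The source code is released in this project.
- The technical report (IJCAI 2024) is available on arXiv.
December 8, 2023: We decided to start a new project aimed at unleashing the potential of diffusion models, especially in video synthesis. Development of this project officially began.
November 15, 2023: We propose FastBlend, a powerful video deflickering algorithm.
- The sd-webui extension is released on GitHub.
- Demo videos are available on Bilibili, covering three tasks.
- The technical report is available on arXiv.
- An unofficial ComfyUI extension developed by other users is released on GitHub.
October 1, 2023: We released an early version of this project, named FastSDXL, as a first attempt at building a diffusion engine.
- The source code is released on GitHub.
- FastSDXL includes a trainable OLSS scheduler to improve efficiency.
  - The original OLSS repository is here.
  - The technical report (CIKM 2023) is available on arXiv.
  - A demo video is available on Bilibili.
  - Since OLSS requires additional training, we did not implement it in this project.
August 29, 2023: We propose DiffSynth, a video synthesis framework.
- Project page.
- The source code is released in EasyNLP.
- The technical report (ECML PKDD 2024) is available on arXiv.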

Installation

Install from source (recommended):

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

Other installation methods

Install from PyPI (releases lag behind the source; to use the latest features, install from source):

pip install diffsynth
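After either installation method, you can check which version of the package the current Python environment actually sees. The sketch below is a hypothetical helper (not part of DiffSynth-Studio) built on the standard library's `importlib.metadata`; it assumes the distribution is named `diffsynth`, as in the PyPI command above, and that an editable install from source registers the same name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_diffsynth_version() -> Optional[str]:
    """Return the installed version of the `diffsynth` distribution, or None if it is absent."""
    try:
        return version("diffsynth")
    except PackageNotFoundError:
        # Neither `pip install -e .` nor `pip install diffsynth` was run in this environment
        # (or the source checkout registers a different distribution name).
        return None


if __name__ == "__main__":
    found = installed_diffsynth_version()
    print(f"diffsynth {found}" if found else "diffsynth is not installed")
```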

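If you run into problems during installation, they may be caused by upstream dependencies. Please refer to the documentation of these packages:

Basic Framework

DiffSynth-Studio redesigns the inference and training pipelines for mainstream Diffusion models (including FLUX, Wan, and others), enabling efficient VRAM management and flexible model training.

Environment Variable Configuration

Before running model inference or training, you can configure settings such as the model download source through environment variables.

By default, this project downloads models from ModelScope. Users outside China can download models from the ModelScope international site with the following configuration: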
import os
os.environ["MODELSCOPE_DOMAIN"] = "www.modelscope.ai"
To download from other sources, set the environment variable DIFFSYNTH_DOWNLOAD_SOURCE.
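The variable has to be in place before the first model download, so it can help to make the choice explicit at the top of a script. A minimal sketch, using a hypothetical helper name `configure_modelscope_domain`; the only value it uses is the international domain from the snippet above, and it deliberately leaves `DIFFSYNTH_DOWNLOAD_SOURCE` alone because its accepted values are not listed here.

```python
import os

# International ModelScope domain, copied from the README snippet above.
INTERNATIONAL_DOMAIN = "www.modelscope.ai"


def configure_modelscope_domain(outside_china: bool) -> None:
    """Select the ModelScope site before any pipeline downloads model files (hypothetical helper)."""
    if outside_china:
        os.environ["MODELSCOPE_DOMAIN"] = INTERNATIONAL_DOMAIN
    else:
        # Remove any override so the default ModelScope site is used.
        os.environ.pop("MODELSCOPE_DOMAIN", None)
```

Call it before constructing any pipeline, e.g. `configure_modelscope_domain(outside_china=True)`.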

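Image Generation Models

Z-Image: /docs/zh/Model_Details/Z-Image.md

Quick Start

Run the following code to quickly load the Tongyi-MAI/Z-Image-Turbo model and run inference. FP8 quantization causes noticeable image quality degradation, so we do not recommend enabling any quantization for Z-Image Turbo. Enable only CPU offload, which lets the model run with as little as 8 GB of VRAM.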

from diffsynth.pipelines.z_image import ZImagePipeline, ModelConfig
import torch

vram_config = {
    "offload_dtype": torch.bfloat16,
    "offload_device": "cpu",
    "onload_dtype": torch.bfloat16,
    "onload_device": "cpu",
    "preparing_dtype": torch.bfloat16,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}
pipe = ZImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="transformer/*.safetensors", **vram_config),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="text_encoder/*.safetensors", **vram_config),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="vae/diffusion_pytorch_model.safetensors", **vram_config),
    ],
    tokenizer_config=ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="tokenizer/"),
    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
)
prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."
image = pipe(prompt=prompt, seed=42, rand_device="cuda")
image.save("image.jpg")
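The `vram_limit` argument in these quick-start snippets is computed from `torch.cuda.mem_get_info("cuda")[1]`, which is the device's total memory in bytes (index 0 is the free memory), converted to GiB minus a safety margin: 0.5 GiB here and for FLUX.2 and Qwen-Image, 1 GiB for FLUX.1, and 2 GiB for Wan. The hypothetical helper below isolates that arithmetic so it can be checked without a GPU; the clamp at zero is an addition of this sketch, not something the original code does.

```python
def vram_limit_gb(total_bytes: int, headroom_gb: float = 0.5) -> float:
    """Turn a GPU's total memory (in bytes) into a VRAM budget in GiB, minus a safety margin.

    Mirrors the README expression `total_bytes / (1024 ** 3) - headroom_gb`,
    but clamps the result at 0.0 so very small devices never get a negative budget.
    """
    if headroom_gb < 0:
        raise ValueError("headroom_gb must be non-negative")
    return max(0.0, total_bytes / (1024 ** 3) - headroom_gb)


if __name__ == "__main__":
    # A 24 GiB card with the 0.5 GiB margin used in the Z-Image example.
    print(vram_limit_gb(24 * 1024 ** 3))
```

On a CUDA machine, `vram_limit_gb(torch.cuda.mem_get_info("cuda")[1])` gives the same value as the expression in the Z-Image snippet whenever that value is positive.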

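Example Code

Example code for Z-Image is located at: /examples/z_image/

Model ID	Inference	Low-VRAM Inference	Full Training	Validation After Full Training	LoRA Training	Validation After LoRA Training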
Tongyi-MAI/Z-Image-Turbo	code	code	code	code	code	code

FLUX.2: /docs/zh/Model_Details/FLUX2.md

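Quick Start

Run the following code to quickly load the black-forest-labs/FLUX.2-dev model and run inference. VRAM management is enabled: the framework automatically controls how model parameters are loaded based on the remaining VRAM, so the model runs with as little as 10 GB of VRAM.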

from diffsynth.pipelines.flux2_image import Flux2ImagePipeline, ModelConfig
import torch

vram_config = {
    "offload_dtype": "disk",
    "offload_device": "disk",
    "onload_dtype": torch.float8_e4m3fn,
    "onload_device": "cpu",
    "preparing_dtype": torch.float8_e4m3fn,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}
pipe = Flux2ImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.2-dev", origin_file_pattern="text_encoder/*.safetensors", **vram_config),
        ModelConfig(model_id="black-forest-labs/FLUX.2-dev", origin_file_pattern="transformer/*.safetensors", **vram_config),
        ModelConfig(model_id="black-forest-labs/FLUX.2-dev", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="black-forest-labs/FLUX.2-dev", origin_file_pattern="tokenizer/"),
    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
)
prompt = "High resolution. A dreamy underwater portrait of a serene young woman in a flowing blue dress. Her hair floats softly around her face, strands delicately suspended in the water. Clear, shimmering light filters through, casting gentle highlights, while tiny bubbles rise around her. Her expression is calm, her features finely detailed—creating a tranquil, ethereal scene."
image = pipe(prompt, seed=42, rand_device="cuda", num_inference_steps=50)
image.save("image.jpg")
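All `vram_config` dictionaries in this README share the same eight keys and differ in only three choices: the dtype parameters are stored in (bfloat16 for Z-Image and Wan, FP8 `float8_e4m3fn` for FLUX.2, Qwen-Image, and FLUX.1), whether offloaded parameters stay on disk, and the computation dtype (bfloat16 everywhere). The helper below is a hypothetical convenience function, not part of `diffsynth`, that rebuilds those dictionaries from the three choices. The descriptions of the stages in its comments are inferred from the key names rather than taken from the DiffSynth-Studio documentation, and it accepts any dtype objects so it does not import `torch` itself.

```python
from typing import Any, Dict


def make_vram_config(storage_dtype: Any, compute_dtype: Any,
                     offload_to_disk: bool = False, gpu: str = "cuda") -> Dict[str, Any]:
    """Build a vram_config dict with the same keys as the quick-start examples (hypothetical helper).

    storage_dtype: dtype parameters keep while offloaded to CPU, onloaded, and prepared on the GPU
        (e.g. torch.bfloat16 or torch.float8_e4m3fn).
    compute_dtype: dtype used for the actual computation (torch.bfloat16 in every README example).
    offload_to_disk: if True, offloaded parameters stay on disk ("disk"/"disk");
        otherwise they are kept in CPU RAM in storage_dtype.
    """
    return {
        "offload_dtype": "disk" if offload_to_disk else storage_dtype,
        "offload_device": "disk" if offload_to_disk else "cpu",
        "onload_dtype": storage_dtype,
        "onload_device": "cpu",
        "preparing_dtype": storage_dtype,
        "preparing_device": gpu,
        "computation_dtype": compute_dtype,
        "computation_device": gpu,
    }
```

For example, `make_vram_config(torch.float8_e4m3fn, torch.bfloat16, offload_to_disk=True)` reproduces the FLUX.2 dictionary above, and `make_vram_config(torch.bfloat16, torch.bfloat16)` reproduces the Z-Image one.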

Example Code

Example code for FLUX.2 is located at: /examples/flux2/

Model ID	Inference	Low-VRAM Inference	LoRA Training	Validation After LoRA Training
black-forest-labs/FLUX.2-dev	code	code	code	code

Qwen-Image: /docs/zh/Model_Details/Qwen-Image.md

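Quick Start

Run the following code to quickly load the Qwen/Qwen-Image model and run inference. VRAM management is enabled: the framework automatically controls how model parameters are loaded based on the remaining VRAM, so the model runs with as little as 8 GB of VRAM.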

from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
import torch

vram_config = {
    "offload_dtype": "disk",
    "offload_device": "disk",
    "onload_dtype": torch.float8_e4m3fn,
    "onload_device": "cpu",
    "preparing_dtype": torch.float8_e4m3fn,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors", **vram_config),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors", **vram_config),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors", **vram_config),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
)
prompt = "精致肖像，水下少女，蓝裙飘逸，发丝轻扬，光影透澈，气泡环绕，面容恬静，细节精致，梦幻唯美。"
image = pipe(prompt, seed=0, num_inference_steps=40)
image.save("image.jpg")

Model Lineage

graph LR;
    Qwen/Qwen-Image-->Qwen/Qwen-Image-Edit;
    Qwen/Qwen-Image-Edit-->Qwen/Qwen-Image-Edit-2509;
    Qwen/Qwen-Image-->EliGen-Series;
    EliGen-Series-->DiffSynth-Studio/Qwen-Image-EliGen;
    DiffSynth-Studio/Qwen-Image-EliGen-->DiffSynth-Studio/Qwen-Image-EliGen-V2;
    EliGen-Series-->DiffSynth-Studio/Qwen-Image-EliGen-Poster;
    Qwen/Qwen-Image-->Distill-Series;
    Distill-Series-->DiffSynth-Studio/Qwen-Image-Distill-Full;
    Distill-Series-->DiffSynth-Studio/Qwen-Image-Distill-LoRA;
    Qwen/Qwen-Image-->ControlNet-Series;
    ControlNet-Series-->Blockwise-ControlNet-Series;
    Blockwise-ControlNet-Series-->DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny;
    Blockwise-ControlNet-Series-->DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth;
    Blockwise-ControlNet-Series-->DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint;
    ControlNet-Series-->DiffSynth-Studio/Qwen-Image-In-Context-Control-Union;
    Qwen/Qwen-Image-->DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix;
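Each `A-->B;` line in the lineage graph above means that B is derived from A. To answer questions such as "which base models does this checkpoint build on?", the graph can be walked backwards from a node. A minimal sketch, assuming the Mermaid source is available as a string and uses only plain `A-->B` edges as in this README; `parse_lineage` and `ancestors` are hypothetical helper names.

```python
from typing import Dict, List, Set


def parse_lineage(mermaid: str) -> Dict[str, List[str]]:
    """Map each node to its direct parents, reading the `A-->B;` edge lines of a Mermaid graph."""
    parents: Dict[str, List[str]] = {}
    for raw in mermaid.splitlines():
        line = raw.strip().rstrip(";")
        if "-->" not in line:
            continue  # skips the `graph LR` header and blank lines
        src, dst = (part.strip() for part in line.split("-->", 1))
        parents.setdefault(dst, []).append(src)
    return parents


def ancestors(parents: Dict[str, List[str]], node: str) -> Set[str]:
    """Return every node reachable by following parent edges from `node` (iterative DFS)."""
    seen: Set[str] = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    excerpt = """graph LR;
    Qwen/Qwen-Image-->EliGen-Series;
    EliGen-Series-->DiffSynth-Studio/Qwen-Image-EliGen;
    DiffSynth-Studio/Qwen-Image-EliGen-->DiffSynth-Studio/Qwen-Image-EliGen-V2;
"""
    print(sorted(ancestors(parse_lineage(excerpt), "DiffSynth-Studio/Qwen-Image-EliGen-V2")))
```

Because each node keeps a list of parents, models with more than one parent, such as stepfun-ai/Step1X-Edit in the FLUX.1 graph (derived from both FLUX.1-dev and Qwen2.5-VL-7B-Instruct), are handled as well.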

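Example Code

Example code for Qwen-Image is located at: /examples/qwen_image/

Model ID	Inference	Low-VRAM Inference	Full Training	Validation After Full Training	LoRA Training	Validation After LoRA Training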
Qwen/Qwen-Image	code	code	code	code	code	code
Qwen/Qwen-Image-Edit	code	code	code	code	code	code
Qwen/Qwen-Image-Edit-2509	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-EliGen	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-EliGen-V2	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-EliGen-Poster	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-Distill-Full	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-Distill-LoRA	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-In-Context-Control-Union	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix	code	code	-	-	-	-
DiffSynth-Studio/Qwen-Image-i2L	code	code	-	-	-	-

FLUX.1: /docs/zh/Model_Details/FLUX.md

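Quick Start

Run the following code to quickly load the black-forest-labs/FLUX.1-dev model and run inference. VRAM management is enabled: the framework automatically controls how model parameters are loaded based on the remaining VRAM, so the model runs with as little as 8 GB of VRAM.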

import torch
from diffsynth.pipelines.flux_image import FluxImagePipeline, ModelConfig

vram_config = {
    "offload_dtype": torch.float8_e4m3fn,
    "offload_device": "cpu",
    "onload_dtype": torch.float8_e4m3fn,
    "onload_device": "cpu",
    "preparing_dtype": torch.float8_e4m3fn,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}
pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors", **vram_config),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors", **vram_config),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/*.safetensors", **vram_config),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors", **vram_config),
    ],
    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 1,
)
prompt = "CG, masterpiece, best quality, solo, long hair, wavy hair, silver hair, blue eyes, blue dress, medium breasts, dress, underwater, air bubble, floating hair, refraction, portrait. The girl's flowing silver hair shimmers with every color of the rainbow and cascades down, merging with the floating flora around her."
image = pipe(prompt=prompt, seed=0)
image.save("image.jpg")

Model Lineage

graph LR;
    FLUX.1-Series-->black-forest-labs/FLUX.1-dev;
    FLUX.1-Series-->black-forest-labs/FLUX.1-Krea-dev;
    FLUX.1-Series-->black-forest-labs/FLUX.1-Kontext-dev;
    black-forest-labs/FLUX.1-dev-->FLUX.1-dev-ControlNet-Series;
    FLUX.1-dev-ControlNet-Series-->alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta;
    FLUX.1-dev-ControlNet-Series-->InstantX/FLUX.1-dev-Controlnet-Union-alpha;
    FLUX.1-dev-ControlNet-Series-->jasperai/Flux.1-dev-Controlnet-Upscaler;
    black-forest-labs/FLUX.1-dev-->InstantX/FLUX.1-dev-IP-Adapter;
    black-forest-labs/FLUX.1-dev-->ByteDance/InfiniteYou;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/Eligen;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/LoRAFusion-preview-FLUX.1-dev;
    black-forest-labs/FLUX.1-dev-->ostris/Flex.2-preview;
    black-forest-labs/FLUX.1-dev-->stepfun-ai/Step1X-Edit;
    Qwen/Qwen2.5-VL-7B-Instruct-->stepfun-ai/Step1X-Edit;
    black-forest-labs/FLUX.1-dev-->DiffSynth-Studio/Nexus-GenV2;
    Qwen/Qwen2.5-VL-7B-Instruct-->DiffSynth-Studio/Nexus-GenV2;

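Example Code

Example code for FLUX.1 is located at: /examples/flux/

Model ID	Extra Parameters	Inference	Low-VRAM Inference	Full Training	Validation After Full Training	LoRA Training	Validation After LoRA Training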
black-forest-labs/FLUX.1-dev		code	code	code	code	code	code
black-forest-labs/FLUX.1-Krea-dev		code	code	code	code	code	code
black-forest-labs/FLUX.1-Kontext-dev	`kontext_images`	code	code	code	code	code	code
alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta	`controlnet_inputs`	code	code	code	code	code	code
InstantX/FLUX.1-dev-Controlnet-Union-alpha	`controlnet_inputs`	code	code	code	code	code	code
jasperai/Flux.1-dev-Controlnet-Upscaler	`controlnet_inputs`	code	code	code	code	code	code
InstantX/FLUX.1-dev-IP-Adapter	`ipadapter_images`, `ipadapter_scale`	code	code	code	code	code	code
ByteDance/InfiniteYou	`infinityou_id_image`, `infinityou_guidance`, `controlnet_inputs`	code	code	code	code	code	code
DiffSynth-Studio/Eligen	`eligen_entity_prompts`, `eligen_entity_masks`, `eligen_enable_on_negative`, `eligen_enable_inpaint`	code	code	-	-	code	code
DiffSynth-Studio/LoRA-Encoder-FLUX.1-Dev	`lora_encoder_inputs`, `lora_encoder_scale`	code	code	code	code	-	-
DiffSynth-Studio/LoRAFusion-preview-FLUX.1-dev		code	-	-	-	-	-
stepfun-ai/Step1X-Edit	`step1x_reference_image`	code	code	code	code	code	code
ostris/Flex.2-preview	`flex_inpaint_image`, `flex_inpaint_mask`, `flex_control_image`, `flex_control_strength`, `flex_control_stop`	code	code	code	code	code	code
DiffSynth-Studio/Nexus-GenV2	`nexus_gen_reference_image`	code	code	code	code	code	code
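The extra-parameter column shows which additional pipeline arguments belong to which model, and a FLUX.1 pipeline is often built from FLUX.1-dev plus one or more add-on models. One way to catch copy-paste mistakes, such as passing `kontext_images` while only FLUX.1-dev is loaded, is a small pre-flight check before calling `pipe(...)`. This is a hypothetical sketch, not part of `diffsynth`: it copies only a few rows of the table and does not know which listed parameters are required or optional, so it only flags arguments that belong to models that are not loaded.

```python
from typing import Dict, Iterable, List, Mapping

# A few rows copied from the "Extra Parameters" column of the FLUX.1 table above (not exhaustive).
FLUX1_EXTRA_PARAMS: Dict[str, List[str]] = {
    "black-forest-labs/FLUX.1-dev": [],
    "black-forest-labs/FLUX.1-Kontext-dev": ["kontext_images"],
    "InstantX/FLUX.1-dev-Controlnet-Union-alpha": ["controlnet_inputs"],
    "InstantX/FLUX.1-dev-IP-Adapter": ["ipadapter_images", "ipadapter_scale"],
    "ByteDance/InfiniteYou": ["infinityou_id_image", "infinityou_guidance", "controlnet_inputs"],
    "stepfun-ai/Step1X-Edit": ["step1x_reference_image"],
}


def misplaced_extras(loaded_models: Iterable[str], call_kwargs: Mapping[str, object]) -> List[str]:
    """Return kwargs listed as extra parameters of some model in the table,
    but of none of the models in `loaded_models` (hypothetical pre-flight check)."""
    known = {p for params in FLUX1_EXTRA_PARAMS.values() for p in params}
    allowed = {p for model in loaded_models for p in FLUX1_EXTRA_PARAMS.get(model, [])}
    # Ordinary arguments such as `prompt` or `seed` are not in `known`, so they are never flagged.
    return sorted(k for k in call_kwargs if k in known and k not in allowed)
```

An empty result means every table-listed argument in the call matches a loaded model.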

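Video Generation Models

https://github.com/user-attachments/assets/1d66ae74-3b02-40a9-acc3-ea95fc039314

Wan: /docs/zh/Model_Details/Wan.md

Quick Start

Run the following code to quickly load the Wan-AI/Wan2.1-T2V-1.3B model and run inference. VRAM management is enabled: the framework automatically controls how model parameters are loaded based on the remaining VRAM, so the model runs with as little as 8 GB of VRAM.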

import torch
from diffsynth.utils.data import save_video, VideoData
from diffsynth.pipelines.wan_video import WanVideoPipeline, ModelConfig

vram_config = {
    "offload_dtype": "disk",
    "offload_device": "disk",
    "onload_dtype": torch.bfloat16,
    "onload_device": "cpu",
    "preparing_dtype": torch.bfloat16,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}
pipe = WanVideoPipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="diffusion_pytorch_model*.safetensors", **vram_config),
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="models_t5_umt5-xxl-enc-bf16.pth", **vram_config),
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="Wan2.1_VAE.pth", **vram_config),
    ],
    tokenizer_config=ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="google/umt5-xxl/"),
    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 2,
)

video = pipe(
    prompt="纪实摄影风格画面，一只活泼的小狗在绿茵茵的草地上迅速奔跑。小狗毛色棕黄，两只耳朵立起，神情专注而欢快。阳光洒在它身上，使得毛发看上去格外柔软而闪亮。背景是一片开阔的草地，偶尔点缀着几朵野花，远处隐约可见蓝天和几片白云。透视感鲜明，捕捉小狗奔跑时的动感和四周草地的生机。中景侧面移动视角。",
    negative_prompt="色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走",
    seed=0, tiled=True,
)
save_video(video, "video.mp4", fps=15, quality=5)

Model Lineage

graph LR;
    Wan-Series-->Wan2.1-Series;
    Wan-Series-->Wan2.2-Series;
    Wan2.1-Series-->Wan-AI/Wan2.1-T2V-1.3B;
    Wan2.1-Series-->Wan-AI/Wan2.1-T2V-14B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.1-I2V-14B-480P;
    Wan-AI/Wan2.1-I2V-14B-480P-->Wan-AI/Wan2.1-I2V-14B-720P;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.1-FLF2V-14B-720P;
    Wan-AI/Wan2.1-T2V-1.3B-->iic/VACE-Wan2.1-1.3B-Preview;
    iic/VACE-Wan2.1-1.3B-Preview-->Wan-AI/Wan2.1-VACE-1.3B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.1-VACE-14B;
    Wan-AI/Wan2.1-T2V-1.3B-->Wan2.1-Fun-1.3B-Series;
    Wan2.1-Fun-1.3B-Series-->PAI/Wan2.1-Fun-1.3B-InP;
    Wan2.1-Fun-1.3B-Series-->PAI/Wan2.1-Fun-1.3B-Control;
    Wan-AI/Wan2.1-T2V-14B-->Wan2.1-Fun-14B-Series;
    Wan2.1-Fun-14B-Series-->PAI/Wan2.1-Fun-14B-InP;
    Wan2.1-Fun-14B-Series-->PAI/Wan2.1-Fun-14B-Control;
    Wan-AI/Wan2.1-T2V-1.3B-->Wan2.1-Fun-V1.1-1.3B-Series;
    Wan2.1-Fun-V1.1-1.3B-Series-->PAI/Wan2.1-Fun-V1.1-1.3B-Control;
    Wan2.1-Fun-V1.1-1.3B-Series-->PAI/Wan2.1-Fun-V1.1-1.3B-InP;
    Wan2.1-Fun-V1.1-1.3B-Series-->PAI/Wan2.1-Fun-V1.1-1.3B-Control-Camera;
    Wan-AI/Wan2.1-T2V-14B-->Wan2.1-Fun-V1.1-14B-Series;
    Wan2.1-Fun-V1.1-14B-Series-->PAI/Wan2.1-Fun-V1.1-14B-Control;
    Wan2.1-Fun-V1.1-14B-Series-->PAI/Wan2.1-Fun-V1.1-14B-InP;
    Wan2.1-Fun-V1.1-14B-Series-->PAI/Wan2.1-Fun-V1.1-14B-Control-Camera;
    Wan-AI/Wan2.1-T2V-1.3B-->DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1;
    Wan-AI/Wan2.1-T2V-14B-->krea/krea-realtime-video;
    Wan-AI/Wan2.1-T2V-14B-->meituan-longcat/LongCat-Video;
    Wan-AI/Wan2.1-I2V-14B-720P-->ByteDance/Video-As-Prompt-Wan2.1-14B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.2-Animate-14B;
    Wan-AI/Wan2.1-T2V-14B-->Wan-AI/Wan2.2-S2V-14B;
    Wan2.2-Series-->Wan-AI/Wan2.2-T2V-A14B;
    Wan2.2-Series-->Wan-AI/Wan2.2-I2V-A14B;
    Wan2.2-Series-->Wan-AI/Wan2.2-TI2V-5B;
    Wan-AI/Wan2.2-T2V-A14B-->Wan2.2-Fun-Series;
    Wan2.2-Fun-Series-->PAI/Wan2.2-VACE-Fun-A14B;
    Wan2.2-Fun-Series-->PAI/Wan2.2-Fun-A14B-InP;
    Wan2.2-Fun-Series-->PAI/Wan2.2-Fun-A14B-Control;
    Wan2.2-Fun-Series-->PAI/Wan2.2-Fun-A14B-Control-Camera;

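Example Code

Example code for Wan is located at: /examples/wanvideo/

Model ID	Extra Parameters	Inference	Full Training	Validation After Full Training	LoRA Training	Validation After LoRA Training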
Wan-AI/Wan2.1-T2V-1.3B		code	code	code	code	code
Wan-AI/Wan2.1-T2V-14B		code	code	code	code	code
Wan-AI/Wan2.1-I2V-14B-480P	`input_image`	code	code	code	code	code
Wan-AI/Wan2.1-I2V-14B-720P	`input_image`	code	code	code	code	code
Wan-AI/Wan2.1-FLF2V-14B-720P	`input_image`, `end_image`	code	code	code	code	code
iic/VACE-Wan2.1-1.3B-Preview	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
Wan-AI/Wan2.1-VACE-1.3B	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
Wan-AI/Wan2.1-VACE-14B	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
PAI/Wan2.1-Fun-1.3B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-1.3B-Control	`control_video`	code	code	code	code	code
PAI/Wan2.1-Fun-14B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-14B-Control	`control_video`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-1.3B-Control	`control_video`, `reference_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-14B-Control	`control_video`, `reference_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-1.3B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-14B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-1.3B-Control-Camera	`control_camera_video`, `input_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-14B-Control-Camera	`control_camera_video`, `input_image`	code	code	code	code	code
DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1	`motion_bucket_id`	code	code	code	code	code
krea/krea-realtime-video		code	code	code	code	code
meituan-longcat/LongCat-Video	`longcat_video`	code	code	code	code	code
ByteDance/Video-As-Prompt-Wan2.1-14B	`vap_video`, `vap_prompt`	code	code	code	code	code
Wan-AI/Wan2.2-T2V-A14B		code	code	code	code	code
Wan-AI/Wan2.2-I2V-A14B	`input_image`	code	code	code	code	code
Wan-AI/Wan2.2-TI2V-5B	`input_image`	code	code	code	code	code
Wan-AI/Wan2.2-Animate-14B	`input_image`, `animate_pose_video`, `animate_face_video`, `animate_inpaint_video`, `animate_mask_video`	code	code	code	code	code
Wan-AI/Wan2.2-S2V-14B	`input_image`, `input_audio`, `audio_sample_rate`, `s2v_pose_video`	code	code	code	code	code
PAI/Wan2.2-VACE-Fun-A14B	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
PAI/Wan2.2-Fun-A14B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.2-Fun-A14B-Control	`control_video`, `reference_image`	code	code	code	code	code
PAI/Wan2.2-Fun-A14B-Control-Camera	`control_camera_video`, `input_image`	code	code	code	code	code