How to convert a single safetensors file to PEFT format

I’m struggling to get adapter_model.safetensors and adapter_config.json out of this LoRA weight file


No conversion needed. Use it directly with Diffusers like this:

https://huggingface.co/lightx2v/Qwen-Image-Lightning :

import torch
from diffusers import DiffusionPipeline

# `scheduler` is the FlowMatchEulerDiscreteScheduler configured as in the model card
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", scheduler=scheduler, torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning", weight_name="Qwen-Image-Lightning-8steps-V1.0.safetensors"
)

Thanks for the reply, but vLLM-Omni requires the LoRA in PEFT format:


Oh…


Key point: that .safetensors is a Diffusers/ComfyUI LoRA, not a Transformers “PEFT adapter folder”

The file you linked (Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors) is meant to be loaded directly via Diffusers (pipe.load_lora_weights(...)) on top of the base model Qwen/Qwen-Image, or used in ComfyUI. The repo’s model card shows exactly that usage pattern. (Hugging Face)

By contrast, a Transformers/PEFT adapter typically lives in a directory containing adapter_config.json + adapter_model.safetensors. (Hugging Face)
Those files are not “extractable” from an arbitrary LoRA .safetensors unless you (re)construct the adapter configuration (target modules, rank, alpha, etc.) in a real model and then re-save it.
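To make that concrete, an adapter_config.json is just a small JSON file describing the LoRA configuration. The values below (rank, alpha, target modules) are purely illustrative placeholders; the real values must be reconstructed from the model the LoRA was actually trained on:

```python
import json

# Hypothetical adapter_config.json contents. The real r, lora_alpha, and
# target_modules must match how the LoRA was trained -- they cannot be
# guessed from an arbitrary .safetensors file alone.
adapter_config = {
    "peft_type": "LORA",
    "r": 16,
    "lora_alpha": 16,
    "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],
    "lora_dropout": 0.0,
    "bias": "none",
}

with open("adapter_config.json", "w") as f:
    json.dump(adapter_config, f, indent=2)

print("wrote adapter_config.json with", len(adapter_config["target_modules"]), "target modules")
```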


What vLLM-Omni expects

vLLM-Omni’s diffusion LoRA endpoint requires a PEFT adapter folder like: lora_adapter/adapter_config.json + lora_adapter/adapter_model.safetensors. (vLLM)

The file you linked (Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors) is a single-file Diffusers LoRA weight (meant to be loaded with pipe.load_lora_weights(...)), not a PEFT adapter folder. (Hugging Face)

So you need to load it into the base model once, then re-save it via Diffusers’ PEFT adapter API (save_lora_adapter), which generates the adapter_config.json and a safetensors weight file. (Hugging Face)


Conversion script (Diffusers → PEFT adapter folder)

Notes:

  • The Qwen-Image-Lightning model card explicitly recommends installing Diffusers from main. (Hugging Face)
  • This produces the exact folder structure vLLM-Omni documents. (vLLM)

import math
import torch
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler

# 1) Create the base pipeline (same pattern as the model card)
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}
scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    scheduler=scheduler,
    torch_dtype=torch.bfloat16,
).to("cuda")

# 2) Load the single safetensors LoRA file into the pipeline
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning",
    weight_name="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
    adapter_name="lightning_v2",  # give it a name so we can save it explicitly
)

# 3) Re-save as a PEFT adapter folder (adapter_config.json + adapter_model.safetensors)
#    save_lora_adapter() is a PEFT adapter API on the *underlying model component*.
#    For Qwen/Qwen-Image, LoRA is typically on the diffusion "transformer" component.
pipe.transformer.save_lora_adapter(
    "lora_adapter",
    adapter_name="lightning_v2",
    safe_serialization=True,
    weight_name="adapter_model.safetensors",
)

print("Wrote PEFT adapter to ./lora_adapter")

save_lora_adapter(...) is documented to serialize the adapter (and supports weight_name + safetensors). (Hugging Face)


Use the output with vLLM-Omni

Point vLLM-Omni at the created folder:

  • --lora-path /path/to/lora_adapter (must be readable by the server) (vLLM)

  • Folder must contain:

    • adapter_config.json
    • adapter_model.safetensors (vLLM)

Troubleshooting

1) AttributeError: '...Pipeline' object has no attribute 'transformer'

Some pipelines use unet instead of transformer. In that case, save from pipe.unet:

pipe.unet.save_lora_adapter(
    "lora_adapter",
    adapter_name="lightning_v2",
    safe_serialization=True,
    weight_name="adapter_model.safetensors",
)

2) The LoRA loads in Diffusers but fails in PEFT save

Prefer the PEFT “model-level” path: load the adapter onto the component, then save it. Diffusers documents load_lora_adapter(...) + save_lora_adapter(...) as the direct model-level workflow. (Hugging Face)

3) You’re tempted to hand-write adapter_config.json

Don’t, unless you know the exact target modules / ranks / alphas expected by the model. vLLM-Omni (and Transformers PEFT loaders) assume a valid adapter_config.json alongside the weights. (vLLM)

Edit:
doesn’t work practically…

Hi, I ran your script, but I only get adapter_model.safetensors and no adapter_config.json. I generated the config with the following code:

pipe.transformer.peft_config["lightning_v2"].save_pretrained("./lora_adapter")

Then I passed the folder (./lora_adapter) to vLLM-Omni and it raised an error saying the “state_dict” keys do not match…


Sorry… The inference implementations for diffusion models seem to differ quite a bit between Diffusers, ComfyUI, and vLLM-Omni. :scream:

In this case, forcing the state_dict key names to match might make it work, but it’s unclear whether it would function correctly. (It depends on the code of that version of vLLM-Omni.)

Merging it first would definitely work, I think… but then it wouldn’t be a conversion.


To use Qwen-Image-Lightning LoRA on vLLM-Omni

Option A (recommended): merge the LoRA into the base model, then serve it as a normal model

This avoids the entire “PEFT adapter keys don’t match” problem.

Why this works: vLLM-Omni’s diffusion LoRA path is strict about module name alignment (see Option B). If you “bake” the LoRA deltas into the base weights, vLLM-Omni just loads a single checkpoint and there is no adapter to validate.
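The “baking” step itself is just arithmetic on the weights: for each target module, W_merged = W + (alpha / r) * (B @ A). A minimal numeric illustration with toy 2x2 matrices (real models apply this per LoRA-targeted layer):

```python
# Minimal numeric illustration of LoRA merging: W' = W + (alpha / r) * (B @ A).
# Toy 2x2 case with rank r = 1; values are made up for demonstration.
r, alpha = 1, 2.0

W = [[1.0, 0.0],
     [0.0, 1.0]]            # base weight (2x2)
A = [[0.5, 0.5]]            # LoRA down-projection (r x 2)
B = [[1.0],
     [0.0]]                 # LoRA up-projection (2 x r)

scale = alpha / r
W_merged = [
    [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(2)]
    for i in range(2)
]
print(W_merged)  # after merging there is no adapter left to validate at runtime
```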

Steps

  1. Load the base Qwen-Image model (same base that the Lightning LoRA was trained for).

  2. Load the Lightning LoRA safetensors into that pipeline (Diffusers or the Qwen-Image reference loader).

  3. Merge/fuse LoRA into the base weights (so the model weights become the adapted weights).

  4. Save the merged model directory.

  5. Serve the merged directory with vLLM-Omni:

    • vLLM-Omni serves a single diffusion model per server instance. (vLLM)
  6. Use 8 inference steps when requesting images (because this LoRA is “8steps”). vLLM-Omni exposes num_inference_steps in the request body. (vLLM)

Why I’d pick this first: vLLM-Omni diffusion LoRA support is PEFT-compatible, but it’s new and keyed to vLLM’s internal module naming/packing behavior. (GitHub)


Why your current “PEFT folder” fails in vLLM-Omni

You already discovered:

  • You can produce adapter_model.safetensors

  • You can produce adapter_config.json via:

    pipe.transformer.peft_config["lightning_v2"].save_pretrained("./lora_adapter")
    

…but vLLM-Omni rejects it with “state_dict keys not match”.

That error is expected if the adapter’s target module names (and therefore the saved weight keys) don’t align with what vLLM-Omni believes are “supported/expected LoRA modules” for that diffusion pipeline.

What vLLM-Omni is doing internally

vLLM-Omni’s DiffusionLoRAManager:

  • Computes supported module suffixes from the pipeline using get_supported_lora_modules()
  • Builds/uses a packed_modules_mapping so it can handle fused projections (e.g., packed QKV) and accept LoRAs trained on logical sub-projections
  • Expands an _expected_lora_modules set
  • Loads the adapter via LoRAModel.from_local_checkpoint(... expected_lora_modules=...)
  • Critically: it passes weights_mapper=None (so there is no automatic renaming of keys) (vLLM)

So if Diffusers/ComfyUI used names like to_q, to_k, to_v, to_out, etc., but vLLM-Omni’s Qwen-Image transformer uses different names (and often packed/fused linears), your adapter keys won’t validate.

This is also why “same repository / same model” can still differ: vLLM-Omni re-implements diffusion transformer components with vLLM-style layers and packed projections for performance/parallelism, so module naming/structure can differ from Diffusers.
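At its core, the validation failure is a set comparison between the adapter’s module-name suffixes and the modules the loader accepts. A toy illustration (all module names here are made up, not vLLM-Omni’s actual ones):

```python
# Toy illustration of the mismatch: the adapter was saved with Diffusers-style
# attention names, but the server expects different (possibly packed) names.
adapter_modules = {"to_q", "to_k", "to_v", "to_out.0"}   # from the LoRA file
expected_modules = {"qkv_proj", "out_proj"}              # hypothetical loader-side set

unsupported = adapter_modules - expected_modules
print("unsupported LoRA modules:", sorted(unsupported))
# Every adapter module falls outside the expected set, so loading is rejected.
```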


Option B: make a real vLLM-Omni-compatible PEFT LoRA (harder, but possible)

vLLM-Omni expects a PEFT folder like: (vLLM)

lora_adapter/
├── adapter_config.json
└── adapter_model.safetensors

But the content must match vLLM-Omni’s expected module names.

B1) First, extract what vLLM-Omni expects (target module suffixes)

Your goal: get the set that DiffusionLoRAManager calls _expected_lora_modules. (vLLM)

Practical ways:

  • Enable debug logging and trigger adapter load; it logs the supported/expected modules. (vLLM)

  • Or write a small script that instantiates the same pipeline/module objects and prints:

    • get_supported_lora_modules(pipeline)
    • any packed_modules_mapping found on modules
    • expanded expected modules (same function the manager uses)

B2) Inspect your Lightning safetensors keys (what you currently have)

Run something like:

import re
from safetensors.torch import load_file

sd = load_file("Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors")
keys = list(sd.keys())
print("num_keys:", len(keys))
print("sample:", keys[:50])

# quick “module suffix” feel
mods = set()
for k in keys:
    # tweak this depending on the actual key style you see
    m = re.search(r"\.(to_[qkv]|to_out|q_proj|k_proj|v_proj|proj|fc1|fc2)\b", k)
    if m:
        mods.add(m.group(1))
print("matched module-ish tokens:", sorted(mods))

This tells you whether the file is closer to:

  • Diffusers attention naming (to_q, to_k, to_v, to_out)
  • HF transformer naming (q_proj, k_proj, v_proj, o_proj)
  • Something ComfyUI-specific

B3) Build a mapping: Diffusers/ComfyUI module names → vLLM-Omni module names

Typical mismatch patterns (examples):

  • to_q, to_k, to_v vs packed qkv projections
  • to_out.0 vs proj / o_proj
  • MLP: fc1/fc2 vs gate_up_proj/down_proj-style

vLLM-Omni is explicitly designed to handle packed projections by:

  • discovering packed_modules_mapping on the model
  • treating QKVParallelLinear as 3-slice packed (["q","k","v"]) (vLLM)

So if (and only if) the Qwen-Image vLLM-Omni implementation exposes a compatible mapping, you may be able to rename your adapter keys to match the “slice names” it will accept.
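A sketch of how such a packed-modules mapping expands fused linears into the logical slice names a LoRA may target. The mapping below is illustrative, not vLLM-Omni’s actual one:

```python
# Illustrative packed-modules mapping: one fused linear (qkv_proj) serves three
# logical sub-projections. An adapter trained against the logical names can be
# accepted if the loader knows how to slice the packed module.
packed_modules_mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}

def expand_expected(modules, mapping):
    """Expand packed module names into the logical slice names a LoRA may target."""
    expected = set()
    for name in modules:
        expected.update(mapping.get(name, [name]))
    return expected

print(sorted(expand_expected(["qkv_proj", "out_proj"], packed_modules_mapping)))
```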

B4) Rewrite the adapter weights and config

You may need to:

  • Rewrite state_dict key paths (the important part)
  • Ensure adapter_config.json includes target_modules that match what vLLM-Omni expects and what your rewritten keys implement (it logs target_modules when loading). (vLLM)

A template for renaming keys:

from safetensors.torch import load_file, save_file

src = load_file("adapter_model.safetensors")

RENAMES = [
    (".to_q.", ".q."),      # example only
    (".to_k.", ".k."),
    (".to_v.", ".v."),
    (".to_out.0.", ".proj."),
]

dst = {}
for k, v in src.items():
    nk = k
    for a, b in RENAMES:
        nk = nk.replace(a, b)
    dst[nk] = v

save_file(dst, "adapter_model_vllm.safetensors")
print("done. keys:", len(dst))

Then point adapter_config.json to target_modules matching the suffixes vLLM-Omni expects.
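One way to keep the config consistent with the rewritten weights is to derive target_modules directly from the renamed keys. The key layout below is illustrative (PEFT-style "...<module>.lora_A.weight" names):

```python
# Derive target_modules for adapter_config.json from the renamed weight keys,
# so the config stays consistent with the rewritten state_dict. Key names are
# hypothetical examples, not taken from a real checkpoint.
renamed_keys = [
    "transformer_blocks.0.attn.q.lora_A.weight",
    "transformer_blocks.0.attn.q.lora_B.weight",
    "transformer_blocks.0.attn.proj.lora_A.weight",
]

# Strip the ".lora_A/.lora_B.weight" suffix, then keep the last path component.
target_modules = sorted({k.split(".lora_")[0].rsplit(".", 1)[-1] for k in renamed_keys})
print(target_modules)
```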

B5) Reality check: you may need to patch vLLM-Omni

Because diffusion LoRA loading currently uses weights_mapper=None, there is no built-in key translation hook. (vLLM)
If the required mapping is non-trivial (common), the clean solution is:

  • add a weights_mapper for diffusion adapters (or a model-specific mapper for Qwen-Image)
  • or ensure the model exposes packed_modules_mapping that matches popular training tool outputs

How other users effectively use “ComfyUI LoRA” with vLLM-Omni (practically)

Most people who succeed quickly do one of:

  1. Merge LoRA into base weights and serve the merged model (Option A)
  2. Use LoRAs that were trained/exported in PEFT format against a module naming scheme that vLLM/vLLM-Omni accepts (often not ComfyUI-native single-file LoRAs)

Given your current error and vLLM-Omni’s strict loader, Option A is the most reliable path.


Reading list (relevant, practical)

  • vLLM-Omni diffusion LoRA online serving example and required folder format (vLLM)
  • vLLM-Omni DiffusionLoRAManager internals (why key mismatches happen; packed modules mapping; no weights_mapper) (vLLM)
  • vLLM-Omni release notes highlighting “Diffusion LoRA Adapter Support (PEFT-compatible)” (feature maturity context) (GitHub)
  • vLLM LoRA adapters documentation (general vLLM LoRA expectations and serving patterns) (vLLM)

To merge/fuse Lightning into the base model — step-by-step

0) What you will produce

A new local model directory that contains the base Qwen-Image weights with Lightning already applied, so vLLM-Omni loads it as a normal diffusion model (no LoRA at runtime). vLLM-Omni serves diffusion models via /v1/images/generations. (docs.vllm.ai)


1) Prepare environment (Diffusers “main”)

The Lightning model card explicitly says to install Diffusers from main. (Hugging Face)

pip install -U "torch" "transformers" "accelerate" "safetensors"
pip install -U "git+https://github.com/huggingface/diffusers.git"

2) Fuse the V2.0 bf16 LoRA into Qwen/Qwen-Image

Create a script fuse_qwen_image_lightning_v2.py:

import math
import torch
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler

# Scheduler config used by Qwen-Image-Lightning authors (shift=3 distillation)
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}

def main():
    device = "cuda"
    dtype = torch.bfloat16

    scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)

    # 1) Load base model
    pipe = DiffusionPipeline.from_pretrained(
        "Qwen/Qwen-Image",
        scheduler=scheduler,
        torch_dtype=dtype,
    ).to(device)

    # 2) Load Lightning LoRA (your file)
    pipe.load_lora_weights(
        "lightx2v/Qwen-Image-Lightning",
        weight_name="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
    )

    # 3) Fuse LoRA into base weights, then unload adapter tensors
    #    (Diffusers recommends unload after fuse, then save_pretrained)
    pipe.fuse_lora(lora_scale=1.0)
    pipe.unload_lora_weights()

    # 4) Save the fused pipeline locally
    out_dir = "./Qwen-Image-Lightning-8steps-V2.0-fused"
    pipe.save_pretrained(out_dir, safe_serialization=True)

    print(f"Saved fused model to: {out_dir}")

if __name__ == "__main__":
    main()

Why these exact pieces:

  • The scheduler config and the “8 steps / true_cfg_scale=1.0” recipe are from the Lightning model card (it uses a FlowMatchEulerDiscreteScheduler config with base_shift/max_shift set to math.log(3), and calls the pipeline with 8 steps). (Hugging Face)
  • The fuse workflow is Diffusers’ documented pattern: fuse_lora() → unload_lora_weights() → save_pretrained(). (Hugging Face)

Run it:

python fuse_qwen_image_lightning_v2.py

3) Sanity-check the fused directory (optional but recommended)

After fusion, the model should work without load_lora_weights():

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "./Qwen-Image-Lightning-8steps-V2.0-fused",
    torch_dtype=torch.bfloat16,
).to("cuda")

img = pipe(
    prompt="a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K",
    negative_prompt=" ",
    width=1024,
    height=1024,
    num_inference_steps=8,
    true_cfg_scale=1.0,
    generator=torch.manual_seed(0),
).images[0]

img.save("fused_test.png")

The “8 steps” + true_cfg_scale=1.0 matches the Lightning authors’ recommended inference settings. (Hugging Face)


4) Serve the fused model with vLLM-Omni

vLLM-Omni serves diffusion models with:

vllm serve /ABS/PATH/Qwen-Image-Lightning-8steps-V2.0-fused --omni --port 8000

  • vLLM-Omni uses /v1/images/generations for diffusion models. (docs.vllm.ai)
  • vLLM supports serving a local model path. (vLLM Forums)

If you get OOM during serving, the Qwen text-to-image example notes you can enable VAE slicing/tiling flags to reduce memory. (docs.vllm.ai)


5) Call the API using Lightning-like parameters

vLLM-Omni’s Image Generation API supports num_inference_steps, negative_prompt, and true_cfg_scale. (docs.vllm.ai)

curl -X POST http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K",
    "negative_prompt": " ",
    "size": "1024x1024",
    "num_inference_steps": 8,
    "true_cfg_scale": 1.0,
    "seed": 0
  }' | jq -r ".data[0].b64_json" | base64 -d > out.png

Thank you again for your answer. I tried your method but it still didn’t work. vLLM-Omni raises the error “transformer_blocks.0.attn.add_k_proj.alpha is unsupported LoRA weight”. I think we can only hope for support in a new version… :sad_but_relieved_face:


Yeah. Or maybe it’d be faster to save the merged weights, upload them to Hugging Face, and use those… :thinking:
If we just use the entire model repository instead of a LoRA, differences in LoRA implementation won’t matter.

To make LoRAs built for Diffusers/ComfyUI usable with vLLM-Omni, quite a few implementation changes would be needed on the vLLM-Omni side… Still, there seems to be demand (there are many existing LoRAs), so the chance of it being implemented isn’t zero…