How to convert a single safetensors file to PEFT format

I’m struggling to get adapter_model.safetensors and adapter_config.json out of this LoRA weight file


No conversion needed. Use it directly with Diffusers like this:

https://huggingface.co/lightx2v/Qwen-Image-Lightning :

import torch
from diffusers import DiffusionPipeline

# `scheduler` is the FlowMatchEulerDiscreteScheduler configured as in the model card
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", scheduler=scheduler, torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning", weight_name="Qwen-Image-Lightning-8steps-V1.0.safetensors"
)

Thanks for the reply, but vLLM-Omni requires the LoRA in PEFT format:


Oh…


Key point: that .safetensors is a Diffusers/ComfyUI LoRA, not a Transformers “PEFT adapter folder”

The file you linked (Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors) is meant to be loaded directly via Diffusers (pipe.load_lora_weights(...)) on top of the base model Qwen/Qwen-Image, or used in ComfyUI. The repo’s model card shows exactly that usage pattern. (Hugging Face)

By contrast, a Transformers/PEFT adapter typically lives in a directory containing adapter_config.json + adapter_model.safetensors. (Hugging Face)
Those files are not “extractable” from an arbitrary LoRA .safetensors unless you (re)construct the adapter configuration (target modules, rank, alpha, etc.) in a real model and then re-save it.
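To make that concrete, an adapter_config.json is just a small JSON file describing the LoRA configuration. The values below (rank, alpha, target modules) are purely illustrative placeholders; the real values must be reconstructed from the model the LoRA was actually trained on:

```python
import json

# Hypothetical adapter_config.json contents. The real r, lora_alpha, and
# target_modules must match how the LoRA was trained -- they cannot be
# guessed from an arbitrary .safetensors file alone.
adapter_config = {
    "peft_type": "LORA",
    "r": 16,
    "lora_alpha": 16,
    "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],
    "lora_dropout": 0.0,
    "bias": "none",
}

with open("adapter_config.json", "w") as f:
    json.dump(adapter_config, f, indent=2)

print("wrote adapter_config.json with", len(adapter_config["target_modules"]), "target modules")
```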


What vLLM-Omni expects

vLLM-Omni’s diffusion LoRA endpoint requires a PEFT adapter folder like: lora_adapter/adapter_config.json + lora_adapter/adapter_model.safetensors. (vLLM)

The file you linked (Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors) is a single-file Diffusers LoRA weight (meant to be loaded with pipe.load_lora_weights(...)), not a PEFT adapter folder. (Hugging Face)

So you need to load it into the base model once, then re-save it via Diffusers’ PEFT adapter API (save_lora_adapter), which generates the adapter_config.json and a safetensors weight file. (Hugging Face)


Conversion script (Diffusers → PEFT adapter folder)

Notes:

  • The Qwen-Image-Lightning model card explicitly recommends installing Diffusers from main. (Hugging Face)
  • This produces the exact folder structure vLLM-Omni documents. (vLLM)

import math
import torch
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler

# 1) Create the base pipeline (same pattern as the model card)
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}
scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    scheduler=scheduler,
    torch_dtype=torch.bfloat16,
).to("cuda")

# 2) Load the single safetensors LoRA file into the pipeline
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning",
    weight_name="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
    adapter_name="lightning_v2",  # give it a name so we can save it explicitly
)

# 3) Re-save as a PEFT adapter folder (adapter_config.json + adapter_model.safetensors)
#    save_lora_adapter() is a PEFT adapter API on the *underlying model component*.
#    For Qwen/Qwen-Image, LoRA is typically on the diffusion "transformer" component.
pipe.transformer.save_lora_adapter(
    "lora_adapter",
    adapter_name="lightning_v2",
    safe_serialization=True,
    weight_name="adapter_model.safetensors",
)

print("Wrote PEFT adapter to ./lora_adapter")

save_lora_adapter(...) is documented to serialize the adapter (and supports weight_name + safetensors). (Hugging Face)


Use the output with vLLM-Omni

Point vLLM-Omni at the created folder:

  • --lora-path /path/to/lora_adapter (must be readable by the server) (vLLM)

  • Folder must contain:

    • adapter_config.json
    • adapter_model.safetensors (vLLM)

Troubleshooting

1) AttributeError: '...Pipeline' object has no attribute 'transformer'

Some pipelines use unet instead of transformer. In that case, save from pipe.unet:

pipe.unet.save_lora_adapter(
    "lora_adapter",
    adapter_name="lightning_v2",
    safe_serialization=True,
    weight_name="adapter_model.safetensors",
)

2) The LoRA loads in Diffusers but fails in PEFT save

Prefer the PEFT “model-level” path: load the adapter onto the component, then save it. Diffusers documents load_lora_adapter(...) + save_lora_adapter(...) as the direct model-level workflow. (Hugging Face)

3) You’re tempted to hand-write adapter_config.json

Don’t, unless you know the exact target modules / ranks / alphas expected by the model. vLLM-Omni (and Transformers PEFT loaders) assume a valid adapter_config.json alongside the weights. (vLLM)

Edit:
doesn’t work practically…

Hi, I ran your script, but I only get adapter_model.safetensors and no adapter_config.json. I generated the config with the following code:

pipe.transformer.peft_config["lightning_v2"].save_pretrained("./lora_adapter")

Then I passed the folder (./lora_adapter) to vLLM-Omni and it raised an error saying the “state_dict” keys do not match…


Sorry… The inference implementations for diffusion models seem to differ quite a bit between Diffusers, ComfyUI, and vLLM-Omni. :scream:

In this case, forcing the state_dict key names to match might make it work, but it’s unclear whether it would function correctly. (It depends on the code of that version of vLLM-Omni.)

Merging it first would definitely work, I think… but then it wouldn’t be a conversion.


To use Qwen-Image-Lightning LoRA on vLLM-Omni

Option A (recommended): merge the LoRA into the base model, then serve it as a normal model

This avoids the entire “PEFT adapter keys don’t match” problem.

Why this works: vLLM-Omni’s diffusion LoRA path is strict about module name alignment (see Option B). If you “bake” the LoRA deltas into the base weights, vLLM-Omni just loads a single checkpoint and there is no adapter to validate.
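The “baking” step itself is just arithmetic on the weights: for each target module, W_merged = W + (alpha / r) * (B @ A). A minimal numeric illustration with toy 2x2 matrices (real models apply this per LoRA-targeted layer):

```python
# Minimal numeric illustration of LoRA merging: W' = W + (alpha / r) * (B @ A).
# Toy 2x2 case with rank r = 1; values are made up for demonstration.
r, alpha = 1, 2.0

W = [[1.0, 0.0],
     [0.0, 1.0]]            # base weight (2x2)
A = [[0.5, 0.5]]            # LoRA down-projection (r x 2)
B = [[1.0],
     [0.0]]                 # LoRA up-projection (2 x r)

scale = alpha / r
W_merged = [
    [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(2)]
    for i in range(2)
]
print(W_merged)  # after merging there is no adapter left to validate at runtime
```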

Steps

  1. Load the base Qwen-Image model (same base that the Lightning LoRA was trained for).

  2. Load the Lightning LoRA safetensors into that pipeline (Diffusers or the Qwen-Image reference loader).

  3. Merge/fuse LoRA into the base weights (so the model weights become the adapted weights).

  4. Save the merged model directory.

  5. Serve the merged directory with vLLM-Omni:

    • vLLM-Omni serves a single diffusion model per server instance. (vLLM)
  6. Use 8 inference steps when requesting images (because this LoRA is “8steps”). vLLM-Omni exposes num_inference_steps in the request body. (vLLM)

Why I’d pick this first: vLLM-Omni diffusion LoRA support is PEFT-compatible, but it’s new and keyed to vLLM’s internal module naming/packing behavior. (GitHub)


Why your current “PEFT folder” fails in vLLM-Omni

You already discovered:

  • You can produce adapter_model.safetensors

  • You can produce adapter_config.json via:

    pipe.transformer.peft_config["lightning_v2"].save_pretrained("./lora_adapter")
    

…but vLLM-Omni rejects it with “state_dict keys not match”.

That error is expected if the adapter’s target module names (and therefore the saved weight keys) don’t align with what vLLM-Omni believes are “supported/expected LoRA modules” for that diffusion pipeline.

What vLLM-Omni is doing internally

vLLM-Omni’s DiffusionLoRAManager:

  • Computes supported module suffixes from the pipeline using get_supported_lora_modules()
  • Builds/uses a packed_modules_mapping so it can handle fused projections (e.g., packed QKV) and accept LoRAs trained on logical sub-projections
  • Expands an _expected_lora_modules set
  • Loads the adapter via LoRAModel.from_local_checkpoint(... expected_lora_modules=...)
  • Critically: it passes weights_mapper=None (so there is no automatic renaming of keys) (vLLM)

So if Diffusers/ComfyUI used names like to_q, to_k, to_v, to_out, etc., but vLLM-Omni’s Qwen-Image transformer uses different names (and often packed/fused linears), your adapter keys won’t validate.

This is also why “same repository / same model” can still differ: vLLM-Omni re-implements diffusion transformer components with vLLM-style layers and packed projections for performance/parallelism, so module naming/structure can differ from Diffusers.
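At its core, the validation failure is a set comparison between the adapter’s module-name suffixes and the modules the loader accepts. A toy illustration (all module names here are made up, not vLLM-Omni’s actual ones):

```python
# Toy illustration of the mismatch: the adapter was saved with Diffusers-style
# attention names, but the server expects different (possibly packed) names.
adapter_modules = {"to_q", "to_k", "to_v", "to_out.0"}   # from the LoRA file
expected_modules = {"qkv_proj", "out_proj"}              # hypothetical loader-side set

unsupported = adapter_modules - expected_modules
print("unsupported LoRA modules:", sorted(unsupported))
# Every adapter module falls outside the expected set, so loading is rejected.
```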


Option B: make a real vLLM-Omni-compatible PEFT LoRA (harder, but possible)

vLLM-Omni expects a PEFT folder like: (vLLM)

lora_adapter/
├── adapter_config.json
└── adapter_model.safetensors

But the content must match vLLM-Omni’s expected module names.

B1) First, extract what vLLM-Omni expects (target module suffixes)

Your goal: get the set that DiffusionLoRAManager calls _expected_lora_modules. (vLLM)

Practical ways:

  • Enable debug logging and trigger adapter load; it logs the supported/expected modules. (vLLM)

  • Or write a small script that instantiates the same pipeline/module objects and prints:

    • get_supported_lora_modules(pipeline)
    • any packed_modules_mapping found on modules
    • expanded expected modules (same function the manager uses)

B2) Inspect your Lightning safetensors keys (what you currently have)

Run something like:

import re
from safetensors.torch import load_file

sd = load_file("Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors")
keys = list(sd.keys())
print("num_keys:", len(keys))
print("sample:", keys[:50])

# quick “module suffix” feel
mods = set()
for k in keys:
    # tweak this depending on the actual key style you see
    m = re.search(r"\.(to_[qkv]|to_out|q_proj|k_proj|v_proj|proj|fc1|fc2)\b", k)
    if m:
        mods.add(m.group(1))
print("matched module-ish tokens:", sorted(mods))

This tells you whether the file is closer to:

  • Diffusers attention naming (to_q, to_k, to_v, to_out)
  • HF transformer naming (q_proj, k_proj, v_proj, o_proj)
  • Something ComfyUI-specific

B3) Build a mapping: Diffusers/ComfyUI module names → vLLM-Omni module names

Typical mismatch patterns (examples):

  • to_q, to_k, to_v vs packed qkv projections
  • to_out.0 vs proj / o_proj
  • MLP: fc1/fc2 vs gate_up_proj/down_proj-style

vLLM-Omni is explicitly designed to handle packed projections by:

  • discovering packed_modules_mapping on the model
  • treating QKVParallelLinear as 3-slice packed (["q","k","v"]) (vLLM)

So if (and only if) the Qwen-Image vLLM-Omni implementation exposes a compatible mapping, you may be able to rename your adapter keys to match the “slice names” it will accept.
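A sketch of how such a packed-modules mapping expands fused linears into the logical slice names a LoRA may target. The mapping below is illustrative, not vLLM-Omni’s actual one:

```python
# Illustrative packed-modules mapping: one fused linear (qkv_proj) serves three
# logical sub-projections. An adapter trained against the logical names can be
# accepted if the loader knows how to slice the packed module.
packed_modules_mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}

def expand_expected(modules, mapping):
    """Expand packed module names into the logical slice names a LoRA may target."""
    expected = set()
    for name in modules:
        expected.update(mapping.get(name, [name]))
    return expected

print(sorted(expand_expected(["qkv_proj", "out_proj"], packed_modules_mapping)))
```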

B4) Rewrite the adapter weights and config

You may need to:

  • Rewrite state_dict key paths (the important part)
  • Ensure adapter_config.json includes target_modules that match what vLLM-Omni expects and what your rewritten keys implement (it logs target_modules when loading). (vLLM)

A template for renaming keys:

from safetensors.torch import load_file, save_file

src = load_file("adapter_model.safetensors")

RENAMES = [
    (".to_q.", ".q."),      # example only
    (".to_k.", ".k."),
    (".to_v.", ".v."),
    (".to_out.0.", ".proj."),
]

dst = {}
for k, v in src.items():
    nk = k
    for a, b in RENAMES:
        nk = nk.replace(a, b)
    dst[nk] = v

save_file(dst, "adapter_model_vllm.safetensors")
print("done. keys:", len(dst))

Then point adapter_config.json to target_modules matching the suffixes vLLM-Omni expects.
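One way to keep the config consistent with the rewritten weights is to derive target_modules directly from the renamed keys. The key layout below is illustrative (PEFT-style "...<module>.lora_A.weight" names):

```python
# Derive target_modules for adapter_config.json from the renamed weight keys,
# so the config stays consistent with the rewritten state_dict. Key names are
# hypothetical examples, not taken from a real checkpoint.
renamed_keys = [
    "transformer_blocks.0.attn.q.lora_A.weight",
    "transformer_blocks.0.attn.q.lora_B.weight",
    "transformer_blocks.0.attn.proj.lora_A.weight",
]

# Strip the ".lora_A/.lora_B.weight" suffix, then keep the last path component.
target_modules = sorted({k.split(".lora_")[0].rsplit(".", 1)[-1] for k in renamed_keys})
print(target_modules)
```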

B5) Reality check: you may need to patch vLLM-Omni

Because diffusion LoRA loading currently uses weights_mapper=None, there is no built-in key translation hook. (vLLM)
If the required mapping is non-trivial (common), the clean solution is:

  • add a weights_mapper for diffusion adapters (or a model-specific mapper for Qwen-Image)
  • or ensure the model exposes packed_modules_mapping that matches popular training tool outputs

How other users effectively use “ComfyUI LoRA” with vLLM-Omni (practically)

Most people who succeed quickly do one of:

  1. Merge LoRA into base weights and serve the merged model (Option A)
  2. Use LoRAs that were trained/exported in PEFT format against a module naming scheme that vLLM/vLLM-Omni accepts (often not ComfyUI-native single-file LoRAs)

Given your current error and vLLM-Omni’s strict loader, Option A is the most reliable path.


Reading list (relevant, practical)

  • vLLM-Omni diffusion LoRA online serving example and required folder format (vLLM)
  • vLLM-Omni DiffusionLoRAManager internals (why key mismatches happen; packed modules mapping; no weights_mapper) (vLLM)
  • vLLM-Omni release notes highlighting “Diffusion LoRA Adapter Support (PEFT-compatible)” (feature maturity context) (GitHub)
  • vLLM LoRA adapters documentation (general vLLM LoRA expectations and serving patterns) (vLLM)

To merge/fuse Lightning into the base model — step-by-step

0) What you will produce

A new local model directory that contains the base Qwen-Image weights with Lightning already applied, so vLLM-Omni loads it as a normal diffusion model (no LoRA at runtime). vLLM-Omni serves diffusion models via /v1/images/generations. (docs.vllm.ai)


1) Prepare environment (Diffusers “main”)

The Lightning model card explicitly says to install Diffusers from main. (Hugging Face)

pip install -U "torch" "transformers" "accelerate" "safetensors"
pip install -U "git+https://github.com/huggingface/diffusers.git"

2) Fuse the V2.0 bf16 LoRA into Qwen/Qwen-Image

Create a script fuse_qwen_image_lightning_v2.py:

import math
import torch
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler

# Scheduler config used by Qwen-Image-Lightning authors (shift=3 distillation)
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}

def main():
    device = "cuda"
    dtype = torch.bfloat16

    scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)

    # 1) Load base model
    pipe = DiffusionPipeline.from_pretrained(
        "Qwen/Qwen-Image",
        scheduler=scheduler,
        torch_dtype=dtype,
    ).to(device)

    # 2) Load Lightning LoRA (your file)
    pipe.load_lora_weights(
        "lightx2v/Qwen-Image-Lightning",
        weight_name="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
    )

    # 3) Fuse LoRA into base weights, then unload adapter tensors
    #    (Diffusers recommends unload after fuse, then save_pretrained)
    pipe.fuse_lora(lora_scale=1.0)
    pipe.unload_lora_weights()

    # 4) Save the fused pipeline locally
    out_dir = "./Qwen-Image-Lightning-8steps-V2.0-fused"
    pipe.save_pretrained(out_dir, safe_serialization=True)

    print(f"Saved fused model to: {out_dir}")

if __name__ == "__main__":
    main()

Why these exact pieces:

  • The scheduler config and the “8 steps / true_cfg_scale=1.0” recipe are from the Lightning model card (it uses a FlowMatchEulerDiscreteScheduler config with base_shift/max_shift set to math.log(3), and calls the pipeline with 8 steps). (Hugging Face)
  • The fuse workflow is Diffusers’ documented pattern: fuse_lora() → unload_lora_weights() → save_pretrained(). (Hugging Face)

Run it:

python fuse_qwen_image_lightning_v2.py

3) Sanity-check the fused directory (optional but recommended)

After fusion, the model should work without load_lora_weights():

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "./Qwen-Image-Lightning-8steps-V2.0-fused",
    torch_dtype=torch.bfloat16,
).to("cuda")

img = pipe(
    prompt="a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K",
    negative_prompt=" ",
    width=1024,
    height=1024,
    num_inference_steps=8,
    true_cfg_scale=1.0,
    generator=torch.manual_seed(0),
).images[0]

img.save("fused_test.png")

The “8 steps” + true_cfg_scale=1.0 matches the Lightning authors’ recommended inference settings. (Hugging Face)


4) Serve the fused model with vLLM-Omni

vLLM-Omni serves diffusion models with:

vllm serve /ABS/PATH/Qwen-Image-Lightning-8steps-V2.0-fused --omni --port 8000

  • vLLM-Omni uses /v1/images/generations for diffusion models. (docs.vllm.ai)
  • vLLM supports serving a local model path. (vLLM Forums)

If you get OOM during serving, the Qwen text-to-image example notes you can enable VAE slicing/tiling flags to reduce memory. (docs.vllm.ai)


5) Call the API using Lightning-like parameters

vLLM-Omni’s Image Generation API supports num_inference_steps, negative_prompt, and true_cfg_scale. (docs.vllm.ai)

curl -X POST http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K",
    "negative_prompt": " ",
    "size": "1024x1024",
    "num_inference_steps": 8,
    "true_cfg_scale": 1.0,
    "seed": 0
  }' | jq -r ".data[0].b64_json" | base64 -d > out.png

Thank you again for your answer. I tried your method but it still didn’t work. vLLM-Omni raises the error “transformer_blocks.0.attn.add_k_proj.alpha is unsupported LoRA weight”. I think we can only hope for support in a new version… :sad_but_relieved_face:


Yeah. Or maybe it’d be faster to save the merged weights, upload them to Hugging Face, and use those… :thinking:
If we just use the entire model repository instead of a LoRA, differences in LoRA implementation won’t matter.

To make LoRAs built for Diffusers/ComfyUI usable with vLLM-Omni, quite a few implementation changes would be needed on the vLLM-Omni side… Still, there seems to be demand (there are many existing LoRAs), so the chance of it being implemented isn’t zero…