I’m struggling to get adapter_model.safetensors and adapter_config.json out of this LoRA weight file
No conversion needed. Use https://huggingface.co/lightx2v/Qwen-Image-Lightning directly with Diffusers like this:
import torch
from diffusers import DiffusionPipeline

# `scheduler` is a FlowMatchEulerDiscreteScheduler configured per the model card
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", scheduler=scheduler, torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning",
    weight_name="Qwen-Image-Lightning-8steps-V1.0.safetensors",
)
Thanks for the reply, but vLLM-Omni requests need the LoRA in PEFT format:
Oh…
Key point: that .safetensors is a Diffusers/ComfyUI LoRA, not a Transformers “PEFT adapter folder”
The file you linked (Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors) is published to be loaded directly via Diffusers (pipe.load_lora_weights(...)) on top of the base model Qwen/Qwen-Image, or used in ComfyUI. The repo’s model card shows exactly that usage pattern. (Hugging Face)
By contrast, a Transformers/PEFT adapter typically lives in a directory containing adapter_config.json + adapter_model.safetensors. (Hugging Face)
Those files are not “extractable” from an arbitrary LoRA .safetensors unless you (re)construct the adapter configuration (target modules, rank, alpha, etc.) in a real model and then re-save it.
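To make the distinction concrete, here is what a PEFT `adapter_config.json` roughly looks like. Every value below (rank, alpha, target modules) is hypothetical: the point is that this metadata lives outside the `.safetensors` file and cannot be recovered from it.

```python
import json

# Illustrative only: field names follow PEFT's LoraConfig schema, but the
# values (rank, alpha, target modules) are made up -- the real ones cannot
# be recovered from a bare .safetensors weight file.
example_config = {
    "peft_type": "LORA",
    "r": 16,                                                  # hypothetical rank
    "lora_alpha": 16,                                         # hypothetical scaling
    "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],   # hypothetical
    "base_model_name_or_path": "Qwen/Qwen-Image",
}
print(json.dumps(example_config, indent=2))
```

This is the file that has to sit next to `adapter_model.safetensors`, and it must describe the adapter exactly as it was trained.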
What vLLM-Omni expects
vLLM-Omni’s diffusion LoRA endpoint requires a PEFT adapter folder like: lora_adapter/adapter_config.json + lora_adapter/adapter_model.safetensors. (vLLM)
The file you linked (Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors) is a single-file Diffusers LoRA weight (meant to be loaded with pipe.load_lora_weights(...)), not a PEFT adapter folder. (Hugging Face)
So you need to load it into the base model once, then re-save it via Diffusers’ PEFT adapter API (save_lora_adapter), which generates the adapter_config.json and a safetensors weight file. (Hugging Face)
Conversion script (Diffusers → PEFT adapter folder)
Notes:
- The Qwen-Image-Lightning model card explicitly recommends installing Diffusers from `main`. (Hugging Face)
- This produces the exact folder structure vLLM-Omni documents. (vLLM)
import math
import torch
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler

# 1) Create the base pipeline (same pattern as the model card)
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}
scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    scheduler=scheduler,
    torch_dtype=torch.bfloat16,
).to("cuda")

# 2) Load the single safetensors LoRA file into the pipeline
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning",
    weight_name="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
    adapter_name="lightning_v2",  # give it a name so we can save it explicitly
)

# 3) Re-save as a PEFT adapter folder (adapter_config.json + adapter_model.safetensors)
#    save_lora_adapter() is a PEFT adapter API on the *underlying model component*.
#    For Qwen/Qwen-Image, LoRA is typically on the diffusion "transformer" component.
pipe.transformer.save_lora_adapter(
    "lora_adapter",
    adapter_name="lightning_v2",
    safe_serialization=True,
    weight_name="adapter_model.safetensors",
)
print("Wrote PEFT adapter to ./lora_adapter")
save_lora_adapter(...) is documented to serialize the adapter (and supports weight_name + safetensors). (Hugging Face)
Use the output with vLLM-Omni
Point vLLM-Omni at the created folder:
- `--lora-path /path/to/lora_adapter` (must be readable by the server) (vLLM)
- The folder must contain:
  - `adapter_config.json`
  - `adapter_model.safetensors` (vLLM)
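A quick stdlib pre-flight check can save a failed server start. The helper name and the placeholder path are my own; only the two required file names come from the vLLM-Omni docs above.

```python
import os

def check_adapter_dir(path):
    """Return the required adapter files missing from `path` (empty list = OK)."""
    required = ["adapter_config.json", "adapter_model.safetensors"]
    return [f for f in required if not os.path.isfile(os.path.join(path, f))]

missing = check_adapter_dir("/path/to/lora_adapter")  # placeholder path
if missing:
    print("missing adapter files:", missing)
```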
Troubleshooting
1) AttributeError: '...Pipeline' object has no attribute 'transformer'
Some pipelines use unet instead of transformer. In that case, save from pipe.unet:
pipe.unet.save_lora_adapter(
    "lora_adapter",
    adapter_name="lightning_v2",
    safe_serialization=True,
    weight_name="adapter_model.safetensors",
)
2) The LoRA loads in Diffusers but fails in PEFT save
Prefer the PEFT “model-level” path: load the adapter onto the component, then save it. Diffusers documents load_lora_adapter(...) + save_lora_adapter(...) as the direct model-level workflow. (Hugging Face)
3) You’re tempted to hand-write adapter_config.json
Don’t, unless you know the exact target modules / ranks / alphas expected by the model. vLLM-Omni (and Transformers PEFT loaders) assume a valid adapter_config.json alongside the weights. (vLLM)
Edit:
It doesn’t work in practice…
Hi, I ran your script, but I only got adapter_model.safetensors, no adapter_config.json. I generated that file with the following code:
pipe.transformer.peft_config["lightning_v2"].save_pretrained("./lora_adapter")
Then I passed the folder (./lora_adapter) to vLLM-Omni and it raised an error saying the “state_dict” keys do not match…
Sorry… The implementation of the inference part of the diffusion model itself seems to differ quite a bit between Diffusers, ComfyUI, and vLLM-Omni.
In this case, forcing the state_dict key names to match might make it work, but it’s unclear if it would function correctly. (Depends on the code of that version of vLLM-Omni)
Merging it first would definitely work, I think… but it wouldn’t be a conversion.
To use Qwen-Image-Lightning LoRA on vLLM-Omni
Option A (recommended): merge the LoRA into the base model, then serve it as a normal model
This avoids the entire “PEFT adapter keys don’t match” problem.
Why this works: vLLM-Omni’s diffusion LoRA path is strict about module name alignment (see Option B). If you “bake” the LoRA deltas into the base weights, vLLM-Omni just loads a single checkpoint and there is no adapter to validate.
Steps
1. Load the base Qwen-Image model (the same base the Lightning LoRA was trained for).
2. Load the Lightning LoRA safetensors into that pipeline (Diffusers or the Qwen-Image reference loader).
3. Merge/fuse the LoRA into the base weights (so the model weights become the adapted weights).
4. Save the merged model directory.
5. Serve the merged directory with vLLM-Omni:
   - vLLM-Omni serves a single diffusion model per server instance. (vLLM)
6. Use 8 inference steps when requesting images (because this LoRA is “8steps”). vLLM-Omni exposes `num_inference_steps` in the request body. (vLLM)
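Conceptually, the merge in step 3 replaces each adapted weight with W' = W + scale · (B @ A), after which no adapter exists at runtime. A toy sketch of that arithmetic with plain Python lists (not the Diffusers API, just the math):

```python
def matmul(X, Y):
    """Naive matrix multiply over nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def merge_lora(W, A, B, scale=1.0):
    """Return W + scale * (B @ A); the LoRA delta is baked into the weight."""
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (2x2)
B = [[1.0], [0.0]]             # LoRA up-projection (2x1, rank 1)
A = [[0.0, 2.0]]               # LoRA down-projection (1x2)
print(merge_lora(W, A, B))     # W plus the rank-1 update: [[1.0, 2.0], [0.0, 1.0]]
```

`pipe.fuse_lora()` does exactly this (with the adapter's alpha/rank scaling) for every targeted layer.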
Why I’d pick this first: vLLM-Omni diffusion LoRA support is PEFT-compatible, but it’s new and keyed to vLLM’s internal module naming/packing behavior. (GitHub)
Why your current “PEFT folder” fails in vLLM-Omni
You already discovered:
- You can produce `adapter_model.safetensors`
- You can produce `adapter_config.json` via:
  `pipe.transformer.peft_config["lightning_v2"].save_pretrained("./lora_adapter")`

…but vLLM-Omni rejects it with “state_dict keys not match”.
That error is expected if the adapter’s target module names (and therefore the saved weight keys) don’t align with what vLLM-Omni believes are “supported/expected LoRA modules” for that diffusion pipeline.
What vLLM-Omni is doing internally
vLLM-Omni’s `DiffusionLoRAManager`:
- Computes supported module suffixes from the pipeline using `get_supported_lora_modules()`
- Builds/uses a `packed_modules_mapping` so it can handle fused projections (e.g., packed QKV) and accept LoRAs trained on logical sub-projections
- Expands an `_expected_lora_modules` set
- Loads the adapter via `LoRAModel.from_local_checkpoint(... expected_lora_modules=...)`
- Critically: it passes `weights_mapper=None`, so there is no automatic renaming of keys (vLLM)
So if Diffusers/ComfyUI used names like to_q, to_k, to_v, to_out, etc., but vLLM-Omni’s Qwen-Image transformer uses different names (and often packed/fused linears), your adapter keys won’t validate.
This is also why “same repository / same model” can still differ: vLLM-Omni re-implements diffusion transformer components with vLLM-style layers and packed projections for performance/parallelism, so module naming/structure can differ from Diffusers.
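To see why validation fails, you can diff the module suffixes present in the adapter against the expected set. Both the suffix-stripping heuristic and the `expected` set below are illustrative inventions, not vLLM-Omni’s real logic:

```python
def module_suffixes(keys):
    """Strip LoRA decorations and numeric block indices, keep the module suffix."""
    suffixes = set()
    for k in keys:
        for deco in (".lora_A.weight", ".lora_B.weight", ".alpha"):
            k = k.removesuffix(deco)
        parts = [p for p in k.split(".") if not p.isdigit()]
        suffixes.add(parts[-1])
    return suffixes

adapter_keys = [
    "transformer_blocks.0.attn.to_q.lora_A.weight",
    "transformer_blocks.0.attn.to_q.lora_B.weight",
]
expected = {"q", "k", "v"}  # hypothetical slice names, not vLLM-Omni's real set
print("unmatched:", module_suffixes(adapter_keys) - expected)  # {'to_q'} -> rejected
```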
Option B: make a real vLLM-Omni-compatible PEFT LoRA (harder, but possible)
vLLM-Omni expects a PEFT folder like: (vLLM)
lora_adapter/
├── adapter_config.json
└── adapter_model.safetensors
But the content must match vLLM-Omni’s expected module names.
B1) First, extract what vLLM-Omni expects (target module suffixes)
Your goal: get the set that DiffusionLoRAManager calls _expected_lora_modules. (vLLM)
Practical ways:
- Enable debug logging and trigger an adapter load; it logs the supported/expected modules. (vLLM)
- Or write a small script that instantiates the same pipeline/module objects and prints:
  - `get_supported_lora_modules(pipeline)`
  - any `packed_modules_mapping` found on modules
  - the expanded expected modules (same function the manager uses)
B2) Inspect your Lightning safetensors keys (what you currently have)
Run something like:
import re
from safetensors.torch import load_file

sd = load_file("Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors")
keys = list(sd.keys())
print("num_keys:", len(keys))
print("sample:", keys[:50])

# quick "module suffix" feel
mods = set()
for k in keys:
    # tweak this depending on the actual key style you see
    m = re.search(r"\.(to_[qkv]|to_out|q_proj|k_proj|v_proj|proj|fc1|fc2)\b", k)
    if m:
        mods.add(m.group(1))
print("matched module-ish tokens:", sorted(mods))
This tells you whether the file is closer to:
- Diffusers attention naming (`to_q`, `to_k`, `to_v`, `to_out`)
- HF transformer naming (`q_proj`, `k_proj`, `v_proj`, `o_proj`)
- Something ComfyUI-specific
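A rough classifier over those key styles (a heuristic with my own token lists, not an exhaustive detector):

```python
def guess_naming_scheme(keys):
    """Heuristic: classify a LoRA state dict's attention naming style."""
    text = " ".join(keys)
    if any(t in text for t in ("to_q", "to_k", "to_v", "to_out")):
        return "diffusers"
    if any(t in text for t in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return "hf-transformers"
    return "unknown"

print(guess_naming_scheme(["blocks.0.attn.to_q.lora_A.weight"]))  # diffusers
```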
B3) Build a mapping: Diffusers/ComfyUI module names → vLLM-Omni module names
Typical mismatch patterns (examples):
- `to_q`, `to_k`, `to_v` vs packed `qkv` projections
- `to_out.0` vs `proj`/`o_proj`
- MLP: `fc1`/`fc2` vs `gate_up_proj`/`down_proj`-style
vLLM-Omni is explicitly designed to handle packed projections by:
- discovering `packed_modules_mapping` on the model
- treating `QKVParallelLinear` as 3-slice packed (`["q", "k", "v"]`) (vLLM)
So if (and only if) the Qwen-Image vLLM-Omni implementation exposes a compatible mapping, you may be able to rename your adapter keys to match the “slice names” it will accept.
B4) Rewrite the adapter weights and config
You may need to:
- Rewrite the state_dict key paths (the important part)
- Ensure `adapter_config.json` includes `target_modules` that match what vLLM-Omni expects and what your rewritten keys implement (it logs `target_modules` when loading). (vLLM)
A template for renaming keys:
from safetensors.torch import load_file, save_file

src = load_file("adapter_model.safetensors")

RENAMES = [
    (".to_q.", ".q."),  # example only
    (".to_k.", ".k."),
    (".to_v.", ".v."),
    (".to_out.0.", ".proj."),
]

dst = {}
for k, v in src.items():
    nk = k
    for a, b in RENAMES:
        nk = nk.replace(a, b)
    dst[nk] = v

save_file(dst, "adapter_model_vllm.safetensors")
print("done. keys:", len(dst))
Then point `adapter_config.json`’s `target_modules` at the suffixes vLLM-Omni expects.
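A companion sketch that derives `target_modules` from the renamed keys and drops them into the config. The field names follow PEFT’s LoraConfig schema, but the suffix-extraction heuristic and all key/config values are my own illustrations:

```python
import json

def derive_target_modules(keys):
    """Collect module suffixes (the token right before .lora_A/.lora_B)."""
    mods = set()
    for k in keys:
        for marker in (".lora_A.", ".lora_B."):
            if marker in k:
                mods.add(k.split(marker)[0].split(".")[-1])
    return sorted(mods)

renamed_keys = [
    "blocks.0.attn.q.lora_A.weight",
    "blocks.0.attn.q.lora_B.weight",
    "blocks.0.attn.proj.lora_A.weight",
]
config = {"peft_type": "LORA", "r": 16, "lora_alpha": 16}  # hypothetical values
config["target_modules"] = derive_target_modules(renamed_keys)
print(json.dumps(config, indent=2))
```

Keeping `target_modules` in sync with the rewritten weight keys is exactly what the loader checks.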
B5) Reality check: you may need to patch vLLM-Omni
Because diffusion LoRA loading currently uses weights_mapper=None, there is no built-in key translation hook. (vLLM)
If the required mapping is non-trivial (common), the clean solution is to:
- add a `weights_mapper` for diffusion adapters (or a model-specific mapper for Qwen-Image)
- or ensure the model exposes a `packed_modules_mapping` that matches popular training-tool outputs
How other users effectively use “ComfyUI LoRA” with vLLM-Omni (practically)
Most people who succeed quickly do one of:
- Merge LoRA into base weights and serve the merged model (Option A)
- Use LoRAs that were trained/exported in PEFT format against a module naming scheme that vLLM/vLLM-Omni accepts (often not ComfyUI-native single-file LoRAs)
Given your current error and vLLM-Omni’s strict loader, Option A is the most reliable path.
Reading list (relevant, practical)
- vLLM-Omni diffusion LoRA online serving example and required folder format (vLLM)
- vLLM-Omni `DiffusionLoRAManager` internals (why key mismatches happen; packed modules mapping; no weights_mapper) (vLLM)
- vLLM-Omni release notes highlighting “Diffusion LoRA Adapter Support (PEFT-compatible)” (feature maturity context) (GitHub)
- vLLM LoRA adapters documentation (general vLLM LoRA expectations and serving patterns) (vLLM)
To merge/fuse Lightning into the base model — step-by-step
0) What you will produce
A new local model directory that contains the base Qwen-Image weights with Lightning already applied, so vLLM-Omni loads it as a normal diffusion model (no LoRA at runtime). vLLM-Omni serves diffusion models via /v1/images/generations. (docs.vllm.ai)
1) Prepare environment (Diffusers “main”)
The Lightning model card explicitly says to install Diffusers from main. (Hugging Face)
pip install -U "torch" "transformers" "accelerate" "safetensors"
pip install -U "git+https://github.com/huggingface/diffusers.git"
2) Fuse the V2.0 bf16 LoRA into Qwen/Qwen-Image
Create a script fuse_qwen_image_lightning_v2.py:
import math
import torch
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler

# Scheduler config used by the Qwen-Image-Lightning authors (shift=3 distillation)
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}

def main():
    device = "cuda"
    dtype = torch.bfloat16
    scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)

    # 1) Load base model
    pipe = DiffusionPipeline.from_pretrained(
        "Qwen/Qwen-Image",
        scheduler=scheduler,
        torch_dtype=dtype,
    ).to(device)

    # 2) Load Lightning LoRA (your file)
    pipe.load_lora_weights(
        "lightx2v/Qwen-Image-Lightning",
        weight_name="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
    )

    # 3) Fuse LoRA into base weights, then unload adapter tensors
    #    (Diffusers recommends unload after fuse, then save_pretrained)
    pipe.fuse_lora(lora_scale=1.0)
    pipe.unload_lora_weights()

    # 4) Save the fused pipeline locally
    out_dir = "./Qwen-Image-Lightning-8steps-V2.0-fused"
    pipe.save_pretrained(out_dir, safe_serialization=True)
    print(f"Saved fused model to: {out_dir}")

if __name__ == "__main__":
    main()
Why these exact pieces:
- The scheduler config and the “8 steps / true_cfg_scale=1.0” recipe are from the Lightning model card (they use a FlowMatchEulerDiscreteScheduler config with `shift=3` via logs, and call the pipeline with 8 steps). (Hugging Face)
- The fuse workflow is Diffusers’ documented pattern: `fuse_lora()` → `unload_lora_weights()` → `save_pretrained()`. (Hugging Face)
Run it:
python fuse_qwen_image_lightning_v2.py
3) Sanity-check the fused directory (optional but recommended)
After fusion, the model should work without load_lora_weights():
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "./Qwen-Image-Lightning-8steps-V2.0-fused",
    torch_dtype=torch.bfloat16,
).to("cuda")

img = pipe(
    prompt="a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K",
    negative_prompt=" ",
    width=1024,
    height=1024,
    num_inference_steps=8,
    true_cfg_scale=1.0,
    generator=torch.manual_seed(0),
).images[0]
img.save("fused_test.png")
The “8 steps” + true_cfg_scale=1.0 matches the Lightning authors’ recommended inference settings. (Hugging Face)
4) Serve the fused model with vLLM-Omni
vLLM-Omni serves diffusion models with:
vllm serve /ABS/PATH/Qwen-Image-Lightning-8steps-V2.0-fused --omni --port 8000
- vLLM-Omni uses `/v1/images/generations` for diffusion models. (docs.vllm.ai)
- vLLM supports serving a local model path. (vLLM Forums)
If you get OOM during serving, the Qwen text-to-image example notes you can enable VAE slicing/tiling flags to reduce memory. (docs.vllm.ai)
5) Call the API using Lightning-like parameters
vLLM-Omni’s Image Generation API supports num_inference_steps, negative_prompt, and true_cfg_scale. (docs.vllm.ai)
curl -X POST http://localhost:8000/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"prompt": "a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K",
"negative_prompt": " ",
"size": "1024x1024",
"num_inference_steps": 8,
"true_cfg_scale": 1.0,
"seed": 0
}' | jq -r ".data[0].b64_json" | base64 -d > out.png
Thank you again for your answer. I tried your method but it still didn’t work. vLLM-Omni raised the error “transformer_blocks.0.attn.add_k_proj.alpha is unsupported LoRA weight”. I think we can only hope for support in a new version…
Yeah. Or maybe it’d be faster to save the merged LoRA weights, upload them to Hugging Face, and use those…
If we just use the entire model repository instead of LoRA, the differences in LoRA implementation won’t matter.
To make Diffusers/ComfyUI LoRAs usable with vLLM-Omni, quite a few implementation changes would be needed on the vLLM-Omni side… Still, there seems to be demand (since so many existing LoRAs are in those formats), so the chance of it being implemented isn’t zero…