
Copyright (C) [2026] Advanced Micro Devices, Inc. All rights reserved. Portions of this file consist of AI generated content.

Qwen3-VL ONNX (ONNX Runtime GenAI)

Convert Qwen3-VL checkpoints to ONNX Runtime GenAI format with dynamic image-size support, then run local multimodal inference.

This README is written in a model-card style so it can be moved into a Hugging Face repo with minimal changes.

Overview

  • Exports three ONNX components:
    • qwen3vl-vision.onnx (vision encoder, FP32)
    • qwen3vl-embedding.onnx (image-token embedding injector, FP32)
    • model.onnx (text decoder, FP32/FP16/INT4)
  • Generates genai_config.json wired for the ONNX Runtime GenAI multimodal processor flow.
  • Supports dynamic image grids through runtime image_grid_thw input.
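
As a rough sanity check on the dynamic-grid wiring, the number of embedding slots an image occupies in the text sequence can be derived from image_grid_thw. This is a minimal sketch assuming Qwen-VL's spatial merge size of 2; the function name is illustrative and not part of builder.py:

```python
def image_token_count(grid_thw, spatial_merge_size=2):
    # grid_thw: (t, h, w) patch grid passed at runtime via image_grid_thw.
    # Each spatial_merge_size x spatial_merge_size block of patches collapses
    # into a single embedding slot in the text sequence.
    t, h, w = grid_thw
    return t * h * w // (spatial_merge_size ** 2)

print(image_token_count((1, 16, 24)))  # 1*16*24 / 4 = 96
```

A larger grid (from a larger input image) simply yields more image tokens, which is what the dynamic image_grid_thw input makes possible at runtime.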

Supported Checkpoints (validated locally with CPU EP)

  • Qwen/Qwen3-VL-2B-Instruct
  • Qwen/Qwen3-VL-4B-Instruct
  • Qwen/Qwen3-VL-8B-Instruct

Requirements

  • Python environment with:
    • onnxruntime-genai (installed from a wheel or built from source)
    • transformers
    • huggingface_hub
    • torch
  • Local copy of pytorch_reference/modeling_qwen3_vl.py (downloaded in the Quickstart below).

Quickstart

Run commands from the onnxruntime-genai repo root:

cd examples/python/qwen3-vl

Download the patched modeling_qwen3_vl.py into examples/python/qwen3-vl/pytorch_reference, and builder.py, the inference script, and the test images into examples/python/qwen3-vl:

mkdir ./pytorch_reference
hf download onnx-community/Qwen3-4B-VL-ONNX --include "modeling_qwen3_vl.py" --local-dir "./pytorch_reference"
hf download onnx-community/Qwen3-4B-VL-ONNX --include "builder.py" --local-dir "."
hf download onnx-community/Qwen3-4B-VL-ONNX --include "qwen3vl-oga-inference.py" --local-dir "."
hf download onnx-community/Qwen3-4B-VL-ONNX --include "test_images/*" --local-dir "."

1) Download a model from Hugging Face

Use either hf download (recommended) or huggingface-cli download.

Qwen3-VL-2B-Instruct

hf download Qwen/Qwen3-VL-2B-Instruct --local-dir "./pytorch_2b"

Qwen3-VL-4B-Instruct

hf download Qwen/Qwen3-VL-4B-Instruct --local-dir "./pytorch_4b"

Qwen3-VL-8B-Instruct

hf download Qwen/Qwen3-VL-8B-Instruct --local-dir "./pytorch_8b"

2) Export ONNX package

FP32 vision + FP32 text

& "python.exe" `
  "builder.py" `
  --input "./pytorch_4b" `
  --reference "./pytorch_reference" `
  --output "./qwen3-vl-4b-instruct-onnx-vision-fp32-text-fp32-cpu" `
  --precision fp32

FP32 vision + INT4 text

# 2B
& "python.exe" `
  "builder.py" `
  --input "./pytorch_2b" `
  --reference "./pytorch_reference" `
  --output "./qwen3-vl-2b-instruct-onnx-vision-fp32-text-int4-cpu" `
  --precision int4

# 4B
& "python.exe" `
  "builder.py" `
  --input "./pytorch_4b" `
  --reference "./pytorch_reference" `
  --output "./qwen3-vl-4b-instruct-onnx-vision-fp32-text-int4-cpu" `
  --precision int4

# 8B
& "python.exe" `
  "builder.py" `
  --input "./pytorch_8b" `
  --reference "./pytorch_reference" `
  --output "./qwen3-vl-8b-instruct-onnx-vision-fp32-text-int4-cpu" `
  --precision int4
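
After an export completes, you can spot-check that the output folder contains the components listed in the Overview. A small sketch; the expected filenames come from this README (the three ONNX components plus genai_config.json), and the output path is one of the examples above:

```python
from pathlib import Path

# Filenames taken from the Overview section of this README.
EXPECTED = [
    "qwen3vl-vision.onnx",
    "qwen3vl-embedding.onnx",
    "model.onnx",
    "genai_config.json",
]

def missing_files(output_dir):
    # Return the expected package files that are absent from output_dir.
    root = Path(output_dir)
    return [name for name in EXPECTED if not (root / name).exists()]

if __name__ == "__main__":
    gaps = missing_files("./qwen3-vl-4b-instruct-onnx-vision-fp32-text-fp32-cpu")
    print("OK" if not gaps else f"missing: {gaps}")
```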

3) Sanity test: text-only

Run from the same folder:

& "python.exe" `
  "qwen3vl-oga-inference.py" `
  -m "./qwen3-vl-8b-instruct-onnx-vision-fp32-text-int4-cpu" `
  -e follow_config `
  --non-interactive `
  -pr "Say hello in one short sentence."

Expected behavior: model loads and returns a short greeting (for example, Hello!).

4) Sanity test: image + text

& "python.exe" `
  "qwen3vl-oga-inference.py" `
  -m "./qwen3-vl-8b-instruct-onnx-vision-fp32-text-int4-cpu" `
  -e follow_config `
  --non-interactive `
  --image_paths "./test_images/img_50.jpg" `
  -pr "Describe this image in one sentence."

Expected behavior: model returns a one-sentence description for the image.

Notes

  • The script is currently validated for single-image inference (one image per call).
  • If you pass multiple images in one call, you may hit:
    • RuntimeError: Expected pixel_values in CHW format [C, H, W], got rank 4
  • Practical workaround: run one image per invocation.
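
The error above comes from a shape guard on pixel_values; the following is a hypothetical reproduction of that check (names and structure are illustrative, not the script's actual code):

```python
import numpy as np

def check_pixel_values(pixel_values):
    # Single-image inference expects an unbatched CHW tensor; stacking
    # multiple images produces a rank-4 NCHW array, which is rejected.
    if pixel_values.ndim != 3:
        raise RuntimeError(
            f"Expected pixel_values in CHW format [C, H, W], got rank {pixel_values.ndim}"
        )
    return pixel_values

check_pixel_values(np.zeros((3, 224, 224)))        # single image: passes
# check_pixel_values(np.zeros((2, 3, 224, 224)))   # two stacked images: raises
```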

Citation

If you use Qwen3-VL models, please cite the Qwen technical reports and model cards from the Qwen team.

Appendix: export patch log (precise)

This workflow uses a local reference model file at: examples/python/qwen3-vl/pytorch_reference/modeling_qwen3_vl.py

The exporter entrypoint is: examples/python/qwen3-vl/builder.py (not export_for_oga.py) with --reference ./pytorch_reference to load the patched class dynamically.

A) modeling_qwen3_vl.py patches for dynamic ONNX export

  1. Vision attention tracing path

    • Qwen3VLVisionAttention.forward: tracing branch bypasses Python split/loop path (lengths.tolist(), torch.split(...), per-chunk iteration) and runs tensor-only attention flow.
  2. RoPE tracing path

    • Qwen3VLVisionModel.rot_pos_emb: tracing branch computes rotary positions with tensor math only and requires token_count.
  3. Position embedding tracing fallback

    • Qwen3VLVisionModel.forward: tracing branch uses tensor-index fallback for positional embeddings instead of interpolation-heavy Python control flow.

B) Four additional patches

  1. cu_seqlens dtype split for export/runtime

    • Qwen3VLVisionModel.forward: cu_seqlens uses grid_thw.dtype during tracing, torch.int32 otherwise.
  2. Tracing-safe image feature output

    • Qwen3VLModel.get_image_features: tracing returns raw tensors and skips torch.split(..., split_sizes.tolist()).
  3. Tracing-safe cache behavior

    • Qwen3VLTextModel.forward: DynamicCache is not created while tracing.
  4. Model-size portability in exporter

    • builder.py embedding export no longer hardcodes hidden width (2560); it uses embed_tokens.embedding_dim, enabling 2B/4B/8B export.

These changes keep eager/runtime behavior intact while adding an export-safe tensor-only tracing path for dynamic image_grid_thw.
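
The cu_seqlens construction from item B.1 can be sketched in pure Python, assuming (per the Qwen-VL layout) that each of the t frames in a (t, h, w) grid contributes h*w patch tokens to the varlen attention boundaries. The exporter additionally switches the dtype (grid_thw.dtype while tracing, int32 otherwise); plain ints stand in for that here:

```python
def cu_seqlens(grid_thw):
    # grid_thw: one (t, h, w) tuple per image. Boundaries start at 0, and
    # each frame of each image appends one h*w-sized segment.
    boundaries = [0]
    for t, h, w in grid_thw:
        for _ in range(t):
            boundaries.append(boundaries[-1] + h * w)
    return boundaries

print(cu_seqlens([(1, 4, 6), (2, 2, 2)]))  # [0, 24, 28, 32]
```

Attention then runs independently within each [boundaries[i], boundaries[i+1]) span, which is what lets images of different sizes share one packed sequence.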
