ValueError: Image features and image tokens do not match: tokens: 9728, features 10240 mb

by DiTo97 - opened Aug 15, 2025

Discussion

DiTo97

Aug 15, 2025 (edited Aug 15, 2025)

I have been trying to replicate the SFT notebook on a dataset of my own.

The dataset contains very large panoramic images.

I keep getting the error above, regardless of how I set the sequence length or the maximum number of image tokens.

import asyncio
import os

import torch
import transformers
import trl
from peft import LoraConfig
from transformers import (
    AutoModelForImageTextToText, 
    AutoProcessor, 
    EarlyStoppingCallback, 
    IntervalStrategy
)
from trl import SFTConfig, SFTTrainer

from common.dataset import load_isiqa_2019_dataset


os.environ["WANDB_DISABLED"] = "true"


if __name__ == "__main__":
    processor = AutoProcessor.from_pretrained(
        "LiquidAI/LFM2-VL-450M",
        trust_remote_code=True,
        do_image_splitting=True,
        min_image_tokens=64,
        max_image_tokens=256,
    )

    model = AutoModelForImageTextToText.from_pretrained(
        "LiquidAI/LFM2-VL-450M",
        trust_remote_code=True,
        device_map="cuda:0",
        torch_dtype="bfloat16",
    )

    dataset = asyncio.run(load_isiqa_2019_dataset())

    def get_collator(processor):
        def collator(x):
            batch = processor.apply_chat_template(
                x, tokenize=True, padding=True, return_dict=True, return_tensors="pt"
            )

            y = batch["input_ids"].clone()
            y[y == processor.tokenizer.pad_token_id] = -100

            batch["labels"] = y
            return batch
        return collator

    config = SFTConfig(
        output_dir=os.path.join("experiments", "lfm2-piqa"),
        num_train_epochs=4,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=32,
        learning_rate=5e-4,
        warmup_ratio=0.1,
        weight_decay=0.01,
        logging_steps=10,
        optim="adamw_torch_8bit",
        gradient_checkpointing=True,
        max_length=1024,
        eval_strategy=IntervalStrategy.STEPS,
        eval_steps=10,
        dataset_kwargs={"skip_prepare_dataset": True},
        report_to=["trackio"],
        metric_for_best_model="eval_loss",
        load_best_model_at_end=True,
    )

    trainer = SFTTrainer(
        model=model,
        args=config,
        train_dataset=dataset["training"],
        eval_dataset=dataset["validation"],
        data_collator=get_collator(processor),
        processing_class=processor.tokenizer,
        # compute_metrics=compute_metrics,  # TODO
        peft_config=LoraConfig(
            lora_alpha=64,
            lora_dropout=0.05,
            r=64,
            bias="none",
            target_modules=[
                "q_proj", 
                "v_proj", 
                "fc1", 
                "fc2", 
                "linear",
                "gate_proj", 
                "up_proj", 
                "down_proj",
            ],
            task_type="CAUSAL_LM",
        ),
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
    )

    trainer.train()

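    # SFTTrainer wraps the model in a PeftModel when peft_config is given,
    # so the LoRA weights are merged from trainer.model.
    model = trainer.model.merge_and_unload()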

    model.save_pretrained(config.output_dir)  
    processor.save_pretrained(config.output_dir)

EdoardoMosca

Liquid AI org Aug 15, 2025

Your context length might be too short, so the sequence gets cut off “mid-image.” If you’re training on very large images with do_image_splitting=True, keep in mind that a single image can take ~2k tokens. As a first attempt, I would suggest either (1) extending your context length or (2) disabling image splitting in the processor args.
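To illustrate how truncation "mid-image" could produce this kind of mismatch, here is a minimal sketch in plain Python. It is not the LFM2-VL implementation: `IMAGE_TOKEN_ID`, the helper names, and the layout (text tokens followed by one run of placeholders per tile) are assumptions made for illustration. The figures of 8 tiles and 256 tokens per tile are borrowed from the numbers mentioned in this thread, and the real processor may add further tokens per image.

```python
# Hypothetical placeholder id, NOT the real LFM2-VL image token id.
IMAGE_TOKEN_ID = -1


def build_sequence(text_len, tiles, tokens_per_tile):
    """Return text tokens followed by one placeholder per image feature."""
    return [0] * text_len + [IMAGE_TOKEN_ID] * (tiles * tokens_per_tile)


def count_placeholders(ids):
    """Count the image placeholder tokens that are present in a sequence."""
    return sum(1 for token in ids if token == IMAGE_TOKEN_ID)


def required_max_length(text_len, tiles, tokens_per_tile):
    """Smallest context length that keeps every placeholder (under this layout)."""
    return text_len + tiles * tokens_per_tile


tiles, tokens_per_tile, text_len = 8, 256, 32
ids = build_sequence(text_len, tiles, tokens_per_tile)

# The vision encoder still sees the whole image, so its feature count
# does not shrink when the token sequence is truncated.
features = tiles * tokens_per_tile

# Truncating the token sequence drops placeholders from the end.
kept = count_placeholders(ids[:1024])

print(kept, features)  # 992 2048
print(required_max_length(text_len, tiles, tokens_per_tile))  # 2080
```

Under these assumptions, a context length of 1024 keeps only 992 of the 2048 placeholders while the feature count stays at 2048, which is the shape of the reported error. Either a longer context or fewer tiles per image (for example with image splitting disabled) would bring the two counts back in line.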

DiTo97

Aug 15, 2025 (edited Aug 15, 2025)

I have tried all of that, as I said above.

The example notebook works, but swapping in my dataset raises the error.

On average, the panoramic images are very large, and with splitting enabled each one is split into six to eight tensors.
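The two counts in the error message from the thread title can also be compared directly. The sketch below uses a hypothetical helper, `parse_mismatch`, to extract them and compute the gap. It only does arithmetic on the reported numbers and does not identify the cause.

```python
import re


def parse_mismatch(message):
    """Extract (tokens, features, gap) from the reported error message."""
    match = re.search(r"tokens:\s*(\d+),\s*features:?\s*(\d+)", message)
    if match is None:
        raise ValueError("unrecognized error message")
    tokens, features = int(match.group(1)), int(match.group(2))
    return tokens, features, features - tokens


message = (
    "ValueError: Image features and image tokens do not match: "
    "tokens: 9728, features 10240"
)
tokens, features, gap = parse_mismatch(message)

print(gap)        # 512
print(gap % 256)  # 0
```

The gap is 512, which is exactly two blocks of 256, the value passed as max_image_tokens in the script above. That may hint that whole tiles rather than individual tokens are missing from the token sequence, but nothing in the thread confirms this.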
