ValueError: Image features and image tokens do not match: tokens: 9728, features 10240 mb
I have been trying to replicate the SFT notebook on a dataset of my own.
The dataset consists of very large panoramic images.
I keep getting the error above, no matter how I set the sequence length or the maximum number of image tokens.
import asyncio
import os

import torch
import transformers
import trl
from peft import LoraConfig
from transformers import (
    AutoModelForImageTextToText,
    AutoProcessor,
    EarlyStoppingCallback,
    IntervalStrategy
)
from trl import SFTConfig, SFTTrainer

from common.dataset import load_isiqa_2019_dataset

os.environ["WANDB_DISABLED"] = "true"

if __name__ == "__main__":
    processor = AutoProcessor.from_pretrained(
        "LiquidAI/LFM2-VL-450M",
        trust_remote_code=True,
        do_image_splitting=True,
        min_image_tokens=64,
        max_image_tokens=256,
    )
    model = AutoModelForImageTextToText.from_pretrained(
        "LiquidAI/LFM2-VL-450M",
        trust_remote_code=True,
        device_map="cuda:0",
        torch_dtype="bfloat16",
    )
    dataset = asyncio.run(load_isiqa_2019_dataset())

    def get_collator(processor):
        def collator(x):
            batch = processor.apply_chat_template(
                x, tokenize=True, padding=True, return_dict=True, return_tensors="pt"
            )
            y = batch["input_ids"].clone()
            y[y == processor.tokenizer.pad_token_id] = -100
            batch["labels"] = y
            return batch

        return collator

    config = SFTConfig(
        output_dir=os.path.join("experiments", "lfm2-piqa"),
        num_train_epochs=4,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=32,
        learning_rate=5e-4,
        warmup_ratio=0.1,
        weight_decay=0.01,
        logging_steps=10,
        optim="adamw_torch_8bit",
        gradient_checkpointing=True,
        max_length=1024,
        eval_strategy=IntervalStrategy.STEPS,
        eval_steps=10,
        dataset_kwargs={"skip_prepare_dataset": True},
        report_to=["trackio"],
        metric_for_best_model="eval_loss",
        load_best_model_at_end=True,
    )
    trainer = SFTTrainer(
        model=model,
        args=config,
        train_dataset=dataset["training"],
        eval_dataset=dataset["validation"],
        data_collator=get_collator(processor),
        processing_class=processor.tokenizer,
        # compute_metrics=compute_metrics, # TODO
        peft_config=LoraConfig(
            lora_alpha=64,
            lora_dropout=0.05,
            r=64,
            bias="none",
            target_modules=[
                "q_proj",
                "v_proj",
                "fc1",
                "fc2",
                "linear",
                "gate_proj",
                "up_proj",
                "down_proj",
            ],
            task_type="CAUSAL_LM",
        ),
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
    )
    trainer.train()
    model = model.merge_and_unload()
    model.save_pretrained(config.output_dir)
    processor.save_pretrained(config.output_dir)
It might be that your context length is too short, so you’re cutting off “mid-image.” If you’re training on very large images with do_image_splitting=True, keep in mind that a single image can be ~2k tokens. What I would suggest as a first attempt is to (1) try extending your context length OR (2) disable image splitting in the processor args.
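One way to test the "cutting off mid-image" idea is to count how many image placeholder tokens would be lost if a tokenized sample were truncated to a given context length. The following is only a minimal sketch in plain Python, not the check the library itself performs. The helper names and the placeholder id `9` are hypothetical. In a real run the id of the image placeholder token would have to be looked up on the processor's tokenizer.

```python
from typing import Sequence


def count_image_tokens(input_ids: Sequence[int], image_token_id: int) -> int:
    """Count image placeholder tokens in one tokenized sample."""
    return sum(1 for t in input_ids if t == image_token_id)


def image_tokens_lost(
    input_ids: Sequence[int], image_token_id: int, max_length: int
) -> int:
    """Return how many image placeholders truncation to max_length would drop."""
    total = count_image_tokens(input_ids, image_token_id)
    kept = count_image_tokens(input_ids[:max_length], image_token_id)
    return total - kept


# Toy sample: 2 text tokens, 6 image placeholders (id 9 is made up), 2 text tokens.
sample = [1, 2] + [9] * 6 + [3, 4]
print(image_tokens_lost(sample, image_token_id=9, max_length=5))   # 3 placeholders dropped
print(image_tokens_lost(sample, image_token_id=9, max_length=10))  # 0, nothing truncated
```

If this returns a non-zero value for a real sample, the placeholders no longer line up with the image features and a mismatch like the one in the title would be expected.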
I have tried it all, as I said above.
The example notebook works.
But swapping in my dataset raises the error.
On average, the panoramic images are very large, and with splitting enabled each one is split into six to eight tensors.
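For a rough sense of scale, the numbers mentioned in this thread can be combined into a back-of-the-envelope token budget. This is a sketch under a stated assumption, namely that each tile consumes the full `max_image_tokens=256` budget set in the script. The real per-tile count depends on the processor and is not confirmed here. The function name is hypothetical.

```python
def estimated_image_tokens(num_tiles: int, tokens_per_tile: int) -> int:
    """Upper-bound estimate: every tile uses the full per-tile token budget."""
    return num_tiles * tokens_per_tile


TOKENS_PER_TILE = 256  # assumption: equals max_image_tokens from the script
MAX_LENGTH = 1024      # max_length from the SFTConfig in the script

for tiles in (6, 8):
    needed = estimated_image_tokens(tiles, TOKENS_PER_TILE)
    print(tiles, needed, needed > MAX_LENGTH)
# 6 tiles -> 1536 tokens, 8 tiles -> 2048 tokens; both exceed 1024

# The two counts from the error message, expressed in 256-token blocks.
tokens, features = 9728, 10240
print(tokens // TOKENS_PER_TILE, features // TOKENS_PER_TILE)  # 38 40
```

Under that assumption, a single panoramic image already needs more tokens than `max_length=1024` allows, and the two counts in the error differ by exactly two 256-token blocks.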