---
base_model: manifestasi/smolVLM-300M-manifestasi-v2
library_name: transformers
model_name: trainer_output
tags:
- generated_from_trainer
- dpo
- trl
license: apache-2.0
datasets:
- trl-lib/rlaif-v
language:
- en
pipeline_tag: image-to-text
---

# Model for Research Purposes Only

# Model Card for trainer_output

This model is a fine-tuned version of [manifestasi/RetinaVLM-300M](https://huggingface.co/manifestasi/RetinaVLM-300M).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
# In a notebook, install the dependencies first:
# !pip install -U bitsandbytes
from time import perf_counter

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.image_utils import load_image

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# float16 is intended for GPU inference; fall back to float32 on CPU.
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

processor = AutoProcessor.from_pretrained("manifestasi/RetinaVLM-300M-DPO")
model = AutoModelForImageTextToText.from_pretrained(
    "manifestasi/RetinaVLM-300M-DPO",
    torch_dtype=DTYPE,
    attn_implementation="eager",
).to(DEVICE)

# Load the image. load_image accepts a local path or a URL, for example:
# "https://huggingface.co/spaces/HuggingFaceTB/SmolVLM/resolve/main/example_images/rococo.jpg"
image = load_image("/kaggle/input/bandaraaa/799269_1200.jpg")

# Create input messages: one image placeholder followed by the text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": """
Instructions : you are visual assistant for blind people, please answer politely and short under 100 words.
Prompt : can you direct me to find toilet
"""},
        ],
    },
]

# Prepare inputs
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
inputs = inputs.to(DEVICE)

# Generate outputs and time the generation step
start = perf_counter()
generated_ids = model.generate(**inputs, max_new_tokens=120)
generated_texts = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
)
elapsed = perf_counter() - start

print(f"{elapsed:.2f} seconds")
# Keep only the assistant's reply; [-1] avoids an IndexError if the marker is absent.
print(generated_texts[0].split("Assistant:")[-1].strip())
```

## Training procedure

[Visualize in Weights & Biases](https://wandb.ai/awscloudrock-rockio/huggingface/runs/m54r7mjv)

This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).

### Framework versions

- TRL: 0.19.0
- Transformers: 4.53.0
- PyTorch: 2.6.0+cu124
- Datasets: 3.6.0
- Tokenizers: 0.21.2

## Citations

Cite DPO as:

```bibtex
@inproceedings{rafailov2023direct,
    title        = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
    author       = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
    year         = 2023,
    booktitle    = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
    url          = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
    editor       = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
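
## DPO loss sketch

The training procedure names DPO without showing the objective, so here is a minimal, hedged sketch of the per-pair loss from the DPO paper: the negative log-sigmoid of `beta` times the difference between the policy-to-reference log-ratios of the chosen and rejected responses. This is not TRL's implementation and not this model's training code. The function name `dpo_loss`, the value `beta=0.1`, and the log-probabilities in the usage lines are assumptions chosen only for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    All inputs are summed log-probabilities of a full response.
    beta=0.1 is an illustrative value, not this model's setting.
    """
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) equals softplus(-x); branch for numerical stability.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical log-probabilities, for illustration only.
# Policy identical to the reference: zero margin, so the loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy favors the chosen response more than the reference does: lower loss.
print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

The loss falls below log(2) only when the policy widens the gap between chosen and rejected responses relative to the reference model, which is what ties the update to the preference pairs rather than to raw likelihood.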