---
license: apache-2.0
datasets:
- pdsdpo/pdsdpo-v1_0-data
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
---

# PDS-DPO-7B Model Card

GitHub | arXiv

PDS-DPO-7B is a vision-language model built upon LLaVA 1.5 7B and trained with the proposed Preference Data Synthetic Direct Preference Optimization (PDS-DPO) framework. The framework uses synthetic data, generated with generative models and scored by reward models as proxies for human preferences, to improve alignment, reduce hallucinations, and enhance reasoning capabilities.

## Model Details

- Model Name: PDS-DPO-7B
- Base Model: LLaVA 1.5 (Vicuna-7B)
- Framework: Preference Data Synthetic Alignment using Direct Preference Optimization (PDS-DPO)
- Dataset: 9K synthetic image-text pairs with positive and negative responses, generated via Stable Diffusion and LLaVA and scored by reward models such as ImageReward and Llama-3-8B-ArmoRM
- Training Hardware: 2 × A100 GPUs
- Training Optimization: LoRA fine-tuning, DeepSpeed (ZeRO-2 strategy)

## Key Features

- Synthetic Data Alignment
- Improved Hallucination Control
- Competitive Benchmark Performance

## Examples

## Citation

```bibtex
@article{2024pdsdpo,
  title={},
  author={},
  journal={},
  year={}
}
```
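
## Usage

As a quick-start illustration, the sketch below loads the model with the Transformers LLaVA classes and runs a single image-text query. It is a minimal sketch under stated assumptions: the repository ID (`pdsdpo/PDS-DPO-7B`), the image URL, and the LLaVA-1.5 `USER:/ASSISTANT:` prompt template are illustrative placeholders and may differ for the released checkpoint.

```python
# Minimal usage sketch. Assumptions: the checkpoint is stored in the
# Transformers llava format and follows the LLaVA-1.5 USER/ASSISTANT
# prompt template; the repo id and image URL below are illustrative.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "pdsdpo/PDS-DPO-7B"  # hypothetical repo id
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Load an example image and build a single-turn LLaVA-style prompt.
image = Image.open(requests.get("https://example.com/demo.jpg", stream=True).raw)
prompt = "USER: <image>\nDescribe this image in detail. ASSISTANT:"

# Tokenize, generate, and decode the response.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```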