---
license: apache-2.0
datasets:
- pdsdpo/pdsdpo-v1_0-data
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
---
# PDS-DPO-7B Model Card
GitHub | arXiv
PDS-DPO-7B is a vision-language model built on LLaVA 1.5 7B and trained with the proposed Preference Data Synthetic Alignment using Direct Preference Optimization (PDS-DPO) framework. The approach uses synthetic preference data, produced by generative models and scored by reward models as proxies for human preference, to improve alignment, reduce hallucinations, and strengthen reasoning.
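A minimal inference sketch with 🤗 Transformers is given below. The Hub repo id `pdsdpo/PDS-DPO-7B`, the LLaVA-1.5-style prompt template, and the local image path are assumptions made for illustration; adjust them to the published checkpoint.

```python
# Hypothetical usage sketch: load the checkpoint as a LLaVA-1.5-style model.
# The repo id and image path below are placeholders, not confirmed identifiers.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "pdsdpo/PDS-DPO-7B"  # assumed Hub path

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # any local RGB image
prompt = "USER: <image>\nDescribe this image in detail. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

Because the base model is LLaVA 1.5, the standard `USER: <image>\n... ASSISTANT:` prompt format is assumed here.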
## Model Details
- Model Name: PDS-DPO-7B
- Base Model: LLaVA 1.5 (Vicuna-7B)
- Framework: Preference Data Synthetic Alignment using Direct Preference Optimization (PDS-DPO)
- Dataset: 9K synthetic image-text pairs with positive and negative responses, generated via Stable Diffusion and LLaVA and scored by reward models such as ImageReward and Llama-3-8B-ArmoRM (see the sketch after this list).
- Training Hardware: 2 × A100 GPUs
- Training Optimization: LoRA fine-tuning, DeepSpeed (ZeRO-2 strategy)
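The chosen/rejected pairs used for DPO could be assembled along the following lines. This is a schematic sketch only: `generate_candidates` and `reward_score` are hypothetical stand-ins for the LLaVA sampling and ImageReward / ArmoRM scoring steps described above, not released APIs.

```python
# Illustrative sketch of building reward-scored preference pairs for DPO.
# `generate_candidates` and `reward_score` are hypothetical callables supplied
# by the caller; they are not part of any released PDS-DPO API.
from typing import Callable


def build_preference_pair(
    prompt: str,
    image_path: str,
    generate_candidates: Callable[[str, str], list[str]],
    reward_score: Callable[[str, str, str], float],
) -> dict:
    """Sample candidate responses, score them, and keep the best/worst pair."""
    candidates = generate_candidates(prompt, image_path)
    ranked = sorted(candidates, key=lambda r: reward_score(prompt, image_path, r))
    return {
        "prompt": prompt,
        "image": image_path,
        "chosen": ranked[-1],   # highest-reward response (positive)
        "rejected": ranked[0],  # lowest-reward response (negative)
    }
```

Keeping only the highest- and lowest-scoring responses mirrors the positive/negative pairs described in the dataset entry above.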
## Key Features
## Examples
## Citation
```bibtex
@article{2024pdsdpo,
  title={},
  author={},
  journal={},
  year={}
}
```