An Empirical Study of DPO Configuration Choices for LLM Alignment
jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_hh-rlhf • Text Generation
jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_oasst1 • Text Generation
jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_PKU-SafeRLHF • Text Generation
jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_ultrafeedback • Text Generation
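The four checkpoints above share the same base model (Llama-3.1-Tulu-3-8B) and differ only in the preference dataset used for DPO (hh-rlhf, oasst1, PKU-SafeRLHF, ultrafeedback). Below is a minimal sketch of loading one of them with the Hugging Face `transformers` library; the chat-template call and generation settings are assumptions for illustration, not taken from the model cards.

```python
# Sketch: load one of the listed DPO checkpoints and generate a reply.
# Assumes the repo ships a tokenizer with a chat template (typical for
# Tulu-3 derivatives); settings below are illustrative, not prescribed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_ultrafeedback"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Swapping `model_id` for any of the other three repos works the same way, which makes it straightforward to compare how the choice of preference dataset affects generations.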