# HH-BTRewardModel-roberta

A Bradley-Terry reward model trained on the Anthropic/hh-rlhf dataset with the pairwise log-sigmoid loss:

$$\mathcal{L} = -\log \sigma(r_{\text{chosen}} - r_{\text{rejected}}) = \log\!\left(1 + \exp\!\left(-(r_{\text{chosen}} - r_{\text{rejected}})\right)\right)$$
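
For reference, here is a minimal PyTorch sketch of this loss. The `bradley_terry_loss` helper and the example values are illustrative only and are not taken from the actual training code.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected) == softplus(-(r_chosen - r_rejected))
    return F.softplus(-(r_chosen - r_rejected)).mean()

# Example with three preference pairs (reward scores from the model head).
r_chosen = torch.tensor([1.2, 0.3, -0.5])
r_rejected = torch.tensor([0.4, 0.9, -1.1])
loss = bradley_terry_loss(r_chosen, r_rejected)
```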

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("JERRYPAN617/HH-BTRewardModel-roberta")
model = AutoModelForSequenceClassification.from_pretrained("JERRYPAN617/HH-BTRewardModel-roberta")

# The model outputs a single scalar logit, used directly as the reward score.
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    reward = model(**inputs).logits.item()
```
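
Since the model produces a single scalar reward per input, a natural use is to score two candidate responses to the same prompt and keep the higher-scoring one. The sketch below assumes the conversational "Human: ... Assistant:" formatting used in hh-rlhf; the exact input formatting used during training is an assumption here.

```python
# Illustrative comparison of two candidate responses (input formatting is an assumption).
prompt = "Human: How do I make a good cup of tea?\n\nAssistant: "
responses = [
    "Boil fresh water, steep the tea for 3-5 minutes, then add milk or sugar to taste.",
    "I can't help with that.",
]

scores = []
for response in responses:
    inputs = tokenizer(prompt + response, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        scores.append(model(**inputs).logits.item())

# Higher reward = preferred response under the Bradley-Terry model.
best = responses[scores.index(max(scores))]
```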
Model size: 0.1B parameters (F32, Safetensors).
