Was this model distilled using only SFT, or through a combination of SFT and RL?
#23, opened by wizardII
Hi, thank you for releasing this model! While using it, I noticed that it seems to exhibit some characteristics similar to RL-trained models. May I kindly ask whether this model was distilled using only SFT, or through a combination of SFT and RL?