Diffusion LLM experiments
Collection
See more: https://t.me/buckwheat_thoughts/283
This is a discrete text diffusion model, adapted from deepvk/RuModernBERT-base.
It was trained on an SFT dataset of 760k samples, half taken from allenai/tulu-3-sft-mixture and half from Vikhrmodels/GrandMaster-PRO-MAX. The resulting performance in English is okay; in Russian the model gets stuck in loops.
Training was carried out using my own fork of the dllm repository. Training command:
python -m accelerate.commands.launch --config_file scripts/accelerate_configs/zero2.yaml --num_processes 8 \
examples/bert/sft.py \
--model_name_or_path "deepvk/RuModernBERT-base" \
--dataset_args "allenai/tulu-3-sft-mixture|Vikhrmodels/GrandMaster-PRO-MAX" \
--max_length 1024 \
--num_train_epochs 10 \
--per_device_train_batch_size 48 \
--per_device_eval_batch_size 48 \
--save_steps 0.1 \
--output_dir "models/ruModernBERT-base/tulu-3-grandmaster/epochs-10-bs-384-len-1024"
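The batch size in the output directory name (bs-384) follows from the flags above: 8 processes times 48 samples per device. A rough sketch of the resulting schedule is below. It assumes no gradient accumulation, no sample packing, and that the script passes `--save_steps 0.1` to Hugging Face `TrainingArguments`, where a value below 1 is treated as a fraction of total training steps. The function name `training_schedule` is made up for this illustration, and 760k is the approximate dataset size quoted above.

```python
import math

def training_schedule(num_samples, per_device_batch, num_processes,
                      epochs, save_ratio, grad_accum=1):
    """Estimate global batch size and step counts (illustrative only)."""
    # Samples consumed per optimizer step across all devices.
    global_batch = per_device_batch * num_processes * grad_accum
    # Optimizer steps needed for one pass over the data.
    steps_per_epoch = math.ceil(num_samples / global_batch)
    total_steps = steps_per_epoch * epochs
    # A save_steps value below 1 is read as a fraction of total steps.
    save_every = math.ceil(total_steps * save_ratio)
    return global_batch, steps_per_epoch, total_steps, save_every

global_batch, steps_per_epoch, total_steps, save_every = training_schedule(
    num_samples=760_000, per_device_batch=48, num_processes=8,
    epochs=10, save_ratio=0.1,
)
print(f"global batch: {global_batch}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"checkpoint roughly every {save_every} steps")
```

Under these assumptions a checkpoint lands about once per epoch, giving around ten checkpoints over the run.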
A higher temperature is recommended to reduce repetition. To run the model, use:
python examples/bert/chat.py --model_name_or_path path/to/checkpoint --temperature 1.0
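To illustrate why a higher temperature can help with repetition, here is a minimal sketch of temperature-scaled softmax. It is a generic illustration, not the sampler used by dllm's chat.py; the function name and the example logits are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max for numerical stability before exponentiating.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
logits = [4.0, 2.0, 1.0]
for t in (0.2, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all probability mass sits on the top token, so the same continuation is picked again and again. At temperature 1.0 the alternatives keep a noticeable share, which makes it easier to break out of a repeated phrase.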
Generation example:
[Question]
Who are you?
[/Question]
[Answer]
As an AI, I don't have personal experiences or opinions. I don't have personal consciousness or desires, and I don't have a mind, feelings, or consciousness. However, I have a set of algorithms, data, and capabilities that are available to me. Therefore, I am capable of processing data, processes, and executing algorithms. Therefore, I don't have personal experiences, desires, or feelings, or opinions, a mind, or consciousness. Therefore, I am described as an AI.
[/Answer]
[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
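The raw output ends with leftover [PAD] tokens, since the model fills a fixed-length window. A small post-processing sketch for cleaning this up is below. It assumes the output follows the [Answer] ... [/Answer] layout shown in the example; the helper name `extract_answer` is hypothetical and not part of dllm.

```python
import re

def extract_answer(raw, pad_token="[PAD]"):
    """Drop padding tokens and return the text between the answer tags."""
    text = raw.replace(pad_token, "")
    match = re.search(r"\[Answer\]\s*(.*?)\s*\[/Answer\]", text, flags=re.DOTALL)
    # Fall back to the de-padded text if the tags are missing.
    return match.group(1) if match else text.strip()

raw_output = "[Answer]\nAs an AI, I don't have personal experiences.\n[/Answer]\n[PAD][PAD][PAD]"
print(extract_answer(raw_output))
```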