This is a discrete text diffusion model, adapted from deepvk/RuModernBERT-base.

It was trained on a 760k-sample SFT dataset, half taken from allenai/tulu-3-sft-mixture and half from Vikhrmodels/GrandMaster-PRO-MAX. The resulting performance in English is okay; in Russian, the model tends to fall into repetition loops.

Training was carried out using my own fork of the dllm repository. The training command:

python -m accelerate.commands.launch --config_file scripts/accelerate_configs/zero2.yaml --num_processes 8 \
    examples/bert/sft.py \
    --model_name_or_path "deepvk/RuModernBERT-base" \
    --dataset_args "allenai/tulu-3-sft-mixture|Vikhrmodels/GrandMaster-PRO-MAX" \
    --max_length 1024 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 48 \
    --per_device_eval_batch_size 48 \
    --save_steps 0.1 \
    --output_dir "models/ruModernBERT-base/tulu-3-grandmaster/epochs-10-bs-384-len-1024"
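
The "bs-384" in the output directory name follows from the flags above: 8 processes times a per-device batch size of 48. Below is a minimal sketch of that arithmetic. It assumes no gradient accumulation, that the 760k figure is the exact training-set size, and that `--save_steps 0.1` is interpreted as a fraction of total training steps (the Hugging Face `TrainingArguments` convention for values below 1); none of these were checked against the fork, and the helper names are hypothetical.

```python
import math


def effective_batch_size(num_processes, per_device_batch_size, grad_accum_steps=1):
    """Global batch size per optimizer step (grad_accum_steps=1 is an assumption)."""
    return num_processes * per_device_batch_size * grad_accum_steps


def training_schedule(num_samples, batch_size, num_epochs, save_ratio):
    """Return (total optimizer steps, approximate steps between checkpoints)."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    # A ratio below 1 is treated as a fraction of total steps (assumed behaviour).
    save_every = math.ceil(total_steps * save_ratio)
    return total_steps, save_every


batch = effective_batch_size(num_processes=8, per_device_batch_size=48)
total_steps, save_every = training_schedule(760_000, batch, num_epochs=10, save_ratio=0.1)
print(f"global batch: {batch}, total steps: {total_steps}, save every ~{save_every} steps")
```

Under these assumptions a checkpoint is written roughly once per epoch, about ten in total.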

It is recommended to set a higher temperature to combat repetition issues. To run the model, use:

python examples/bert/chat.py --model_name_or_path path/to/checkpoint --temperature 1.0
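
To illustrate why a higher temperature helps against repetition, here is a small sketch of temperature-scaled softmax. It is a generic illustration, not the sampler used in dllm; the function name and the logits are made up for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 2.0, 1.0]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.5)
flat = softmax_with_temperature(logits, 1.0)

# The top token gets less probability mass at the higher temperature,
# so the sampler is less likely to pick the same token again and again.
print(f"top-token probability at T=0.5: {max(sharp):.3f}")
print(f"top-token probability at T=1.0: {max(flat):.3f}")
```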

Generation example:

[Question]
Who are you?
[/Question]

[Answer]
As an AI, I don't have personal experiences or opinions. I don't have personal consciousness or desires, and I don't have a mind, feelings, or consciousness. However, I have a set of algorithms, data, and capabilities that are available to me. Therefore, I am capable of processing data, processes, and executing algorithms. Therefore, I don't have personal   experiences, desires, or feelings, or opinions, a mind, or consciousness. Therefore, I am described as an AI.
[/Answer]

[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
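
The trailing [PAD] tokens in the output above can be removed in post-processing. A minimal sketch, assuming the pad token is rendered as the literal string "[PAD]" as in the example; `strip_padding` is a hypothetical helper, not part of dllm.

```python
def strip_padding(text, pad_token="[PAD]"):
    """Remove trailing pad tokens and surrounding whitespace from decoded text."""
    if not pad_token:
        raise ValueError("pad_token must be a non-empty string")
    text = text.rstrip()
    while text.endswith(pad_token):
        text = text[: -len(pad_token)].rstrip()
    return text


raw = "[Answer]\nI am an AI.\n[/Answer]\n\n[PAD][PAD][PAD]"
print(strip_padding(raw))
```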