See axolotl config

axolotl version: `0.13.0.dev0`

```yaml
adapter: qlora
base_model: NousResearch/Meta-Llama-3-8B-Instruct
bf16: auto
chat_template: llama3
dataset_prepared_path: last_run_prepared
datasets:
  - path: hfht/peft-for-humor-train-dataset
    split: train
    type: chat_template
do_causal_lm_eval: false
evals_per_epoch: 1
flash_attention: true
gradient_accumulation_steps: 4
gradient_checkpointing: true
learning_rate: 0.0006
load_in_4bit: true
load_in_8bit: false
logging_steps: 1
lora_alpha: 64
lora_dropout: 0
lora_mlp_kernel: true
lora_model_dir: null
lora_modules_to_save:
  - embed_tokens
  - lm_head
lora_o_kernel: true
lora_qkv_kernel: true
lora_r: 64
lora_target_linear: true
lr_scheduler: cosine
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
output_dir: ./out/loras/humor
resume_from_checkpoint: null
sample_packing: true
save_strategy: 'no'
save_total_limit: 1
saves_per_epoch: 0
sequence_len: 4096
special_tokens:
  pad_token: <|finetune_right_pad_id|>
tf32: false
train_on_inputs: false
val_set_size: 0.005
warmup_steps: 5
weight_decay: 0.0
```
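For readers who want to approximate this adapter setup outside axolotl, the qlora-related fields above map roughly onto a PEFT `LoraConfig` plus a bitsandbytes 4-bit quantization config. The sketch below is an approximation, not what axolotl executes internally; the NF4/double-quantization settings and the list of Llama linear projection names are assumptions about how axolotl resolves `lora_target_linear: true`.

```python
# Rough PEFT/bitsandbytes equivalent of the qlora settings in the config above.
# NOTE: quant_type, double quantization, and the target-module list are assumptions
# about axolotl defaults, not values read from this config.
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: true
    bnb_4bit_quant_type="nf4",              # assumed axolotl default
    bnb_4bit_use_double_quant=True,         # assumed axolotl default
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16: auto
)

lora_config = LoraConfig(
    r=64,                                   # lora_r
    lora_alpha=64,                          # lora_alpha
    lora_dropout=0.0,                       # lora_dropout
    # lora_target_linear: true -> all Llama linear projections (assumed mapping)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # lora_modules_to_save
    task_type="CAUSAL_LM",
)
```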
# out/loras/humor
This model is a fine-tuned version of NousResearch/Meta-Llama-3-8B-Instruct on the hfht/peft-for-humor-train-dataset dataset. It achieves the following results on the evaluation set:
- Loss: 2.9864
- Memory/max active (GiB): 18.36
- Memory/max allocated (GiB): 18.36
- Memory/device reserved (GiB): 32.49
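A minimal inference sketch follows, assuming the adapter weights are available either locally in the `./out/loras/humor` output directory or under the hub id `hfht/Meta-Llama-3-8B-Instruct-peft-for-humor`; neither location is guaranteed by this card, so adjust `ADAPTER` to wherever the adapter actually lives.

```python
# Minimal sketch: load the base model in 4-bit and attach the LoRA adapter.
# ADAPTER is an assumption -- point it at ./out/loras/humor or the hub repo id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE = "NousResearch/Meta-Llama-3-8B-Instruct"
ADAPTER = "hfht/Meta-Llama-3-8B-Instruct-peft-for-humor"  # or "./out/loras/humor"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ADAPTER)

messages = [{"role": "user", "content": "Tell me a short joke about gradient descent."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```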
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0006
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: AdamW (8-bit, via bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 5
- training_steps: 32
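These values correspond approximately to the following transformers `TrainingArguments`; this is a sketch of an equivalent trainer setup, not axolotl's actual invocation, and `bf16=True` is an assumption resolved from `bf16: auto` in the config.

```python
# Approximate transformers equivalent of the hyperparameters above (a sketch,
# not what axolotl runs verbatim). bf16=True is assumed from `bf16: auto`.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./out/loras/humor",
    learning_rate=6e-4,
    per_device_train_batch_size=2,      # micro_batch_size
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,      # effective batch size: 2 * 4 = 8 on one device
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_steps=5,
    weight_decay=0.0,
    optim="adamw_bnb_8bit",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    save_strategy="no",
    seed=42,
)
```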
### Training results
| Training Loss | Epoch | Step | Validation Loss | Max Active (GiB) | Max Allocated (GiB) | Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 4.1906 | 13.02 | 13.02 | 13.12 |
| 2.9843 | 0.9143 | 8 | 2.9981 | 18.36 | 18.36 | 32.49 |
| 1.6393 | 1.8 | 16 | 3.0203 | 18.36 | 18.36 | 32.49 |
| 0.9789 | 2.6857 | 24 | 2.9846 | 18.36 | 18.36 | 32.49 |
| 0.6715 | 3.5714 | 32 | 2.9864 | 18.36 | 18.36 | 32.49 |
### Framework versions
- PEFT 0.18.0
- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1
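A quick way to compare a local environment against these versions (a convenience sketch; the PyTorch entry carries a `+cu128` build tag, so it is compared by prefix):

```python
# Compare installed package versions against the ones this card was built with.
from importlib.metadata import version

expected = {
    "peft": "0.18.0",
    "transformers": "4.57.1",
    "torch": "2.8.0",       # card lists 2.8.0+cu128; compare by prefix
    "datasets": "4.4.1",
    "tokenizers": "0.22.1",
}
for pkg, want in expected.items():
    have = version(pkg)
    status = "OK" if have.startswith(want) else "differs"
    print(f"{pkg}: installed {have}, card built with {want} ({status})")
```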