Axolotl config

axolotl version: 0.13.0.dev0

base_model: Qwen/Qwen3-8B
# Automatically upload checkpoint and final model to HF
hub_model_id: okolukisa1/Qwen3-8B-seq-r-1

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: okolukisa1/seq-r-1
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/out

sequence_len: 4096
sample_packing: true
eval_sample_packing: true


adapter: qlora
lora_model_dir:
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0002

bf16: auto
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0

special_tokens:

# save_first_step: true  # uncomment this to validate checkpoint saving works with your config
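With `adapter: qlora`, the frozen base weights are loaded in 4-bit (`load_in_4bit: true`), and low-rank adapters are trained on top of the model's linear layers (`lora_target_linear: true`). The sketch below shows the standard LoRA update y = Wx + (alpha / r) · B(Ax), assuming the usual formulation from the LoRA paper. With `lora_r: 32` and `lora_alpha: 64`, the adapter output is scaled by 2.0, and each adapted d_in × d_out layer adds r · (d_in + d_out) trainable parameters. The helper names and the 4096 × 4096 layer size are hypothetical and chosen for illustration. This is not PEFT's implementation.

```python
# Minimal sketch of a LoRA-adapted linear layer (pure Python, no dependencies).
# Hypothetical helpers for illustration only; not PEFT's actual implementation.
# Dropout (lora_dropout: 0.05) is omitted because it only applies during training.

LORA_R = 32      # lora_r from the config
LORA_ALPHA = 64  # lora_alpha from the config


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A is r x d_in and B is d_out x r, so one adapter adds r * (d_in + d_out) params."""
    return r * (d_in + d_out)


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W, A, B, x, alpha: int, r: int) -> list[float]:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    s = lora_scaling(alpha, r)
    return [b + s * u for b, u in zip(base, update)]


if __name__ == "__main__":
    print(lora_scaling(LORA_ALPHA, LORA_R))           # 2.0
    # Hypothetical 4096 x 4096 projection, for illustration only.
    print(lora_trainable_params(4096, 4096, LORA_R))  # 262144
    # Tiny rank-1 example: identity W, A = [[1, 1]], B = [[1], [0]].
    print(lora_forward([[1, 0], [0, 1]], [[1, 1]], [[1], [0]], [1.0, 2.0], alpha=2, r=1))  # [7.0, 2.0]
```

Because only A and B receive gradients, the trainable parameter count scales with r rather than with the full weight matrix. This is what lets an 8B model be fine-tuned within the memory figures reported below.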

Qwen3-8B-seq-r-1

This model is a fine-tuned version of Qwen/Qwen3-8B on the okolukisa1/seq-r-1 dataset. It achieves the following results on the evaluation set:

Loss: 0.4283
Memory/max Active (GiB): 12.88
Memory/max Allocated (GiB): 12.88
Memory/device Reserved (GiB): 17.11

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
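The config loads okolukisa1/seq-r-1 with `type: alpaca`, which expects records with an `instruction` field, an optional `input` field, and an `output` field. The sketch below renders such a record with the standard Stanford Alpaca prompt template. It assumes that axolotl's alpaca prompter uses this same wording. Verify this against the axolotl version listed above (0.13.0.dev0) before relying on it for inference. The helper `build_alpaca_prompt` and the example record are hypothetical.

```python
# Render an alpaca-style record into a prompt.
# Assumption: this is the standard Stanford Alpaca template; check that it matches
# the prompter in the axolotl version used for training.

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_alpaca_prompt(record: dict) -> str:
    """Hypothetical helper: pick the template based on whether `input` is non-empty."""
    instruction = record["instruction"]
    context = record.get("input") or ""
    if context:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=context)
    return PROMPT_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    example = {"instruction": "Summarize the text.", "input": "Hello world.", "output": "A greeting."}
    # A full example is the prompt followed by the reference output.
    print(build_alpaca_prompt(example) + example["output"])
```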

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 4
optimizer: ADAMW_TORCH_FUSED (fused PyTorch AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 62
training_steps: 623
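Several of these figures follow from the config. The value total_train_batch_size = micro_batch_size × gradient_accumulation_steps × devices = 1 × 4 × 1 = 4, which implies a single device. A warmup_ratio of 0.1 over 623 steps gives 62.3, which matches the reported 62 warmup steps when truncated to an integer. The exact rounding applied by the trainer is an assumption here. The sketch below reproduces a cosine schedule with linear warmup in the form used by Hugging Face Transformers' `get_cosine_schedule_with_warmup`: a default half cosine that decays to zero. The function names are hypothetical.

```python
import math

PEAK_LR = 2e-4      # learning_rate
TOTAL_STEPS = 623   # training_steps
WARMUP_RATIO = 0.1  # warmup_ratio
MICRO_BATCH = 1     # micro_batch_size
GRAD_ACCUM = 4      # gradient_accumulation_steps
NUM_DEVICES = 1     # assumption: one GPU, consistent with total_train_batch_size: 4


def effective_batch_size(micro: int, accum: int, devices: int) -> int:
    """Sequences (packed samples) contributing to one optimizer step."""
    return micro * accum * devices


def warmup_steps(total: int, ratio: float) -> int:
    # Truncation reproduces the reported 62; the trainer's exact rounding is an assumption.
    return int(total * ratio)


def cosine_lr(step: int, peak: float, warmup: int, total: int) -> float:
    """Linear warmup from 0 to `peak`, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    w = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
    print(effective_batch_size(MICRO_BATCH, GRAD_ACCUM, NUM_DEVICES))  # 4
    print(w)  # 62
    for step in (0, 31, w, 342, TOTAL_STEPS):
        print(step, f"{cosine_lr(step, PEAK_LR, w, TOTAL_STEPS):.2e}")
```

The learning rate peaks at 2e-4 at the end of warmup and reaches roughly half that near the midpoint of the decay phase.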

Training results

Training Loss	Epoch	Step	Validation Loss	Max Active (GiB)	Max Allocated (GiB)	Reserved (GiB)
No log	0	0	1.1547	12.2	12.2	12.27
0.4889	0.2497	156	0.4875	12.88	12.88	17.5
0.4674	0.4994	312	0.4510	12.88	12.88	17.11
0.4177	0.7491	468	0.4283	12.88	12.88	17.11

Framework versions

PEFT 0.18.0
Transformers 4.57.1
Pytorch 2.8.0+cu128
Datasets 4.4.1
Tokenizers 0.22.1


Model tree for okolukisa1/Qwen3-8B-seq-r-1

Base model: Qwen/Qwen3-8B-Base
Fine-tuned: Qwen/Qwen3-8B
Adapter: this model
