---
base_model:
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
---

## Model Overview

This is a fine-tuned version of TinyLlama-1.1B-Chat trained with ORPO (Odds Ratio Preference Optimization) on the mlabonne/orpo-dpo-mix-40k preference dataset to improve conversational and preference-aligned response generation. The model uses LoRA (Low-Rank Adaptation) to adapt efficiently with a small number of additional trainable parameters, learning task-specific behavior without extensive computational demands.

## Hyperparameters

LoRA Configuration:
- r=8
- lora_alpha=16
- lora_dropout=0.1

## ORPO Trainer Configuration

- Learning Rate: 1e-5
- Max Length: 2048
- Batch Size: 1
- Epochs: 1

## Model Performance

The model was evaluated on the HellaSwag task, yielding the following metrics:
- Accuracy: 46.59%
- Normalized Accuracy: 60.43%
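
The hyperparameters above can be sketched as a training script using the `peft` and `trl` libraries. This is a minimal reconstruction, not the card author's actual code: `target_modules` defaults, the `output_dir` path, and the exact `trl` version behavior (e.g. the `tokenizer` argument name) are assumptions.

```python
# Hypothetical reconstruction of the training setup described in the card.
# Only the base model, dataset name, and the listed hyperparameters come
# from the card; everything else (output_dir, task_type) is an assumption.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA configuration from the card
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

# ORPO trainer configuration from the card
orpo_args = ORPOConfig(
    learning_rate=1e-5,
    max_length=2048,
    per_device_train_batch_size=1,
    num_train_epochs=1,
    output_dir="./orpo-tinyllama",  # assumed output path
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=load_dataset("mlabonne/orpo-dpo-mix-40k", split="train"),
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```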
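
The gap between Accuracy and Normalized Accuracy reflects how HellaSwag is scored in harnesses such as lm-evaluation-harness: plain accuracy picks the candidate ending with the highest total log-likelihood, while normalized accuracy divides each score by the ending's length, removing the bias toward short continuations. The toy scores below are invented purely to show how the two metrics can choose different answers.

```python
# Toy illustration of accuracy vs. length-normalized accuracy.
# Each candidate ending is (total log-likelihood, length in bytes);
# the numbers are made up for demonstration.
def pick(endings, normalize=False):
    """Index of the best ending by raw or length-normalized log-likelihood."""
    def score(i):
        ll, length = endings[i]
        return ll / length if normalize else ll
    return max(range(len(endings)), key=score)

endings = [(-12.0, 40), (-9.0, 20), (-10.0, 50)]
print(pick(endings))               # raw log-likelihood favors ending 1
print(pick(endings, normalize=True))  # per-byte scoring favors ending 2
```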