AvaLovelace committed (verified) · Commit aaddae8 · 1 Parent(s): b19550b

Update README.md

Files changed (1):
  1. README.md +12 -16
README.md CHANGED
@@ -26,15 +26,15 @@ pipeline_tag: text-generation
 - **Funded by [optional]:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
 - **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
+- **Language(s) (NLP):** English
 - **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
+- **Finetuned from model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)

 ### Model Sources [optional]

 <!-- Provide the basic links for the model. -->

-- **Repository:** [More Information Needed]
+- **Repository:** [AvaLovelace1/LegoGPT](https://github.com/AvaLovelace1/LegoGPT)
 - **Paper [optional]:** [More Information Needed]
 - **Demo [optional]:** [More Information Needed]

@@ -88,22 +88,18 @@ Use the code below to get started with the model.

 ### Training Procedure

-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
-#### Preprocessing [optional]
-
-[More Information Needed]
-
+The model was fine-tuned using LoRA applied to the `q_proj` and `v_proj` matrices, with the AdamW optimizer and a learning rate following cosine decay with warmup.

 #### Training Hyperparameters

-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
-#### Speeds, Sizes, Times [optional]
-
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
-[More Information Needed]
+- **Training regime:** bf16 mixed precision
+- **Epochs:** 3
+- **Global batch size:** 64
+- **Max learning rate:** 0.002
+- **Learning rate warmup steps:** 100
+- **LoRA rank:** 32
+- **LoRA alpha:** 16
+- **LoRA dropout:** 0.05

 ## Evaluation
105