dineshkvr committed · verified · Commit 668ee5b · 1 Parent(s): 3996734

Upload README.md with huggingface_hub

---
license: apache-2.0
base_model: gemma-3
language:
- en
tags:
- pytorch
- causal-lm
- tinystories
- small-language-model
pipeline_tag: text-generation
---

# Gemma3 270M - TinyStories

This is a small language model (~270M parameters) based on the Gemma3 architecture, trained on the TinyStories dataset.

## Model Details

- **Architecture**: Gemma3 with sliding window attention
- **Parameters**: ~270M
- **Training Data**: TinyStories dataset
- **Context Length**: 32,768 tokens
- **Vocabulary Size**: 50,257

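The ~270M figure can be sanity-checked once the model is instantiated with a generic parameter counter (a small utility sketch; the `Linear` layer below is only a stand-in for the real model):

```python
import torch

def count_parameters(model: torch.nn.Module) -> int:
    # Total number of trainable parameters in the module
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Demo on a stand-in module; call this on the loaded Gemma3 model to verify ~270M.
layer = torch.nn.Linear(10, 5)   # 10*5 weights + 5 biases = 55 parameters
print(count_parameters(layer))   # 55
```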
## Usage

```python
import torch
from transformers import GPT2Tokenizer

# The Gemma3Model class and GEMMA3_CONFIG_270M config are defined in this
# repository's training code; copy or import them before loading the weights.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Load the trained weights into a freshly constructed model
model = Gemma3Model(GEMMA3_CONFIG_270M)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Generate text
prompt = "Once upon a time"
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
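The snippet above assumes the custom `Gemma3Model` exposes a `generate()` method. If your copy of the model class does not, a minimal greedy-decoding loop (a sketch, not this repository's implementation) can stand in for it:

```python
import torch

@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens=100):
    # Autoregressively append the highest-probability next token at each step.
    for _ in range(max_new_tokens):
        logits = model(input_ids)                    # (batch, seq_len, vocab_size)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids
```

Greedy decoding is deterministic; for more varied stories you would typically sample from the softmax distribution with a temperature instead of taking the argmax.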

## Training Details

- **Optimizer**: AdamW with weight decay
- **Learning Rate**: 1e-4 with cosine annealing
- **Batch Size**: 32
- **Context Window**: 128 tokens (training sequence length; the architecture supports up to 32,768)
- **Total Iterations**: 150,000
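With the hyperparameters listed above, the optimizer and schedule might be set up as follows (a sketch: the weight-decay value and the stand-in model are assumptions, not values from this repository):

```python
import torch

model = torch.nn.Linear(128, 128)  # stand-in for the Gemma3 model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)  # decay value assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150_000)

for step in range(3):  # the real loop runs for 150,000 iterations
    optimizer.zero_grad()
    loss = model(torch.randn(32, 128)).pow(2).mean()  # placeholder loss, batch size 32
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine-anneals the learning rate toward zero over T_max steps
```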

This model was trained from scratch using PyTorch and is designed for creative text generation tasks.