dineshkvr/gemma3-270m-tinystories-gpt2-tokenizer

+---
+license: apache-2.0
+base_model: gemma-3
+language:
+- en
+tags:
+- pytorch
+- causal-lm
+- tinystories
+- small-language-model
+pipeline_tag: text-generation
+---
+# Gemma3 270M - TinyStories
+This is a small language model (270M parameters) based on the Gemma3 architecture, trained on the TinyStories dataset.
+## Model Details
+- **Architecture**: Gemma3 with sliding window attention
+- **Parameters**: ~270M
+- **Training Data**: TinyStories dataset
+- **Context Length**: 32,768 tokens
+- **Vocabulary Size**: 50,257 (GPT-2 tokenizer)
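To make the "sliding window attention" entry above concrete, here is a minimal pure-Python sketch of a causal sliding-window mask. The function name `sliding_window_causal_mask` and the toy window of 3 are hypothetical choices for illustration. The model's actual window size and mask convention are not stated in this card.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumption: the window counts the current token, so each position sees
    itself plus up to (window - 1) earlier positions. Conventions differ
    between implementations.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    # Causal: j <= i. Sliding window: i - j < window.
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


# Toy example: 6 positions, window of 3 (illustrative values only).
mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row is one query position. An `x` marks a key position it may attend to, so later rows show the window sliding forward instead of growing with the sequence.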
+## Usage
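+```python
+import torch
+from transformers import GPT2Tokenizer
+
+# Gemma3Model and GEMMA3_CONFIG_270M must be defined before this point:
+# copy the model class definition into your script or import it.
+
+# The model uses the GPT-2 tokenizer
+tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+
+# Load the model weights on CPU and switch to inference mode
+model = Gemma3Model(GEMMA3_CONFIG_270M)
+model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
+model.eval()
+
+# Generate text (assumes the model class implements a generate() method)
+prompt = "Once upon a time"
+inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
+with torch.no_grad():
+    outputs = model.generate(inputs, max_new_tokens=100)
+print(tokenizer.decode(outputs[0]))
+```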
+## Training Details
+- **Optimizer**: AdamW with weight decay
+- **Learning Rate**: 1e-4 with cosine annealing
+- **Batch Size**: 32
+- **Context Window**: 128 tokens per training sequence
+- **Total Iterations**: 150,000
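The learning-rate entry above can be illustrated with the standard closed-form cosine annealing curve. This is a sketch under stated assumptions, not the training script: it assumes the rate decays from 1e-4 over all 150,000 iterations, with no warmup and a final rate of 0. The card confirms neither the warmup nor the final rate. The function name `cosine_annealing_lr` and the `min_lr` parameter are hypothetical.

```python
import math


def cosine_annealing_lr(
    step: int,
    total_steps: int = 150_000,  # "Total Iterations" from the list above
    base_lr: float = 1e-4,       # "Learning Rate" from the list above
    min_lr: float = 0.0,         # assumption: final rate is not stated in the card
) -> float:
    """Learning rate at a given step under plain cosine annealing (no warmup)."""
    step = min(max(step, 0), total_steps)  # clamp to the schedule range
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cosine


for step in (0, 37_500, 75_000, 150_000):
    print(f"step {step:>7}: lr = {cosine_annealing_lr(step):.3e}")
```

Under these assumptions the rate starts at `base_lr`, falls to half of it at the midpoint, and reaches `min_lr` at the last iteration. PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` (with `T_max` and `eta_min`) follows the same curve. Whether it was used for this model is not stated.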
+
+This model was trained from scratch using PyTorch and is intended for generating short, simple English stories in the style of the TinyStories dataset.