Gemma3 270M - TinyStories
This is a small language model (270M parameters) based on the Gemma3 architecture, trained on the TinyStories dataset.
Model Details
- Architecture: Gemma3 with sliding window attention
- Parameters: ~270M
- Training Data: TinyStories dataset
- Context Length: 32,768 tokens
- Vocabulary Size: 50,257
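To make the sliding window attention mentioned above concrete, here is a minimal pure-Python sketch of the attention mask it implies. The function name, the tiny sequence length, and the window size are hypothetical and chosen only for illustration; they are not values from this model's configuration. The sketch also assumes the window counts the current token, a convention that varies between implementations.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    A position sees itself and up to `window - 1` earlier positions, never later ones.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Hypothetical sizes for illustration only, not the model's real settings.
mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row is one query position. Once the sequence is longer than the window, the oldest tokens drop out of view, so the attention cost per token stays bounded by the window size.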
Usage
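import torch
from transformers import GPT2Tokenizer
# Gemma3Model and GEMMA3_CONFIG_270M come from the model class definition:
# copy it into your script or import it from your own module.
# The model uses the GPT-2 tokenizer (vocabulary size 50,257)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# Load the model weights on CPU and switch to inference mode
model = Gemma3Model(GEMMA3_CONFIG_270M)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()
# Generate text (assumes the model class defines a generate() method)
prompt = "Once upon a time"
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
with torch.no_grad():  # gradients are not needed for inference
    outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))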
Training Details
- Optimizer: AdamW with weight decay
- Learning Rate: 1e-4 with cosine annealing
- Batch Size: 32
- Context Window: 128 tokens (sequence length used during training)
- Total Iterations: 150,000
This model was trained from scratch using PyTorch and is designed for creative text generation tasks.
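As a rough illustration of what the training details above imply, the sketch below computes a cosine-annealed learning rate and the number of tokens processed per iteration and in total. The model card does not state the minimum learning rate, whether warmup was used, or whether gradients were accumulated, so `MIN_LR = 0.0`, the absence of warmup, and one batch per iteration are all assumptions. The helper name `cosine_annealed_lr` is hypothetical.

```python
import math

# Values taken from the training details above.
BASE_LR = 1e-4
TOTAL_ITERATIONS = 150_000
BATCH_SIZE = 32
CONTEXT_WINDOW = 128

# Assumptions (not stated in the model card): a single cosine cycle over the
# whole run, decaying to MIN_LR, with no warmup and no gradient accumulation.
MIN_LR = 0.0


def cosine_annealed_lr(step: int) -> float:
    """Learning rate at `step` under a single-cycle cosine decay."""
    progress = min(max(step, 0), TOTAL_ITERATIONS) / TOTAL_ITERATIONS
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


tokens_per_iteration = BATCH_SIZE * CONTEXT_WINDOW
total_tokens = tokens_per_iteration * TOTAL_ITERATIONS

for step in (0, 75_000, 150_000):
    print(f"step {step:>7,}: lr = {cosine_annealed_lr(step):.2e}")
print(f"tokens per iteration: {tokens_per_iteration:,}")
print(f"total tokens processed: {total_tokens:,}")
```

Under these assumptions each iteration covers 32 × 128 = 4,096 tokens, or 614,400,000 token positions over 150,000 iterations. That figure counts positions processed, not unique tokens in the dataset.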