---
license: apache-2.0
base_model: gemma-3
language:
- en
tags:
- pytorch
- causal-lm
- tinystories
- small-language-model
pipeline_tag: text-generation
---

# Gemma3 270M - TinyStories

This is a small language model (270M parameters) based on the Gemma3 architecture, trained on the TinyStories dataset.

## Model Details

- **Architecture**: Gemma3 with sliding window attention
- **Parameters**: ~270M
- **Training Data**: TinyStories dataset
- **Context Length**: 32,768 tokens (configured maximum; training used 128-token windows)
- **Vocabulary Size**: 50,257
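
To illustrate what sliding window attention means in practice, here is a minimal sketch of a causal mask in which each token attends only to itself and a fixed number of preceding tokens. The function name `sliding_window_causal_mask` and the window size used below are hypothetical and chosen for readability; they are not taken from this model's configuration.

```python
import torch


def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Return a (seq_len, seq_len) boolean mask; True marks allowed attention."""
    positions = torch.arange(seq_len)
    # distance[i, j] = i - j: how far key j lies behind query i.
    distance = positions[:, None] - positions[None, :]
    # Causal (j <= i) and local (at most window - 1 tokens back).
    return (distance >= 0) & (distance < window)


# Hypothetical sizes, small enough to print and inspect.
mask = sliding_window_causal_mask(seq_len=6, window=3)
print(mask.int())
```

Each row of the printed mask is one query position. With a window of 3, the last token sees only the final three positions, so attention cost per token stays constant as the sequence grows.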

## Usage

```python
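import torch
from transformers import GPT2Tokenizer

# Gemma3Model and GEMMA3_CONFIG_270M are not part of transformers: copy the
# model class definition and config into your script, or import them.

# The model uses the GPT-2 BPE tokenizer (vocabulary size 50,257).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Load the model weights on CPU, then switch to inference mode.
model = Gemma3Model(GEMMA3_CONFIG_270M)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Generate text
prompt = "Once upon a time"
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))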
```
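
If your copy of the model class does not provide a `generate` method, a greedy decoding loop is one simple alternative. The sketch below assumes the model's forward pass takes token IDs of shape `(batch, seq)` and returns logits of shape `(batch, seq, vocab)`; that interface is an assumption, not something confirmed for this model. `greedy_generate` and `DummyLM` are hypothetical names, and `DummyLM` exists only so the example runs without the real weights.

```python
import torch


@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens, context_length):
    """Append max_new_tokens greedily chosen tokens to input_ids (batch, seq)."""
    for _ in range(max_new_tokens):
        # Keep only the most recent tokens that fit in the context window.
        window = input_ids[:, -context_length:]
        logits = model(window)  # assumed shape: (batch, seq, vocab)
        # Pick the highest-scoring token at the last position.
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids


class DummyLM(torch.nn.Module):
    """Stand-in model (hypothetical) with the assumed forward interface."""

    def __init__(self, vocab_size=16, dim=8):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, dim)
        self.head = torch.nn.Linear(dim, vocab_size)

    def forward(self, ids):
        return self.head(self.emb(ids))


prompt_ids = torch.tensor([[1, 2, 3]])
out = greedy_generate(DummyLM(), prompt_ids, max_new_tokens=5, context_length=128)
print(out.shape)  # torch.Size([1, 8])
```

With the real model, you would pass the loaded model and the tokenizer's encoded prompt in place of `DummyLM()` and `prompt_ids`, then decode the result with `tokenizer.decode(out[0])`. Greedy decoding tends to repeat itself; sampling with a temperature is a common alternative.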

## Training Details

- **Optimizer**: AdamW with weight decay
- **Learning Rate**: 1e-4 with cosine annealing
- **Batch Size**: 32
- **Context Window**: 128 tokens per training sequence
- **Total Iterations**: 150,000

This model was trained from scratch in PyTorch and is intended for generating short, simple stories in the style of its training data.
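
The optimizer and schedule listed above could be set up in PyTorch roughly as follows. This is a sketch and not the original training script: the `weight_decay` value, the use of `CosineAnnealingLR` with its default minimum learning rate of 0, and the placeholder `torch.nn.Linear` model are all assumptions made for illustration.

```python
import torch

# Placeholder module standing in for the real model (assumption).
model = torch.nn.Linear(8, 8)

total_iterations = 150_000  # from the training details above

# AdamW at the stated learning rate; weight_decay=0.1 is an assumed value.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)

# Cosine annealing over the full run; eta_min is left at its default of 0.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_iterations)

# A few dummy steps to show the order of calls in the training loop.
for step in range(3):
    batch = torch.randn(32, 8)  # batch size 32, random stand-in data
    loss = model(batch).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule once per iteration

current_lr = scheduler.get_last_lr()[0]
print(current_lr)
```

If each iteration processes one batch of 32 sequences of 128 tokens with no gradient accumulation (an assumption), the run covers 32 × 128 × 150,000 = 614,400,000 tokens in total.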