Gemma3 270M - TinyStories

This is a small language model (270M parameters) based on the Gemma3 architecture, trained on the TinyStories dataset.

Model Details

  • Architecture: Gemma3 with sliding window attention (a hypothetical config sketch follows this list)
  • Parameters: ~270M
  • Training Data: TinyStories dataset
  • Context Length: 32,768 tokens (architectural maximum; training used a 128-token window, see Training Details)
  • Vocabulary Size: 50,257
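
The Usage example below instantiates the model from a GEMMA3_CONFIG_270M object that is defined in the training code. As a purely hypothetical illustration of how the details above might map onto such a config, the sketch below fills in plausible values; every field name and every size not listed above is an assumption, not the actual training code.

# Hypothetical config matching the listed model details.
# Field names and the unlisted sizes are illustrative only;
# the real definitions come from the training code.
GEMMA3_CONFIG_270M = {
    "vocab_size": 50_257,       # GPT-2 tokenizer vocabulary
    "context_length": 32_768,   # architectural maximum
    "emb_dim": 640,             # assumption: chosen to land near 270M params
    "n_layers": 18,             # assumption
    "n_heads": 4,               # assumption
    "sliding_window": 512,      # assumption: sliding-window attention span
}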

Usage

import torch
from transformers import GPT2Tokenizer
# The Gemma3Model class and GEMMA3_CONFIG_270M are defined in the training
# code; copy them into your script or import them before running this.

# The model reuses the GPT-2 tokenizer (vocabulary size 50,257)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Instantiate the model and load the trained weights
model = Gemma3Model(GEMMA3_CONFIG_270M)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Generate text from a prompt
prompt = "Once upon a time"
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
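
Note that generate here is a method on the custom model class, not a Hugging Face API. If the class you copied does not define one, a minimal greedy decoding loop can stand in for it. In the sketch below, greedy_generate is a hypothetical helper, not part of the released code; it assumes the model's forward pass takes a (batch, seq) tensor of token IDs and returns logits of shape (batch, seq, vocab_size), so adjust it to match the actual class.

import torch

def greedy_generate(model, input_ids, max_new_tokens=100, context_length=32_768):
    # Assumption: model(input_ids) returns logits of shape (batch, seq, vocab)
    model.eval()
    for _ in range(max_new_tokens):
        context = input_ids[:, -context_length:]  # trim to the supported context
        with torch.no_grad():
            logits = model(context)
        # Greedy choice: most likely token at the final position
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids

# Drop-in replacement for the generate call above:
# outputs = greedy_generate(model, inputs, max_new_tokens=100)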

Training Details

  • Optimizer: AdamW with weight decay
  • Learning Rate: 1e-4 with cosine annealing (see the sketch after this list)
  • Batch Size: 32
  • Context Window: 128 tokens
  • Total Iterations: 150,000
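
As a minimal sketch of how this optimizer and schedule pair up in PyTorch (the weight-decay value and the stand-in module are placeholders; the card does not state them):

import torch

# Stand-in module so the sketch runs on its own; substitute the real model.
model = torch.nn.Linear(8, 8)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,           # peak learning rate from the card
    weight_decay=0.1,  # placeholder; the card does not state the value
)
# One cosine cycle spanning all 150,000 training iterations
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150_000)

for step in range(150_000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 8)).pow(2).mean()  # dummy loss, batch size 32
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the cosine schedule once per iteration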

This model was trained from scratch using PyTorch and is designed for creative text generation tasks.
