--- license: apache-2.0 base_model: gemma-3 language: - en tags: - pytorch - causal-lm - tinystories - small-language-model pipeline_tag: text-generation --- # Gemma3 270M - TinyStories This is a small language model (270M parameters) based on the Gemma3 architecture, trained on the TinyStories dataset. ## Model Details - **Architecture**: Gemma3 with sliding window attention - **Parameters**: ~270M - **Training Data**: TinyStories dataset - **Context Length**: 32,768 tokens - **Vocabulary Size**: 50,257 ## Usage ```python import torch from transformers import GPT2Tokenizer # You'll need to copy the model class definition or import it tokenizer = GPT2Tokenizer.from_pretrained("gpt2") # Load model weights model = Gemma3Model(GEMMA3_CONFIG_270M) model.load_state_dict(torch.load("pytorch_model.bin")) # Generate text prompt = "Once upon a time" inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt") outputs = model.generate(inputs, max_new_tokens=100) print(tokenizer.decode(outputs[0])) ``` ## Training Details - **Optimizer**: AdamW with weight decay - **Learning Rate**: 1e-4 with cosine annealing - **Batch Size**: 32 - **Context Window**: 128 tokens - **Total Iterations**: 150,000 This model was trained from scratch using PyTorch and is designed for creative text generation tasks.