Recursive Language Model - 48M
A transformer-based language model with an adaptive recursive processing mechanism for enhanced text generation.
Model Description
This model implements a Recursive Language Model architecture that uses a router network to dynamically determine the optimal number of refinement passes for each input. This adaptive computation approach allows the model to allocate more processing to complex inputs while being efficient on simpler ones.
Key Innovation: Unlike standard transformers that process all inputs uniformly, this model learns when to "think harder" through additional recursion steps.
Quick Start
Installation
pip install transformers torch
Basic Usage
from transformers import AutoModelForCausalLM, GPT2Tokenizer
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Girinath11/recursive-language-model-48m",
    trust_remote_code=True
)
tokenizer = GPT2Tokenizer.from_pretrained("Girinath11/recursive-language-model-48m")
# Generate text
prompt = "The future of artificial intelligence"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(
    input_ids,
    max_new_tokens=50,
    temperature=0.8,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Model Details
Architecture
| Component | Value |
|---|---|
| Parameters | 47,931,907 (~48M) |
| Vocabulary | 50,257 tokens (GPT-2) |
| Embedding Dimension | 512 |
| Transformer Layers | 6 base layers |
| Attention Heads | 8 |
| Max Recursion Steps | 2 |
| Context Length | 256 tokens |
| Positional Encoding | Learned embeddings |
Architecture Components
- Token & Position Embeddings - Input representation layer
- Main Transformer Stack - 6 standard transformer encoder layers with causal masking
- Recursion Depth Router - Lightweight classifier that predicts optimal recursion depth
- Recursive Processing Layer - Reusable transformer layer for refinement
- Language Model Head - Projects to vocabulary with weight tying to embeddings
The router network uses soft weighting to blend outputs from different recursion depths, making the model differentiable end-to-end.
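The actual modules ship with the repository and load via `trust_remote_code=True`; the sketch below is only a minimal illustration of soft depth routing, with every name (`DepthRouter`, `recursive_refine`, `max_steps`) invented for this example and the causal mask omitted for brevity.

```python
import torch
import torch.nn as nn

class DepthRouter(nn.Module):
    """Illustrative router: produces soft weights over recursion depths 0..max_steps."""
    def __init__(self, d_model: int, max_steps: int = 2):
        super().__init__()
        self.scorer = nn.Linear(d_model, max_steps + 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        pooled = hidden.mean(dim=1)                         # (batch, d_model)
        return torch.softmax(self.scorer(pooled), dim=-1)   # (batch, max_steps + 1)

def recursive_refine(hidden, shared_layer, router, max_steps=2):
    """Blend outputs from different recursion depths using the router's soft weights."""
    weights = router(hidden)                  # (batch, max_steps + 1)
    outputs = [hidden]                        # depth 0: no extra refinement
    state = hidden
    for _ in range(max_steps):
        state = shared_layer(state)           # reuse the same layer for each pass
        outputs.append(state)
    stacked = torch.stack(outputs, dim=1)     # (batch, max_steps + 1, seq, d_model)
    return (weights[:, :, None, None] * stacked).sum(dim=1)

# Toy usage with the model card's dimensions (512-dim, 8 heads, up to 2 recursion steps)
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
router = DepthRouter(d_model=512, max_steps=2)
hidden = torch.randn(4, 256, 512)
print(recursive_refine(hidden, layer, router).shape)  # torch.Size([4, 256, 512])
```

Because the depth weights are produced by a softmax rather than a hard choice, gradients flow through all recursion branches, which is what keeps the model differentiable end-to-end.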
Training Details
Dataset
- Total Samples: 100,000 text documents
- Training Split: 95,000 samples (95%)
- Validation Split: 5,000 samples (5%)
- Tokenizer: GPT-2 tokenizer (50,257 vocab)
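The card does not spell out the preprocessing pipeline; a typical way to turn raw documents into fixed 256-token training blocks with the GPT-2 tokenizer looks roughly like the sketch below (the `documents` input and the `make_blocks` helper are placeholders for illustration, not the actual training code).

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # 50,257-token vocabulary
BLOCK_SIZE = 256  # matches the model's context length

def make_blocks(documents, block_size=BLOCK_SIZE):
    """Concatenate tokenized documents and slice them into fixed-length blocks."""
    ids = []
    for doc in documents:
        ids.extend(tokenizer.encode(doc))
        ids.append(tokenizer.eos_token_id)  # separate documents with EOS
    return [ids[i:i + block_size] for i in range(0, len(ids) - block_size + 1, block_size)]

blocks = make_blocks(["lorem ipsum dolor sit amet " * 50])  # dummy text, a few hundred tokens
print(len(blocks), "block(s) of", BLOCK_SIZE, "tokens each")
```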
Training Configuration
Hardware:
- GPU: NVIDIA T4 (16 GB)
- Mixed Precision: FP16 (AMP)

Hyperparameters:
- Batch Size: 32
- Gradient Accumulation: 4 steps
- Effective Batch Size: 128 (32 × 4)
- Learning Rate: 5e-4
- Optimizer: AdamW
- Weight Decay: 0.1
- LR Scheduler: OneCycleLR (cosine annealing)
- Total Epochs: 8
- Sequence Length: 256 tokens

Regularization:
- Dropout Rate: 0.1
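These settings map onto a fairly standard PyTorch loop; the sketch below shows one way AdamW, FP16 autocast, 4-step gradient accumulation, and OneCycleLR fit together. `train_loader` and the HF-style `labels=`/`.loss` forward signature are assumptions for illustration, not the actual training script.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

device = "cuda"
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)
accum_steps = 4                                   # 32 x 4 = effective batch of 128
epochs = 8
total_steps = epochs * len(train_loader) // accum_steps
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=total_steps, anneal_strategy="cos"
)
scaler = GradScaler()

model.train()
for epoch in range(epochs):
    for step, batch in enumerate(train_loader):
        input_ids = batch["input_ids"].to(device)
        with autocast():                          # FP16 forward/backward
            # assumes an HF-style forward that returns .loss when labels are given
            loss = model(input_ids, labels=input_ids).loss / accum_steps
        scaler.scale(loss).backward()
        if (step + 1) % accum_steps == 0:         # gradient accumulation
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
            scheduler.step()
```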
Training Time
- Total Duration: 2.24 hours
- Time per Epoch: ~16 minutes
- Training Speed: 3.10 iterations/second
Training Progression
| Epoch | Training Loss | Validation Loss | Perplexity |
|---|---|---|---|
| 1 | 7.38 | 6.01 | 406.28 |
| 2 | 5.50 | 4.97 | 143.59 |
| 3 | 4.72 | 4.43 | 84.06 |
| 4 | 4.28 | 4.15 | 63.62 |
| 5 | 4.01 | 3.99 | 54.16 |
| 6 | 3.81 | 3.90 | 49.27 |
| 7 | 3.67 | 3.85 | 47.12 |
| 8 | 3.59 | 3.84 | 46.75 |
Final Performance: Validation Loss: 3.84 | Perplexity: 46.75 | Training Time: 2.24 hours
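Perplexity here is simply the exponential of the validation cross-entropy loss, which you can verify directly:

```python
import math
print(math.exp(3.84))  # ~46.5; the reported 46.75 corresponds to the unrounded loss
```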
Performance
Generation Quality
Perplexity: 46.75 places this model in the "good" quality tier:
- ✅ Generates coherent sentences
- ✅ Maintains basic grammar
- ✅ Produces logical text flow
- ✅ Suitable for prototyping and experimentation
- ⚠️ May show repetition in longer sequences
- ⚠️ Less sophisticated than larger models
Inference Speed
| Hardware | Tokens/Second (estimate) |
|---|---|
| CPU (Intel i7) | ~80-120 |
| GPU (T4) | ~400-600 |
| GPU (V100) | ~700-1000 |
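These figures are estimates; throughput depends on batch size, prompt length, and decoding settings. A quick way to measure it on your own hardware, reusing `model` and `tokenizer` from the Quick Start, is:

```python
import time

input_ids = tokenizer.encode("The future of artificial intelligence", return_tensors="pt").to(model.device)
start = time.perf_counter()
outputs = model.generate(input_ids, max_new_tokens=100, do_sample=False)
elapsed = time.perf_counter() - start
new_tokens = outputs.shape[1] - input_ids.shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/second")
```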
Memory Requirements
- Model Size on Disk: ~183 MB
- RAM (CPU inference): ~600 MB
- VRAM (GPU inference): ~1.5 GB
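If those figures are still too high for your setup, loading the weights in half precision is a common way to roughly halve memory at a small quality cost; this is a generic sketch, not a configuration documented for this model.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Girinath11/recursive-language-model-48m",
    trust_remote_code=True,
    torch_dtype=torch.float16,  # halve the weight memory footprint
).to("cuda")  # FP16 inference is best done on GPU; it is slow or unsupported on CPU
```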
Usage Examples
Interactive Text Completion
def generate_completion(prompt, max_tokens=50):
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(
        input_ids,
        max_new_tokens=max_tokens,
        temperature=0.7,
        do_sample=True
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Try different prompts
prompts = [
    "The history of computers began",
    "Climate change is affecting",
    "In the field of medicine"
]
for prompt in prompts:
    print(f"Prompt: {prompt}")
    print(f"Output: {generate_completion(prompt)}\n")
Controlling Generation Style
# More creative (higher temperature); do_sample=True is required for temperature to take effect
outputs = model.generate(input_ids, do_sample=True, temperature=1.0, max_new_tokens=50)

# More focused (lower temperature)
outputs = model.generate(input_ids, do_sample=True, temperature=0.5, max_new_tokens=50)
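Beyond temperature, the standard `generate` sampling options such as top-k and nucleus (top-p) filtering also apply; for example, reusing `input_ids` from above:

```python
# Nucleus sampling: keep only the most probable tokens covering 90% of the probability mass
outputs = model.generate(
    input_ids,
    do_sample=True,
    top_p=0.9,
    top_k=50,
    max_new_tokens=50,
)
```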
Limitations
Technical Limitations
- Short Context: 256 token limit (vs GPT-2's 1024)
- Model Size: 48M parameters - smaller than production models
- Language: Primarily English (GPT-2 tokenizer)
- Coherence: Long-form generation may lose coherence
- Factuality: May generate plausible but incorrect information
Known Issues
- Tendency to repeat phrases in generations longer than 100 tokens (see the decoding sketch after this list)
- May struggle with highly technical or specialized domains
- Occasional grammatical errors in complex sentence structures
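The repetition tendency can often be reduced at decoding time with the standard `generate` penalties; whether these exact values help this particular model is untested, so treat them as starting points (reusing `model` and `input_ids` from the usage examples):

```python
outputs = model.generate(
    input_ids,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.2,   # discourage re-using recently generated tokens
    no_repeat_ngram_size=3,   # never repeat the same 3-gram
)
```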
Bias and Ethical Considerations
Potential Biases
This model may inherit biases present in the training data, including:
- Historical and cultural biases
- Geographic and demographic representation imbalances
- Potential biases in article quality across topics
Recommended Practices
- ✅ Always verify factual claims in generated text
- ✅ Use human review for public-facing applications
- ✅ Be transparent about AI-generated content
- ❌ Don't use it to generate misleading information
- ❌ Don't rely on it for safety-critical decisions
- ❌ Don't use it for medical, legal, or financial advice
Intended Use
Recommended Applications
- Educational tools and learning systems
- Research on adaptive computation in transformers
- Prototyping language model applications
- Resource-constrained deployment scenarios
- Experimenting with language models
- Text completion and generation experiments
Not Recommended For
- Production chatbots without human oversight
- Generating authoritative content without verification
- Applications requiring high factual accuracy
- Professional writing assistance
- Real-time conversational AI
Citation
If you use this model in your work, please cite:
@misc{girinath2025recursive_language_model,
  author       = {Girinath V},
  title        = {Recursive Language Model with Adaptive Depth Processing},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/Girinath11/recursive-language-model-48m}}
}
Acknowledgments
- Framework: PyTorch and Hugging Face Transformers
- Inspiration: Adaptive Computation Time and Mixture of Experts research
- Training: Conducted on Kaggle/Colab GPU resources
License
This model is released under the Apache 2.0 License. You are free to use, modify, and distribute this model for any purpose, including commercial applications, with attribution.
Model Card Authors
Girinath V (@Girinath11)
Contact
For questions, issues, or collaboration:
- Hugging Face: @Girinath11
- Discussions: Model Discussion Board
Model Version: 1.0
Release Date: January 2025
Status: Stable
Framework: PyTorch 2.0+
Transformers: 4.35+