Recursive Language Model - 48M
A transformer-based language model with an adaptive recursive processing mechanism for enhanced text generation.
Model Description
This model implements a Recursive Language Model architecture that uses a router network to dynamically determine the optimal number of refinement passes for each input. This adaptive computation approach allows the model to allocate more processing to complex inputs while being efficient on simpler ones.
Key Innovation: Unlike standard transformers that process all inputs uniformly, this model learns when to "think harder" through additional recursion steps.
Quick Start
Installation
pip install transformers torch
Basic Usage
from transformers import AutoModelForCausalLM, GPT2Tokenizer
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Girinath11/recursive-language-model-48m",
    trust_remote_code=True
)
tokenizer = GPT2Tokenizer.from_pretrained("Girinath11/recursive-language-model-48m")
# Generate text
prompt = "The future of artificial intelligence"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(
    input_ids,
    max_new_tokens=50,
    temperature=0.8,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Model Details
Architecture
| Component | Value |
|---|---|
| Parameters | 47,931,907 (~48M) |
| Vocabulary | 50,257 tokens (GPT-2) |
| Embedding Dimension | 512 |
| Transformer Layers | 6 base layers |
| Attention Heads | 8 |
| Max Recursion Steps | 2 |
| Context Length | 256 tokens |
| Positional Encoding | Learned embeddings |
Architecture Components
- Token & Position Embeddings - Input representation layer
- Main Transformer Stack - 6 standard transformer encoder layers with causal masking
- Recursion Depth Router - Lightweight classifier that predicts optimal recursion depth
- Recursive Processing Layer - Reusable transformer layer for refinement
- Language Model Head - Projects to vocabulary with weight tying to embeddings
The router network uses soft weighting to blend outputs from different recursion depths, making the model differentiable end-to-end.
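The actual modules ship with the repository and load via `trust_remote_code=True`; the sketch below is only a minimal illustration of soft depth routing, with every name (`DepthRouter`, `recursive_refine`, `max_steps`) invented for this example and the causal mask omitted for brevity.

```python
import torch
import torch.nn as nn

class DepthRouter(nn.Module):
    """Illustrative router: produces soft weights over recursion depths 0..max_steps."""
    def __init__(self, d_model: int, max_steps: int = 2):
        super().__init__()
        self.scorer = nn.Linear(d_model, max_steps + 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        pooled = hidden.mean(dim=1)                         # (batch, d_model)
        return torch.softmax(self.scorer(pooled), dim=-1)   # (batch, max_steps + 1)

def recursive_refine(hidden, shared_layer, router, max_steps=2):
    """Blend outputs from different recursion depths using the router's soft weights."""
    weights = router(hidden)                  # (batch, max_steps + 1)
    outputs = [hidden]                        # depth 0: no extra refinement
    state = hidden
    for _ in range(max_steps):
        state = shared_layer(state)           # reuse the same layer for each pass
        outputs.append(state)
    stacked = torch.stack(outputs, dim=1)     # (batch, max_steps + 1, seq, d_model)
    return (weights[:, :, None, None] * stacked).sum(dim=1)

# Toy usage with the model card's dimensions (512-dim, 8 heads, up to 2 recursion steps)
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
router = DepthRouter(d_model=512, max_steps=2)
hidden = torch.randn(4, 256, 512)
print(recursive_refine(hidden, layer, router).shape)  # torch.Size([4, 256, 512])
```

Because the depth weights are produced by a softmax rather than a hard choice, gradients flow through all recursion branches, which is what keeps the model differentiable end-to-end.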
Training Details
Dataset
- Total Samples: 100,000 text documents
- Training Split: 95,000 samples (95%)
- Validation Split: 5,000 samples (5%)
- Tokenizer: GPT-2 tokenizer (50,257 vocab)
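The card does not spell out the preprocessing pipeline; a typical way to turn raw documents into fixed 256-token training blocks with the GPT-2 tokenizer looks roughly like the sketch below (the `documents` input and the `make_blocks` helper are placeholders for illustration, not the actual training code).

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # 50,257-token vocabulary
BLOCK_SIZE = 256  # matches the model's context length

def make_blocks(documents, block_size=BLOCK_SIZE):
    """Concatenate tokenized documents and slice them into fixed-length blocks."""
    ids = []
    for doc in documents:
        ids.extend(tokenizer.encode(doc))
        ids.append(tokenizer.eos_token_id)  # separate documents with EOS
    return [ids[i:i + block_size] for i in range(0, len(ids) - block_size + 1, block_size)]

blocks = make_blocks(["lorem ipsum dolor sit amet " * 50])  # dummy text, a few hundred tokens
print(len(blocks), "block(s) of", BLOCK_SIZE, "tokens each")
```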
Training Configuration
Hardware:
- GPU: NVIDIA T4 (16 GB)
- Mixed Precision: FP16 (AMP)

Hyperparameters:
- Batch Size: 32
- Gradient Accumulation: 4 steps
- Effective Batch Size: 128 (32 × 4)
- Learning Rate: 5e-4
- Optimizer: AdamW
- Weight Decay: 0.1
- LR Scheduler: OneCycleLR (cosine annealing)
- Total Epochs: 8
- Sequence Length: 256 tokens

Regularization:
- Dropout Rate: 0.1
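These settings map onto a fairly standard PyTorch loop; the sketch below shows one way AdamW, FP16 autocast, 4-step gradient accumulation, and OneCycleLR fit together. `train_loader` and the HF-style `labels=`/`.loss` forward signature are assumptions for illustration, not the actual training script.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

device = "cuda"
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)
accum_steps = 4                                   # 32 x 4 = effective batch of 128
epochs = 8
total_steps = epochs * len(train_loader) // accum_steps
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=total_steps, anneal_strategy="cos"
)
scaler = GradScaler()

model.train()
for epoch in range(epochs):
    for step, batch in enumerate(train_loader):
        input_ids = batch["input_ids"].to(device)
        with autocast():                          # FP16 forward/backward
            # assumes an HF-style forward that returns .loss when labels are given
            loss = model(input_ids, labels=input_ids).loss / accum_steps
        scaler.scale(loss).backward()
        if (step + 1) % accum_steps == 0:         # gradient accumulation
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
            scheduler.step()
```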
Training Time
- Total Duration: 2.24 hours
- Time per Epoch: ~16 minutes
- Training Speed: 3.10 iterations/second
Training Progression
| Epoch | Training Loss | Validation Loss | Perplexity |
|---|---|---|---|
| 1 | 7.38 | 6.01 | 406.28 |
| 2 | 5.50 | 4.97 | 143.59 |
| 3 | 4.72 | 4.43 | 84.06 |
| 4 | 4.28 | 4.15 | 63.62 |
| 5 | 4.01 | 3.99 | 54.16 |
| 6 | 3.81 | 3.90 | 49.27 |
| 7 | 3.67 | 3.85 | 47.12 |
| 8 | 3.59 | 3.84 | 46.75 |
Final Performance: Validation Loss: 3.84 | Perplexity: 46.75 | Training Time: 2.24 hours
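Perplexity here is simply the exponential of the validation cross-entropy loss, which you can verify directly:

```python
import math
print(math.exp(3.84))  # ~46.5; the reported 46.75 corresponds to the unrounded loss
```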
Performance
Generation Quality
Perplexity: 46.75 places this model in the "good" quality tier:
- ✅ Generates coherent sentences
- ✅ Maintains basic grammar
- ✅ Produces logical text flow
- ✅ Suitable for prototyping and experimentation
- ⚠️ May show repetition in longer sequences
- ⚠️ Less sophisticated than larger models
Inference Speed
| Hardware | Tokens/Second (estimate) |
|---|---|
| CPU (Intel i7) | ~80-120 |
| GPU (T4) | ~400-600 |
| GPU (V100) | ~700-1000 |
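These figures are estimates; throughput depends on batch size, prompt length, and decoding settings. A quick way to measure it on your own hardware, reusing `model` and `tokenizer` from the Quick Start, is:

```python
import time

input_ids = tokenizer.encode("The future of artificial intelligence", return_tensors="pt").to(model.device)
start = time.perf_counter()
outputs = model.generate(input_ids, max_new_tokens=100, do_sample=False)
elapsed = time.perf_counter() - start
new_tokens = outputs.shape[1] - input_ids.shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/second")
```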
Memory Requirements
- Model Size on Disk: ~183 MB
- RAM (CPU inference): ~600 MB
- VRAM (GPU inference): ~1.5 GB
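If those figures are still too high for your setup, loading the weights in half precision is a common way to roughly halve memory at a small quality cost; this is a generic sketch, not a configuration documented for this model.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Girinath11/recursive-language-model-48m",
    trust_remote_code=True,
    torch_dtype=torch.float16,  # halve the weight memory footprint
).to("cuda")  # FP16 inference is best done on GPU; it is slow or unsupported on CPU
```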
Usage Examples
Interactive Text Completion
def generate_completion(prompt, max_tokens=50):
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(
        input_ids,
        max_new_tokens=max_tokens,
        temperature=0.7,
        do_sample=True
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Try different prompts
prompts = [
    "The history of computers began",
    "Climate change is affecting",
    "In the field of medicine"
]
for prompt in prompts:
    print(f"Prompt: {prompt}")
    print(f"Output: {generate_completion(prompt)}\n")
Controlling Generation Style
# More creative (higher temperature); do_sample=True is required for temperature to take effect
outputs = model.generate(input_ids, do_sample=True, temperature=1.0, max_new_tokens=50)

# More focused (lower temperature)
outputs = model.generate(input_ids, do_sample=True, temperature=0.5, max_new_tokens=50)
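Beyond temperature, the standard `generate` sampling options such as top-k and nucleus (top-p) filtering also apply; for example, reusing `input_ids` from above:

```python
# Nucleus sampling: keep only the most probable tokens covering 90% of the probability mass
outputs = model.generate(
    input_ids,
    do_sample=True,
    top_p=0.9,
    top_k=50,
    max_new_tokens=50,
)
```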
Limitations
Technical Limitations
- Short Context: 256 token limit (vs GPT-2's 1024)
- Model Size: 48M parameters - smaller than production models
- Language: Primarily English (GPT-2 tokenizer)
- Coherence: Long-form generation may lose coherence
- Factuality: May generate plausible but incorrect information
Known Issues
- Tendency to repeat phrases in generations longer than 100 tokens (see the decoding sketch after this list)
- May struggle with highly technical or specialized domains
- Occasional grammatical errors in complex sentence structures
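The repetition tendency can often be reduced at decoding time with the standard `generate` penalties; whether these exact values help this particular model is untested, so treat them as starting points (reusing `model` and `input_ids` from the usage examples):

```python
outputs = model.generate(
    input_ids,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.2,   # discourage re-using recently generated tokens
    no_repeat_ngram_size=3,   # never repeat the same 3-gram
)
```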
Bias and Ethical Considerations
Potential Biases
This model may inherit biases present in the training data, including:
- Historical and cultural biases
- Geographic and demographic representation imbalances
- Potential biases in article quality across topics
Recommended Practices
- ✅ Always verify factual claims in generated text
- ✅ Use human review for public-facing applications
- ✅ Be transparent about AI-generated content
- ❌ Don't use it to generate misleading information
- ❌ Don't rely on it for safety-critical decisions
- ❌ Don't use it for medical, legal, or financial advice
Intended Use
Recommended Applications
- Educational tools and learning systems
- Research on adaptive computation in transformers
- Prototyping language model applications
- Resource-constrained deployment scenarios
- Experimenting with language models
- Text completion and generation experiments
Not Recommended For
- Production chatbots without human oversight
- Generating authoritative content without verification
- Applications requiring high factual accuracy
- Professional writing assistance
- Real-time conversational AI
Citation
If you use this model in your work, please cite:
@misc{girinath2025recursive_language_model,
  author       = {Girinath V},
  title        = {Recursive Language Model with Adaptive Depth Processing},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/Girinath11/recursive-language-model-48m}}
}
Acknowledgments
- Framework: PyTorch and Hugging Face Transformers
- Inspiration: Adaptive Computation Time and Mixture of Experts research
- Training: Conducted on Kaggle/Colab GPU resources
License
This model is released under the Apache 2.0 License. You are free to use, modify, and distribute this model for any purpose, including commercial applications, with attribution.
Model Card Authors
Girinath V (@Girinath11)
Contact
For questions, issues, or collaboration:
- Hugging Face: @Girinath11
- Discussions: Model Discussion Board
Model Version: 1.0
Release Date: January 2025
Status: Stable
Framework: PyTorch 2.0+
Transformers: 4.35+