Recursive Language Model - 48M

A transformer-based language model with an adaptive recursive-processing mechanism for enhanced text generation.

Model Description

This model implements a Recursive Language Model architecture that uses a router network to dynamically determine the optimal number of refinement passes for each input. This adaptive computation approach allows the model to allocate more processing to complex inputs while being efficient on simpler ones.

Key Innovation: Unlike standard transformers that process all inputs uniformly, this model learns when to "think harder" through additional recursion steps.

Quick Start

Installation

pip install transformers torch

Basic Usage

from transformers import AutoModelForCausalLM, GPT2Tokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Girinath11/recursive-language-model-48m",
    trust_remote_code=True
)
tokenizer = GPT2Tokenizer.from_pretrained("Girinath11/recursive-language-model-48m")

# Generate text
prompt = "The future of artificial intelligence"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

outputs = model.generate(
    input_ids,
    max_new_tokens=50,
    temperature=0.8,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Details

Architecture

Component            Value
-------------------  ---------------------
Parameters           47,931,907 (~48M)
Vocabulary           50,257 tokens (GPT-2)
Embedding Dimension  512
Transformer Layers   6 base layers
Attention Heads      8
Max Recursion Steps  2
Context Length       256 tokens
Positional Encoding  Learned embeddings
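
As a rough consistency check, the stated total can be reproduced from these dimensions if one assumes a standard 4x feed-forward expansion (FFN width 2048), a weight-tied LM head, a single shared recursive layer, and a small 3-way router head over depths 0-2. These widths are assumptions for illustration, not confirmed module shapes:

# Hypothetical back-of-the-envelope parameter count. The 4*d FFN width,
# final LayerNorm, and 3-way router head are assumptions, not confirmed shapes.
V, d, n_layers, T = 50_257, 512, 6, 256
ffn = 4 * d

emb = V * d + T * d                       # token + learned position embeddings
attn = 4 * (d * d + d)                    # Q, K, V, O projections with bias
mlp = (d * ffn + ffn) + (ffn * d + d)     # two FFN linears with bias
ln = 2 * (2 * d)                          # two LayerNorms per layer
per_layer = attn + mlp + ln

total = emb + (n_layers + 1) * per_layer  # 6 base layers + 1 recursive layer
total += 2 * d                            # final LayerNorm
total += d * 3 + 3                        # router head over 3 recursion depths
print(f"{total:,}")                       # 47,931,907 under these assumptions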

Architecture Components

  1. Token & Position Embeddings - Input representation layer
  2. Main Transformer Stack - 6 standard transformer encoder layers with causal masking
  3. Recursion Depth Router - Lightweight classifier that predicts optimal recursion depth
  4. Recursive Processing Layer - Reusable transformer layer for refinement
  5. Language Model Head - Projects to vocabulary with weight tying to embeddings

The router network uses soft weighting to blend outputs from different recursion depths, making the model differentiable end-to-end.
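
A minimal PyTorch sketch of that idea follows. The class and variable names are hypothetical (the shipped implementation is loaded via trust_remote_code=True); it only illustrates how softmax router weights can blend the hidden states produced at each recursion depth:

# Illustrative sketch only; names are hypothetical, not the shipped modules.
import torch
import torch.nn as nn

class RecursiveRefiner(nn.Module):
    def __init__(self, d_model=512, n_heads=8, max_steps=2):
        super().__init__()
        # One reusable refinement layer, applied up to max_steps times
        self.layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            dropout=0.1, batch_first=True,
        )
        # Lightweight router: predicts a distribution over depths 0..max_steps
        self.router = nn.Linear(d_model, max_steps + 1)
        self.max_steps = max_steps

    def forward(self, h, causal_mask=None):
        # Route from the mean-pooled hidden state: (batch, max_steps + 1)
        weights = torch.softmax(self.router(h.mean(dim=1)), dim=-1)
        out = weights[:, 0, None, None] * h            # depth 0: no refinement
        for step in range(1, self.max_steps + 1):
            h = self.layer(h, src_mask=causal_mask)    # shared-weight pass
            out = out + weights[:, step, None, None] * h
        return out  # soft blend over depths stays differentiable end-to-end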

Training Details

Dataset

  • Total Samples: 100,000 text documents
  • Training Split: 95,000 samples (95%)
  • Validation Split: 5,000 samples (5%)
  • Tokenizer: GPT-2 tokenizer (50,257 vocab)

Training Configuration

Hardware:
  GPU: NVIDIA T4 (16 GB)
  Mixed Precision: FP16 (AMP)

Hyperparameters:
  Batch Size: 32
  Gradient Accumulation: 4
  Effective Batch Size: 128
  Learning Rate: 5e-4
  Optimizer: AdamW
  Weight Decay: 0.1
  LR Scheduler: OneCycleLR (cosine)
  Total Epochs: 8
  Sequence Length: 256

Regularization:
  Dropout Rate: 0.1
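
A minimal sketch of how these settings fit together in a PyTorch loop is shown below; model, train_loader, and the loss interface are assumed, not taken from the actual training script:

# Hypothetical wiring of the settings above; model/train_loader are assumed.
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)
accum_steps = 4                                        # 32 x 4 = effective 128
total_steps = 8 * (len(train_loader) // accum_steps)   # 8 epochs of optim steps
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=total_steps    # cosine anneal by default
)
scaler = torch.cuda.amp.GradScaler()                   # FP16 mixed precision

for step, batch in enumerate(train_loader):
    with torch.cuda.amp.autocast():
        loss = model(**batch).loss / accum_steps       # scale for accumulation
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                         # unscales, then steps
        scaler.update()
        optimizer.zero_grad()
        scheduler.step()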

Training Time

  • Total Duration: 2.24 hours
  • Time per Epoch: ~16 minutes
  • Training Speed: 3.10 iterations/second

Training Progression

Epoch  Training Loss  Validation Loss  Perplexity
-----  -------------  ---------------  ----------
1      7.38           6.01             406.28
2      5.50           4.97             143.59
3      4.72           4.43             84.06
4      4.28           4.15             63.62
5      4.01           3.99             54.16
6      3.81           3.90             49.27
7      3.67           3.85             47.12
8      3.59           3.84             46.75

Final Performance: Validation Loss: 3.84 | Perplexity: 46.75 | Training Time: 2.24 hours
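
Perplexity here is simply the exponential of the mean validation cross-entropy loss; the small gap between exp(3.84) and the reported 46.75 comes from the loss being rounded to two decimals in the table:

import math
print(math.exp(3.84))   # ~46.5; perplexity = exp(cross-entropy loss)
print(math.log(46.75))  # ~3.845, the unrounded validation loss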

Performance

Generation Quality

A validation perplexity of 46.75 places this model in a "good" quality tier for its size:

  • βœ… Generates coherent sentences
  • βœ… Maintains basic grammar
  • βœ… Produces logical text flow
  • βœ… Suitable for prototyping and experimentation
  • ⚠️ May show repetition in longer sequences
  • ⚠️ Less sophisticated than larger models

Inference Speed

Hardware        Tokens/Second (estimate)
--------------  ------------------------
CPU (Intel i7)  ~80-120
GPU (T4)        ~400-600
GPU (V100)      ~700-1000
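
These figures are estimates; a quick way to measure throughput on your own hardware is to time a single generation, reusing the model and tokenizer from Quick Start:

import time

input_ids = tokenizer.encode("The future of artificial intelligence", return_tensors="pt")
n_new = 50
start = time.perf_counter()
model.generate(input_ids, max_new_tokens=n_new, do_sample=True)
print(f"{n_new / (time.perf_counter() - start):.1f} tokens/second")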

Memory Requirements

  • Model Size on Disk: ~183 MB
  • RAM (CPU inference): ~600 MB
  • VRAM (GPU inference): ~1.5 GB

Usage Examples

Interactive Text Completion

def generate_completion(prompt, max_tokens=50):
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(
        input_ids,
        max_new_tokens=max_tokens,
        temperature=0.7,
        do_sample=True
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Try different prompts
prompts = [
    "The history of computers began",
    "Climate change is affecting",
    "In the field of medicine"
]

for prompt in prompts:
    print(f"Prompt: {prompt}")
    print(f"Output: {generate_completion(prompt)}\n")

Controlling Generation Style

# More creative (higher temperature); do_sample=True is required for
# temperature to have any effect
outputs = model.generate(input_ids, do_sample=True, temperature=1.0, max_new_tokens=50)

# More focused (lower temperature)
outputs = model.generate(input_ids, do_sample=True, temperature=0.5, max_new_tokens=50)

Limitations

Technical Limitations

  1. Short Context: 256-token limit (vs GPT-2's 1,024); see the truncation sketch after this list
  2. Model Size: 48M parameters - smaller than production models
  3. Language: Primarily English (GPT-2 tokenizer)
  4. Coherence: Long-form generation may lose coherence
  5. Factuality: May generate plausible but incorrect information
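
For the context limit in particular, truncating prompts before generation avoids silently exceeding the window. A minimal sketch (long_prompt is a placeholder, not from the original examples):

# Keep prompt + new tokens within the 256-token context window
max_new = 50
inputs = tokenizer(
    long_prompt,                  # placeholder: any potentially long prompt
    truncation=True,
    max_length=256 - max_new,     # leave room for the generated tokens
    return_tensors="pt",
)
outputs = model.generate(inputs["input_ids"], max_new_tokens=max_new, do_sample=True)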

Known Issues

  • Tendency to repeat phrases in generations longer than 100 tokens (see the mitigation sketch after this list)
  • May struggle with highly technical or specialized domains
  • Occasional grammatical errors in complex sentence structures
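
For the repetition issue, the standard transformers generation knobs usually help; the values below are generic starting points, not tuned for this model:

outputs = model.generate(
    input_ids,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.2,    # penalize tokens already generated
    no_repeat_ngram_size=3,    # block verbatim 3-gram repeats
)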

Bias and Ethical Considerations

Potential Biases

This model may inherit biases present in the training data, including:

  • Historical and cultural biases
  • Geographic and demographic representation imbalances
  • Potential biases in article quality across topics

Recommended Practices

  • βœ… Always verify factual claims from generated text
  • βœ… Use human review for public-facing applications
  • βœ… Be transparent about AI-generated content
  • ❌ Don't use for generating misleading information
  • ❌ Don't rely on for safety-critical decisions
  • ❌ Don't use for medical, legal, or financial advice

Intended Use

Recommended Applications

  • πŸ“š Educational tools and learning systems
  • πŸ”¬ Research on adaptive computation in transformers
  • πŸ› οΈ Prototyping language model applications
  • πŸ’» Resource-constrained deployment scenarios
  • πŸŽ“ Experimenting with language models
  • ✍️ Text completion and generation experiments

Not Recommended For

  • ❌ Production chatbots without human oversight
  • ❌ Generating authoritative content without verification
  • ❌ Applications requiring high factual accuracy
  • ❌ Professional writing assistance
  • ❌ Real-time conversational AI

Citation

If you use this model in your work, please cite:

@misc{girinath2025recursive_language_model,
  author       = {Girinath V},
  title        = {Recursive Language Model with Adaptive Depth Processing},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Girinath11/recursive-language-model-48m}}
}

Acknowledgments

  • Framework: PyTorch and Hugging Face Transformers
  • Inspiration: Adaptive Computation Time and Mixture of Experts research
  • Training: Conducted on Kaggle/Colab GPU resources

License

This model is released under the Apache 2.0 License. You are free to use, modify, and distribute this model for any purpose, including commercial applications, with attribution.

Model Card Authors

Girinath V (@Girinath11)

Contact

For questions, issues, or collaboration, contact the author (@Girinath11) on Hugging Face.


Model Version: 1.0
Release Date: January 2025
Status: Stable
Framework: PyTorch 2.0+
Transformers: 4.35+
