# Text-to-Text Transfer Transformer Quantized Model for News Summarization
This repository hosts a quantized version of the T5 model, fine-tuned specifically for news summarization. The model produces concise summaries from semi-structured or unstructured news text, making it well suited for applications that need short, readable digests of longer articles.
## Model Details
- **Model Architecture:** T5 (Text-to-Text Transfer Transformer)
- **Task:** Text summarization for news
- **Input Format:** Free-form news article text
- **Quantization:** 8-bit (int8) using bitsandbytes
- **Framework:** Hugging Face Transformers
- **Base Model:** t5-base
- **Dataset:** Custom
## Usage
### Installation
```sh
pip install transformers accelerate bitsandbytes torch
```
### Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "AventIQ-AI/T5-News-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, load_in_8bit=True, device_map="auto")

def test_summarization(model, tokenizer):
    # Read the news text to summarize from stdin
    user_text = input("\nEnter your News text:\n")
    # T5 expects a task prefix; long articles are truncated to 512 tokens
    inputs = tokenizer(
        "summarize: " + user_text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
    ).to(model.device)
    # Beam search with a mild length penalty keeps summaries short and fluent
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        num_beams=5,
        length_penalty=0.8,
        early_stopping=True,
    )
    summary = tokenizer.decode(output[0], skip_special_tokens=True)
    return summary

print("\n📝 **Model Summary:**")
print(test_summarization(model, tokenizer))
```
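If you prefer to pass text programmatically rather than through `input()`, a minimal non-interactive variant that reuses the same model, tokenizer, and generation settings as above could look like this (the `article` string is a placeholder for your own text):

```python
def summarize(text: str, max_new_tokens: int = 100) -> str:
    """Summarize a single news article and return the decoded summary."""
    inputs = tokenizer(
        "summarize: " + text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
    ).to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=5,
        length_penalty=0.8,
        early_stopping=True,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

article = "..."  # placeholder: replace with your own news text
print(summarize(article))
```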
## ROUGE Evaluation Results
After fine-tuning for news summarization, the model obtained the following **ROUGE** scores:
| **Metric** | **Score** | **Meaning** |
|-------------|-----------|-------------|
| **ROUGE-1** | **0.4125** (~41%) | Overlap of **unigrams** between reference and summary. |
| **ROUGE-2** | **0.2167** (~22%) | Overlap of **bigrams**, indicating fluency. |
| **ROUGE-L** | **0.3421** (~34%) | Longest common subsequence matching structure. |
| **ROUGE-Lsum** | **0.3644** (~36%) | Sentence-level summarization effectiveness. |
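To run a comparable evaluation on your own test set, a minimal sketch using the Hugging Face `evaluate` library could look like the following (the `predictions` and `references` lists are placeholders for generated summaries and gold summaries):

```python
# pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")

# Placeholders: model-generated summaries and their reference summaries
predictions = ["model-generated summary of article 1"]
references = ["human-written reference summary of article 1"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```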
## Fine-Tuning Details
### Dataset
A custom-labeled dataset of news articles paired with reference summaries. The model was trained to produce clean, natural summaries from noisy or inconsistently formatted source text.
### Training
- Number of epochs: 3
- Batch size: 4
- Evaluation strategy: epoch
- Learning rate: 3e-5
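These settings can be expressed with the Hugging Face `Seq2SeqTrainingArguments` API. The following is a rough, hypothetical reconstruction (the output directory name is a placeholder, and the original training script is not included in this repository):

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above
training_args = Seq2SeqTrainingArguments(
    output_dir="t5-news-summarization",  # placeholder output path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    evaluation_strategy="epoch",
    learning_rate=3e-5,
)
```

These arguments would then be passed to a `Seq2SeqTrainer` together with the tokenized dataset and a `DataCollatorForSeq2Seq`.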
### Quantization
Post-training 8-bit quantization using bitsandbytes library with Hugging Face integration. This reduced the model size and improved inference speed with negligible impact on summarization quality.
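For reference, the 8-bit loading shown in the usage section can also be expressed through `BitsAndBytesConfig`, the currently recommended interface in Transformers (a sketch, assuming a CUDA-capable GPU since bitsandbytes requires one):

```python
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForSeq2SeqLM.from_pretrained(
    "AventIQ-AI/T5-News-Summarization",
    quantization_config=quant_config,
    device_map="auto",
)
```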
## Repository Structure
```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Quantized model weights
└── README.md            # Model documentation
```
## Limitations
- The model may misinterpret input or produce poorly formed summaries when the text is very noisy or missing key information.
- The quantized model may show a slight accuracy loss compared to the full-precision model.
- Best suited for English-language news text.
## Contributing
Contributions are welcome! If you have suggestions, feature requests, or improvements, feel free to open an issue or submit a pull request.