---
license: cc-by-sa-4.0
datasets:
- Kostya165/ru_emotion_dvach
language:
- ru
metrics:
- accuracy
base_model:
- cointegrated/rubert-tiny2
pipeline_tag: text-classification
tags:
- russian
- emotion
- sentiment
- sentiment-analysis
- emotion-analysis
- emotion-classification
- emotion-detection
- rubert
- rubert-tiny
---
# rubert_tiny2_russian_emotion_sentiment
## Description
The `rubert_tiny2_russian_emotion_sentiment` model is a fine-tuned version of the lightweight [`cointegrated/rubert-tiny2`](https://huggingface.co/cointegrated/rubert-tiny2) model that classifies Russian-language messages into five emotion classes:
- **0**: aggression (агрессия)
- **1**: anxiety (тревожность)
- **2**: neutral (нейтральное состояние)
- **3**: positive (позитив)
- **4**: sarcasm (сарказм)
### Validation results
| Metric | Value |
|------------|----------|
| Accuracy | 0.8911 |
| F1 macro | 0.8910 |
| F1 micro | 0.8911 |
**Per-class accuracy**:
- aggression (0): 0.9120
- anxiety (1): 0.9462
- neutral (2): 0.8663
- positive (3): 0.8884
- sarcasm (4): 0.8426
### Usage
```bash
pip install transformers torch
```
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier
MODEL_ID = "Kostya165/rubert_tiny2_russian_emotion_sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()  # inference mode: disables dropout

texts = [
    "Сегодня отличный день!",  # "Today is a great day!"
    "Меня это всё бесит и раздражает.",  # "All of this infuriates and annoys me."
]

# Tokenize the batch, padding to the longest text and truncating at 128 tokens
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():  # no gradients are needed for inference
    logits = model(**enc).logits
preds = logits.argmax(dim=-1).tolist()

# Map predicted class IDs back to label names
id2label = model.config.id2label
labels = [id2label[p] for p in preds]
print(labels)  # e.g. ['positive', 'aggression']
```
### Training details
- **Base model**: `cointegrated/rubert-tiny2`
- **Dataset**: `Kostya165/ru_emotion_dvach`
- **Epochs**: 2
- **Batch size**: 32
- **LR**: 1e-5
- **Mixed precision**: FP16
- **Regularization**: dropout 0.1, weight_decay 0.01, warmup_ratio 0.1
### Requirements
- `transformers>=4.30.0`
- `torch>=1.10.0`
- `datasets`
- `evaluate`
### License
CC-BY-SA 4.0.
### Citation
```bibtex
@misc{rubert_tiny2_russian_emotion_sentiment,
title = {Russian Emotion Sentiment Classification with RuBERT-tiny2},
author = {Kostya165},
year = {2024},
howpublished = {\url{https://huggingface.co/Kostya165/rubert_tiny2_russian_emotion_sentiment}}
}
```
---
## English
# rubert_tiny2_russian_emotion_sentiment
**Description**
The `rubert_tiny2_russian_emotion_sentiment` model is a fine-tuned version of the lightweight [`cointegrated/rubert-tiny2`](https://huggingface.co/cointegrated/rubert-tiny2) for classifying five emotions in Russian text:
- **0**: aggression
- **1**: anxiety
- **2**: neutral
- **3**: positive
- **4**: sarcasm
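The class IDs above can be kept as a small lookup table, which may be handy when predictions are stored as integers. This is a minimal sketch that assumes the ID order listed above matches the checkpoint's class indices; `ID2LABEL`, `LABEL2ID`, and `decode` are illustrative names, not part of the model's API.

```python
# Label order as listed above; assumed to match the checkpoint's class indices.
ID2LABEL = {
    0: "aggression",
    1: "anxiety",
    2: "neutral",
    3: "positive",
    4: "sarcasm",
}
# Reverse mapping, useful when preparing labelled data.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(class_ids):
    """Turn a sequence of predicted class IDs into label names."""
    return [ID2LABEL[i] for i in class_ids]


print(decode([3, 0]))  # ['positive', 'aggression']
```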
**Validation Results**
| Metric | Value |
|------------|--------|
| Accuracy | 0.8911 |
| F1 macro | 0.8910 |
| F1 micro | 0.8911 |
**Per-class accuracy**:
- aggression: 0.9120
- anxiety: 0.9462
- neutral: 0.8663
- positive: 0.8884
- sarcasm: 0.8426
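As a quick sanity check, the unweighted mean of the five per-class values above can be compared with the overall accuracy in the table. The sketch below only redoes the arithmetic on the reported numbers. Reading the match as a sign of a roughly class-balanced validation split is an assumption, since the split sizes are not stated here.

```python
# Per-class values copied from the list above.
per_class = {
    "aggression": 0.9120,
    "anxiety": 0.9462,
    "neutral": 0.8663,
    "positive": 0.8884,
    "sarcasm": 0.8426,
}

# Unweighted (macro) mean over the five classes.
macro_mean = sum(per_class.values()) / len(per_class)
print(f"{macro_mean:.4f}")  # 0.8911, the same as the reported overall accuracy
```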
**Usage**
```bash
pip install transformers torch
```
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier
MODEL_ID = "Kostya165/rubert_tiny2_russian_emotion_sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()  # inference mode: disables dropout

# Two sample inputs: "Today is a great day!" and "All of this infuriates and annoys me."
texts = ["Сегодня отличный день!", "Меня это всё бесит и раздражает."]
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():  # no gradients are needed for inference
    logits = model(**enc).logits
preds = logits.argmax(dim=-1).tolist()

# Map predicted class IDs back to label names
labels = [model.config.id2label[p] for p in preds]
print(labels)  # e.g. ['positive', 'aggression']
```
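The usage snippet returns only the top label. When a confidence score is also needed, the logits can be passed through a softmax. The sketch below does this in plain Python on made-up logits so that it runs without downloading the model; with real outputs, `torch.softmax(logits, dim=-1)` computes the same quantity. The logit values, the `ID2LABEL` table, and the `top_label` helper are illustrative assumptions.

```python
import math

# Label order assumed from the class list in this card.
ID2LABEL = {0: "aggression", 1: "anxiety", 2: "neutral", 3: "positive", 4: "sarcasm"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one text; not real model output.
example_logits = [-1.2, -0.8, 0.3, 2.9, -0.5]
label, prob = top_label(example_logits)
print(label, round(prob, 3))
```

A probability threshold on `prob` is one simple way to route low-confidence predictions to manual review.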
**Training Details**
- Base: `cointegrated/rubert-tiny2`
- Dataset: `Kostya165/ru_emotion_dvach` (train/validation)
- Epochs: 2
- Batch size: 32
- Learning rate: 1e-5
- Mixed precision: FP16
- Regularization: Dropout 0.1, weight_decay 0.01, warmup_ratio 0.1
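
To illustrate what `warmup_ratio 0.1` means for the learning rate, here is a minimal sketch of a warmup schedule. The card does not say which scheduler was used; the code assumes linear warmup followed by linear decay, which is the default in the `transformers` `Trainer`. The dataset size is not given here either, so `total_steps` is a hypothetical value and `lr_at_step` is an illustrative helper.

```python
import math


def lr_at_step(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Learning rate at a given optimizer step (assumed linear warmup + linear decay)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / remaining)


total_steps = 1000  # hypothetical; the real value depends on dataset size, batch size 32, and 2 epochs
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps))
```

Under these assumptions the rate climbs to 1e-5 over the first 10% of steps and then falls back to zero by the end of training.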
**Requirements**
- `transformers>=4.30.0`
- `torch>=1.10.0`
- `datasets`
- `evaluate`
**License**
CC-BY-SA 4.0.
**Citation**
```bibtex
@misc{rubert_tiny2_russian_emotion_sentiment,
title = {Russian Emotion Sentiment Classification with RuBERT-tiny2},
author = {Kostya165},
year = {2024},
howpublished = {\url{https://huggingface.co/Kostya165/rubert_tiny2_russian_emotion_sentiment}}
}
```