---
license: cc-by-sa-4.0
datasets:
- Kostya165/ru_emotion_dvach
language:
- ru
metrics:
- accuracy
base_model:
- cointegrated/rubert-tiny2
pipeline_tag: text-classification
tags:
- russian
- emotion
- sentiment
- sentiment-analysis
- emotion-analysis
- emotion-classification
- emotion-detection
- rubert
- rubert-tiny
---

# rubert_tiny2_russian_emotion_sentiment

## Description

The `rubert_tiny2_russian_emotion_sentiment` model is a fine-tuned version of the lightweight [`cointegrated/rubert-tiny2`](https://huggingface.co/cointegrated/rubert-tiny2) model for classifying five emotions in Russian-language messages:

- **0**: aggression
- **1**: anxiety
- **2**: neutral
- **3**: positive
- **4**: sarcasm
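
The index-to-label mapping above can be checked directly against the model configuration without downloading the full weights; a minimal sketch (assuming the checkpoint's `id2label` matches the list above):

```python
from transformers import AutoConfig

# Load only the configuration to inspect the label mapping
config = AutoConfig.from_pretrained("Kostya165/rubert_tiny2_russian_emotion_sentiment")
print(config.id2label)
# Expected (assumption, based on the list above):
# {0: 'aggression', 1: 'anxiety', 2: 'neutral', 3: 'positive', 4: 'sarcasm'}
```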

### Validation results

| Metric   | Value  |
|----------|--------|
| Accuracy | 0.8911 |
| F1 macro | 0.8910 |
| F1 micro | 0.8911 |

**Per-class accuracy**:

- aggression (0): 0.9120
- anxiety (1): 0.9462
- neutral (2): 0.8663
- positive (3): 0.8884
- sarcasm (4): 0.8426
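
These metrics can be recomputed with the `evaluate` library once predictions for the validation split have been collected; a minimal sketch (the `preds` and `refs` lists below are placeholders, not real model output):

```python
import evaluate

# Placeholder predictions and reference labels (integer class IDs 0-4)
preds = [0, 1, 2, 2, 3]
refs = [0, 1, 2, 3, 3]

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

print(accuracy.compute(predictions=preds, references=refs))
print(f1.compute(predictions=preds, references=refs, average="macro"))
print(f1.compute(predictions=preds, references=refs, average="micro"))
```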

### Usage

```bash
pip install transformers torch
```

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
MODEL_ID = "Kostya165/rubert_tiny2_russian_emotion_sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

texts = [
    "Сегодня отличный день!",
    "Меня это всё бесит и раздражает."
]

# Tokenize
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
preds = logits.argmax(dim=-1).tolist()

# Map predicted IDs back to label names
id2label = model.config.id2label
labels = [id2label[p] for p in preds]
print(labels)  # e.g. ['positive', 'aggression']
```
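
For quick experiments the same checkpoint can also be wrapped in the `pipeline` API; a short sketch (passing `top_k=None` to get scores for all five classes is an assumption about the desired output, and the scores in the comment are illustrative):

```python
from transformers import pipeline

# Text-classification pipeline around the same checkpoint
clf = pipeline(
    "text-classification",
    model="Kostya165/rubert_tiny2_russian_emotion_sentiment",
    top_k=None,  # return a score for every class, not just the top one
)

print(clf("Сегодня отличный день!"))
# e.g. [[{'label': 'positive', 'score': 0.97}, {'label': 'neutral', 'score': 0.02}, ...]]
```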

### Training details

- **Base model**: `cointegrated/rubert-tiny2`
- **Dataset**: `Kostya165/ru_emotion_dvach` (train/validation splits)
- **Epochs**: 2
- **Batch size**: 32
- **Learning rate**: 1e-5
- **Mixed precision**: FP16
- **Regularization**: dropout 0.1, weight_decay 0.01, warmup_ratio 0.1
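
These hyperparameters map onto a standard Hugging Face `Trainer` setup; the sketch below shows one way such a run could look (the `output_dir`, the `"text"`/`"label"` column names, and the split names are assumptions, not taken from the original training script):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("Kostya165/ru_emotion_dvach")
tokenizer = AutoTokenizer.from_pretrained("cointegrated/rubert-tiny2")

def tokenize(batch):
    # Assumed text column name: "text"
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "cointegrated/rubert-tiny2",
    num_labels=5,
)

args = TrainingArguments(
    output_dir="rubert_tiny2_russian_emotion_sentiment",  # assumed
    num_train_epochs=2,
    per_device_train_batch_size=32,
    learning_rate=1e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],  # assumed split name
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```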

### Requirements

- `transformers>=4.30.0`
- `torch>=1.10.0`
- `datasets`
- `evaluate`
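
Assuming a standard pip environment, everything above can be installed in one step:

```bash
pip install "transformers>=4.30.0" "torch>=1.10.0" datasets evaluate
```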

### License

CC-BY-SA 4.0.

### Citation

```bibtex
@misc{rubert_tiny2_russian_emotion_sentiment,
  title        = {Russian Emotion Sentiment Classification with RuBERT-tiny2},
  author       = {Kostya165},
  year         = {2024},
  howpublished = {\url{https://huggingface.co/Kostya165/rubert_tiny2_russian_emotion_sentiment}}
}
```