---
license: cc-by-sa-4.0
datasets:
- Kostya165/ru_emotion_dvach
language:
- ru
metrics:
- accuracy
base_model:
- cointegrated/rubert-tiny2
pipeline_tag: text-classification
tags:
- russian
- emotion
- sentiment
- sentiment-analysis
- emotion-analysis
- emotion-classification
- emotion-detection
- rubert
- rubert-tiny
---

# rubert_tiny2_russian_emotion_sentiment

## Description

`rubert_tiny2_russian_emotion_sentiment` is a fine-tuned version of the lightweight [`cointegrated/rubert-tiny2`](https://huggingface.co/cointegrated/rubert-tiny2) model. It classifies Russian-language messages into five emotion classes:

- **0**: aggression
- **1**: anxiety
- **2**: neutral
- **3**: positive
- **4**: sarcasm

### Validation results

| Metric   | Value  |
|----------|--------|
| Accuracy | 0.8911 |
| F1 macro | 0.8910 |
| F1 micro | 0.8911 |

**Per-class accuracy**:

- aggression (0): 0.9120
- anxiety (1): 0.9462
- neutral (2): 0.8663
- positive (3): 0.8884
- sarcasm (4): 0.8426

### Usage

```bash
pip install transformers torch
```

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
MODEL_ID = "Kostya165/rubert_tiny2_russian_emotion_sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

texts = [
    "Сегодня отличный день!",            # "Today is a great day!"
    "Меня это всё бесит и раздражает."   # "All of this infuriates and annoys me."
]

# Tokenize the batch (padding to the longest text, truncating at 128 tokens)
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits
preds = logits.argmax(dim=-1).tolist()

# Map class IDs back to label names
id2label = model.config.id2label
labels = [id2label[p] for p in preds]
print(labels)  # e.g. ['positive', 'aggression']
```

### Training details

- **Base model**: `cointegrated/rubert-tiny2`
- **Dataset**: `Kostya165/ru_emotion_dvach` (train/validation splits)
- **Epochs**: 2
- **Batch size**: 32
- **Learning rate**: 1e-5
- **Mixed precision**: FP16
- **Regularization**: dropout 0.1, weight_decay 0.01
- **Warmup**: warmup_ratio 0.1

### Requirements

- `transformers>=4.30.0`
- `torch>=1.10.0`
- `datasets`
- `evaluate`

### License

CC-BY-SA 4.0.

### Citation

```bibtex
@misc{rubert_tiny2_russian_emotion_sentiment,
  title        = {Russian Emotion Sentiment Classification with RuBERT-tiny2},
  author       = {Kostya165},
  year         = {2024},
  howpublished = {\url{https://huggingface.co/Kostya165/rubert_tiny2_russian_emotion_sentiment}}
}
```
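In the validation results table, F1 micro equals accuracy (0.8911). This is expected for single-label multiclass classification: each misclassified example contributes exactly one false positive (to the predicted class) and one false negative (to the true class), so pooled micro-F1 reduces to the fraction of correct predictions. Macro-F1 instead averages per-class F1 with equal weight per class, so it can drift away from accuracy when some classes (here, sarcasm) are harder than others. The plain-Python sketch below illustrates the three metrics on made-up labels. It is not the evaluation code used for this model, and the function name `classification_scores` is hypothetical.

```python
from collections import Counter


def classification_scores(y_true: list[int], y_pred: list[int], num_classes: int = 5) -> dict[str, float]:
    """Accuracy, macro-F1 and micro-F1 for single-label multiclass predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class receives a false positive
            fn[t] += 1  # the true class receives a false negative

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no predictions scores 0.0
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = [f1(tp[c], fp[c], fn[c]) for c in range(num_classes)]
    total_tp = sum(tp.values())
    return {
        "accuracy": total_tp / len(y_true),
        "f1_macro": sum(per_class) / num_classes,
        # Pooled counts: total FP == total FN == number of errors, hence micro-F1 == accuracy
        "f1_micro": f1(total_tp, sum(fp.values()), sum(fn.values())),
    }


# Toy example with made-up labels: one sarcasm (4) text misread as neutral (2)
y_true = [0, 1, 2, 3, 4, 2]
y_pred = [0, 1, 2, 3, 2, 2]
print(classification_scores(y_true, y_pred))
```

Treating a class with no examples and no predictions as F1 = 0.0 is one common convention; libraries such as scikit-learn let you choose this behavior, so published macro-F1 values can differ slightly depending on that choice.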
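The usage example above keeps only the argmax label. If you also want a confidence score per class, for instance to route low-confidence predictions to manual review, apply a softmax to each row of logits. Below is a minimal plain-Python sketch that assumes each row comes from `logits.tolist()` in the usage example and that the class order matches the ID list in the description. The hard-coded `LABELS` list and the helper names `softmax` and `top_k` are illustrative assumptions; with the model loaded, prefer `model.config.id2label`, and inside PyTorch `torch.softmax(logits, dim=-1)` computes the same probabilities in one call.

```python
import math

# Assumed label order, copied from the class list in this card (IDs 0..4).
# With the model loaded, read names from model.config.id2label instead.
LABELS = ["aggression", "anxiety", "neutral", "positive", "sarcasm"]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)  # subtracting the max keeps math.exp from overflowing
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(row: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Return the k most probable (label, probability) pairs for one text."""
    probs = softmax(row)
    ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a single text; with the real model, iterate over logits.tolist().
example_logits = [0.2, -1.0, 0.5, 3.1, 1.4]
for label, prob in top_k(example_logits):
    print(f"{label}: {prob:.3f}")
```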