---
license: mit
datasets:
- TTA-DQA/hate_sentence
language:
- ko
metrics:
- accuracy
- f1
base_model:
- monologg/koelectra-base-v3-discriminator
tags:
- Text-Classification
- Hate-Detection
- Hate-Sentence-Detection
---

### Model Details

## 1. Overview

This model is trained to **detect whether a Korean sentence contains hateful (harmful) expressions**.
It performs `binary classification`, judging whether a sentence contains hateful expressions or is an ordinary sentence.
As an AI task, it corresponds to `text-classification`.
The training dataset is [`TTA-DQA/hate_sentence`](https://huggingface.co/datasets/TTA-DQA/hate_sentence).

- **Classes**:
  - `"0"`: `no_hate`
  - `"1"`: `hate`
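The class mapping above can be kept as a small lookup for post-processing predictions (a sketch; the model's actual `config.id2label` should be treated as the source of truth):

```python
# Class mapping from this card; assumed to mirror the model's config.id2label
id2label = {0: "no_hate", 1: "hate"}
label2id = {name: idx for idx, name in id2label.items()}

def decode(pred_id: int) -> str:
    """Map a predicted class id to its human-readable class name."""
    return id2label[pred_id]
```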

---

## 2. Training Details

- **Base Model**: KoELECTRA (a pre-trained Korean language model based on ELECTRA)
- **Source**: [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)
- **Model Type**: ELECTRA discriminator fine-tuned for sequence classification
- **Pre-training Corpus (Korean)**: approx. 20 GB
- **Fine-tuning Data (Hate Dataset)**: approx. 22.3 MB (`TTA-DQA/hate_sentence`)
- **Learning Rate**: `5e-6`
- **Weight Decay**: `0.01`
- **Epochs**: `20`
- **Batch Size**: `16`
- **Data Loader Workers**: `2`
- **Tokenizer**: `BertWordPieceTokenizer`
- **Model Size**: approx. `512 MB`
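The hyperparameters listed above map onto the standard `transformers.TrainingArguments` field names roughly as follows (a sketch only; the original training script is not published with this card):

```python
# Fine-tuning configuration from the list above, keyed by the standard
# transformers.TrainingArguments argument names (assumed mapping)
training_config = {
    "learning_rate": 5e-6,
    "weight_decay": 0.01,
    "num_train_epochs": 20,
    "per_device_train_batch_size": 16,
    "dataloader_num_workers": 2,
}
```

These keys could then be passed as `TrainingArguments(output_dir=..., **training_config)` when reproducing a similar fine-tune.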

---

## 3. Requirements

- `pytorch ~= 1.8.0`
- `transformers ~= 4.0.0`

---

## 4. Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "TTA-DQA/HateDetection_KoElectra_FineTuning"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Sample inputs: an everyday sentence and an insulting one
sentences = ["오늘 점심 뭐 먹을까?", "꺼져, 이 멍청아."]
results = classifier(sentences)
print(results)
```
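The pipeline returns one dict per input, each with a `label` string and a confidence `score`. Depending on the model's config, the labels may come back as the generic `LABEL_0`/`LABEL_1` defaults rather than class names; a hedged post-processing sketch:

```python
# Hypothetical raw pipeline output for two inputs (actual label strings depend
# on the model's config.id2label; "LABEL_0"/"LABEL_1" is the transformers default)
raw_results = [
    {"label": "LABEL_0", "score": 0.997},
    {"label": "LABEL_1", "score": 0.985},
]

# Map default label strings to the class names given in this card
label_names = {"LABEL_0": "no_hate", "LABEL_1": "hate"}
decoded = [(label_names[r["label"]], r["score"]) for r in raw_results]
# decoded == [("no_hate", 0.997), ("hate", 0.985)]
```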

---

## 5. Citation

This model was built as part of the Hyperscale AI Training Data Quality Verification project (FY2024 Hyperscale AI Training Data Quality Verification).

---

## 6. ⚠️ Bias, Risks, and Limitations

The model was not deliberately trained on biased data for either class, but linguistic and cultural factors mean that annotators may disagree on labels.
Judgments about hateful expressions are partly subjective, varying with language, culture, application domain, and personal viewpoint, so the model's outputs may be biased or disputed.

> ❗ Please note that this model's predictions are not an absolute standard for what counts as a hateful expression.

---

# Results

- Task: binary classification (text-classification)
- F1-score: 0.9881
- Accuracy: 0.9881
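For reference, accuracy and binary F1 as reported above can be computed from gold labels and predictions without extra dependencies; a minimal sketch:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive=1):
    """Binary F1 for the given positive class (here 1 = hate)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```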