---
license: mit
datasets:
- TTA-DQA/hate_sentence
language:
- ko
metrics:
- accuracy
- f1
base_model:
- monologg/koelectra-base-v3-discriminator
tags:
- Text-Classification
- Hate-Detection
- Hate-Sentence-Detection
---
### πŸ“Œ Model Details
## 1. 🧾 Overview
This model was trained to **detect whether a Korean sentence contains harmful (hate) expressions**.
It performs `binary classification`, **classifying** each sentence as either containing harmful expressions or being an ordinary sentence.
The corresponding AI task is `text-classification`.
The training dataset is [`TTA-DQA/hate_sentence`](https://huggingface.co/datasets/TTA-DQA/hate_sentence).
- **Class labels** (see the dataset sketch below):
  - `"0"`: `no_hate`
  - `"1"`: `hate`
---
## 2. 🧠 Training Details
- **Base Model**: KoELECTRA (a pre-trained Korean language model based on ELECTRA)
- **Source**: [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)
- **Model Type**: ELECTRA discriminator (transformer encoder)
- **Pre-training (Korean)**: approx. 20GB
- **Fine-tuning (Hate Dataset)**: approx. 22.3MB (`TTA-DQA/hate_sentence`)
- **Learning Rate**: `5e-6`
- **Weight Decay**: `0.01`
- **Epochs**: `20`
- **Batch Size**: `16`
- **Data Loader Workers**: `2`
- **Tokenizer**: `BertWordPieceTokenizer`
- **Model Size**: approx. `512MB`
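For illustration only, a fine-tuning run with the hyperparameters listed above might look roughly like the sketch below. This is not the exact training script used for this model; the dataset split and column names (`train`, `text`, `label`) and the maximum sequence length are assumptions.

```python
# Fine-tuning sketch using the hyperparameters above; split/column names
# and max_length are assumptions, not details confirmed by this card.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "monologg/koelectra-base-v3-discriminator"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

dataset = load_dataset("TTA-DQA/hate_sentence")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="./hate-detection",
    learning_rate=5e-6,
    weight_decay=0.01,
    num_train_epochs=20,
    per_device_train_batch_size=16,
    dataloader_num_workers=2,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"])
trainer.train()
```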
---
## 3. 🧩 Requirements
- `torch ~= 1.8.0`
- `transformers ~= 4.0.0`
---
## 4. πŸš€ Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "TTA-DQA/HateDetection_KoElectra_FineTuning"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example inputs: a neutral sentence ("What should we have for lunch today?")
# and an abusive one ("You bad person.").
sentences = ["였늘 점심 뭐 λ¨Ήμ„κΉŒ?", "이 λ‚˜μœ λ†ˆμ•„."]
results = classifier(sentences)
print(results)
```
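Continuing the snippet above, the pipeline returns a `label` and a `score` for each sentence. Depending on how `id2label` is set in the model config, the label string may appear as `LABEL_0`/`LABEL_1` or as the raw class id, so the mapping below is an assumption:

```python
# Continues the Quick Start snippet; the label-string formats handled here are assumptions.
label_map = {"LABEL_0": "no_hate", "LABEL_1": "hate", "0": "no_hate", "1": "hate"}
for sentence, result in zip(sentences, results):
    name = label_map.get(result["label"], result["label"])
    print(f"{sentence} -> {name} (score={result['score']:.4f})")
```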
---
## 5. πŸ“š Citation
이 λͺ¨λΈμ€ μ΄ˆκ±°λŒ€AI ν•™μŠ΅μš© 데이터 ν’ˆμ§ˆκ²€μ¦ 사업(2024년도 μ΄ˆκ±°λŒ€AI ν•™μŠ΅μš© ν’ˆμ§ˆκ²€μ¦)에 μ˜ν•΄μ„œ κ΅¬μΆ•λ˜μ—ˆμŠ΅λ‹ˆλ‹€.
---
## 6. ⚠️ Bias, Risks, and Limitations
λ³Έ λͺ¨λΈμ€ 각 클래슀의 데이터λ₯Ό 편ν–₯되게 ν•™μŠ΅ν•˜μ§€λŠ” μ•Šμ•˜μœΌλ‚˜,
언어적·문화적 νŠΉμ„±μ— μ˜ν•΄ λ ˆμ΄λΈ”μ— λŒ€ν•œ 이견이 μžˆμ„ 수 μžˆμŠ΅λ‹ˆλ‹€.
μœ ν•΄ ν‘œν˜„μ€ μ–Έμ–΄, λ¬Έν™”, 적용 λΆ„μ•Ό, 개인적 견해에 따라 주관적인 뢀뢄이 μ‘΄μž¬ν•˜μ—¬,
결과에 λŒ€ν•œ 편ν–₯ λ˜λŠ” λ…Όλž€μ΄ λ°œμƒν•  수 μžˆμŠ΅λ‹ˆλ‹€.
> ❗ λ³Έ λͺ¨λΈμ˜ κ²°κ³ΌλŠ” μ ˆλŒ€μ μΈ μœ ν•΄ ν‘œν˜„ 기쀀이 μ•„λ‹˜μ„ μœ μ˜ν•΄ μ£Όμ„Έμš”.
---
## 7. πŸ“ˆ Results
- Task: binary classification (text-classification)
- F1-score: 0.9881
- Accuracy: 0.9881
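
For reference, metrics of this kind can be reproduced along the following lines. The `test` split name, the column names, and the label-string format are assumptions, so this is a sketch rather than the evaluation script actually used.

```python
# Evaluation sketch; split name, column names, and label format are assumptions.
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="TTA-DQA/HateDetection_KoElectra_FineTuning",
)

data = load_dataset("TTA-DQA/hate_sentence", split="test")
# Assumes the returned label strings end in the class id (e.g. "LABEL_1" or "1").
preds = [int(p["label"][-1]) for p in classifier(list(data["text"]))]

print("accuracy:", accuracy_score(data["label"], preds))
print("f1      :", f1_score(data["label"], preds))
```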