kaixkhazaki/multilingual-e5-doclaynet

kaixkhazaki committed on Jan 6, 2025

Commit 130cb5b · verified · 1 Parent(s): 68e3312

Create README.md

Files changed (1)

README.md +84 -0

	@@ -0,0 +1,84 @@

+---
+language: multilingual
+tags:
+- document-classification
+- text-classification
+- multilingual
+- doclaynet
+- e5
+pipeline_tag: text-classification
+license: mit
+base_model: intfloat/multilingual-e5-large
+datasets:
+- pierreguillou/DocLayNet-base
+metrics:
+- accuracy
+model-index:
+- name: multilingual-e5-doclaynet
+  results:
+  - task:
+      type: text-classification
+      name: Document Classification
+    dataset:
+      name: DocLayNet
+      type: pierreguillou/DocLayNet-base
+    metrics:
+      - type: accuracy
+        value: 0.9719
+        name: Test Accuracy
+      - type: f1
+        value: 0.9720
+        name: Weighted F1 Score
+      - type: precision
+        value: 0.9732
+        name: Weighted Precision
+      - type: recall
+        value: 0.9719
+        name: Weighted Recall
+      - type: loss
+        value: 0.5192
+        name: Test Loss
+inference: false
+---
+# Multilingual E5 for Document Classification (DocLayNet)
+This model is a fine-tuned version of intfloat/multilingual-e5-large that classifies document text into the six document categories of the DocLayNet dataset.
+## Model description
+- Base model: intfloat/multilingual-e5-large
+- Task: Document text classification
+- Languages: Multilingual
+- License: MIT
+## Training data
+- Dataset: DocLayNet-base
+- Source: https://huggingface.co/datasets/pierreguillou/DocLayNet-base
+- Categories:
+```python
+{
+    'financial_reports': 0,
+    'government_tenders': 1,
+    'laws_and_regulations': 2,
+    'manuals': 3,
+    'patents': 4,
+    'scientific_articles': 5
+}
+```
+## Training procedure
+Trained on a single GPU for 2 epochs, taking approximately 20 minutes.
+Hyperparameters:
+```python
+{
+    'batch_size': 8,
+    'num_epochs': 10,
+    'learning_rate': 2e-5,
+    'weight_decay': 0.01,
+    'warmup_ratio': 0.1,
+    'gradient_clip': 1.0,
+    'label_smoothing': 0.1,
+    'optimizer': 'AdamW',
+    'scheduler': 'cosine_with_warmup'
+}
+```
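The `scheduler` and `warmup_ratio` entries above can be read as a linear warmup followed by cosine decay. The sketch below shows one common formulation of that schedule; it is not necessarily the exact scheduler used to train this model. Only `learning_rate` and `warmup_ratio` come from the README. The function name `lr_at_step` and the total step count are hypothetical.

```python
import math

BASE_LR = 2e-5       # 'learning_rate' from the hyperparameters above
WARMUP_RATIO = 0.1   # 'warmup_ratio' from the hyperparameters above


def lr_at_step(step: int, total_steps: int,
               base_lr: float = BASE_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` under linear warmup + cosine decay (assumed form)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_at_step(s, total))
```

With these settings the rate peaks at `2e-5` once the first 10% of steps have elapsed and then falls smoothly towards zero by the final step.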
+## Evaluation results
+- Test loss: 0.5192
+- Test accuracy: 0.9719
+- Weighted F1: 0.9720
+- Weighted precision: 0.9732
+- Weighted recall: 0.9719
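To interpret the classifier's output, the predicted class index has to be mapped back to a category name. The following is a minimal sketch, assuming the model emits one logit per category in the order given by the mapping in the Training data section. `LABEL2ID` copies that mapping; `decode_logits`, `softmax`, and the example logits are hypothetical and not part of the model's API.

```python
import math

# Category mapping copied from the "Training data" section of the README.
LABEL2ID = {
    'financial_reports': 0,
    'government_tenders': 1,
    'laws_and_regulations': 2,
    'manuals': 3,
    'patents': 4,
    'scientific_articles': 5,
}
# Inverse mapping: class index -> category name.
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_logits(logits):
    """Return (category name, probability) for the highest-scoring class."""
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected one logit per category")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


if __name__ == "__main__":
    example_logits = [0.2, -1.0, 0.1, 0.3, 4.5, 0.0]  # made-up values
    print(decode_logits(example_logits))
```

If the checkpoint's configuration already carries an `id2label` table, that table should take precedence over the hard-coded copy here.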