kaixkhazaki committed on
Commit 130cb5b · verified · 1 Parent(s): 68e3312

Create README.md

Files changed (1)
  1. README.md +84 -0
README.md ADDED
@@ -0,0 +1,84 @@
+ ---
+ language: multilingual
+ tags:
+ - document-classification
+ - text-classification
+ - multilingual
+ - doclaynet
+ - e5
+ pipeline_tag: text-classification
+ license: mit
+ base_model: intfloat/multilingual-e5-large
+ datasets:
+ - pierreguillou/DocLayNet-base
+ metrics:
+ - accuracy
+ model-index:
+ - name: multilingual-e5-doclaynet
+   results:
+   - task:
+       type: text-classification
+       name: Document Classification
+     dataset:
+       name: DocLayNet
+       type: pierreguillou/DocLayNet-base
+     metrics:
+     - type: accuracy
+       value: 0.9719
+       name: Test Accuracy
+     - type: f1
+       value: 0.9720
+       name: Weighted F1 Score
+     - type: precision
+       value: 0.9732
+       name: Weighted Precision
+     - type: recall
+       value: 0.9719
+       name: Weighted Recall
+     - type: loss
+       value: 0.5192
+       name: Test Loss
+ inference: false
+ ---
+ # Multilingual E5 for Document Classification (DocLayNet)
+ This model is a fine-tuned version of intfloat/multilingual-e5-large that classifies document text into the six document categories of the DocLayNet dataset.
+
+ ## Model description
+ - Base model: intfloat/multilingual-e5-large
+ - Task: Document text classification
+ - Languages: Multilingual
+ - License: MIT
+
+ ## Training data
+ - Dataset: DocLayNet-base
+ - Source: https://huggingface.co/datasets/pierreguillou/DocLayNet-base
+ - Categories:
+ ```python
+ {
+     'financial_reports': 0,
+     'government_tenders': 1,
+     'laws_and_regulations': 2,
+     'manuals': 3,
+     'patents': 4,
+     'scientific_articles': 5
+ }
+ ```
+
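The category mapping above goes from name to integer ID, whereas a classifier head outputs an ID that has to be turned back into a name. A minimal sketch of that inversion follows, assuming the model's logits are ordered by the IDs listed above. The logit values are made up for illustration, and `predict_category` is a hypothetical helper, not part of the model repository.

```python
# Label mapping copied from the model card.
label2id = {
    "financial_reports": 0,
    "government_tenders": 1,
    "laws_and_regulations": 2,
    "manuals": 3,
    "patents": 4,
    "scientific_articles": 5,
}

# Invert it so a predicted class index can be reported as a category name.
id2label = {idx: name for name, idx in label2id.items()}


def predict_category(logits):
    """Return the category name for the highest-scoring logit (hypothetical helper)."""
    if len(logits) != len(id2label):
        raise ValueError(f"expected {len(id2label)} logits, got {len(logits)}")
    best_idx = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best_idx]


# Made-up logits for one document; index 4 has the highest score.
example_logits = [-1.2, 0.3, 0.1, -0.5, 2.7, 0.0]
print(predict_category(example_logits))  # patents
```

Keeping both directions of the mapping next to each other makes it harder for the training labels and the reported predictions to drift apart.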
+ ## Training procedure
+
+ Trained on a single GPU for 2 epochs, taking approximately 20 minutes.
+
+ Hyperparameters:
+ - `batch_size`: 8
+ - `num_epochs`: 10
+ - `learning_rate`: 2e-5
+ - `weight_decay`: 0.01
+ - `warmup_ratio`: 0.1
+ - `gradient_clip`: 1.0
+ - `label_smoothing`: 0.1
+ - `optimizer`: AdamW
+ - `scheduler`: cosine_with_warmup
+
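To make the `warmup_ratio` and `cosine_with_warmup` settings concrete, here is a minimal sketch of one common form of that schedule: linear warmup from zero to the peak learning rate, then a half-cosine decay back to zero. The shape is an assumption, since the source does not show the actual scheduler code. `lr_at_step` is a hypothetical helper and the 1,000-step total is purely illustrative.

```python
import math

PEAK_LR = 2e-5       # 'learning_rate' from the hyperparameters above
WARMUP_RATIO = 0.1   # 'warmup_ratio' from the hyperparameters above


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # illustrative value, not taken from the training run
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

With a ratio of 0.1, the first tenth of the steps ramps the learning rate up, which tends to stabilize early updates when fine-tuning a large pretrained encoder.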
+ ## Evaluation results
+ - Test loss: 0.5192
+ - Test accuracy: 0.9719
+ - Weighted F1: 0.9720
+ - Weighted precision: 0.9732
+ - Weighted recall: 0.9719
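
The metadata block of this README reports a weighted recall equal to the test accuracy (both 0.9719). That is expected, because recall weighted by class support always reduces to overall accuracy. The sketch below shows this on made-up labels. `weighted_scores` is a hypothetical helper written for illustration, and the toy data is not drawn from DocLayNet or from this model's predictions.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1, plus accuracy (hypothetical helper)."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        # Weight each class by its share of the true labels.
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels using the category IDs 0-5; not real model output.
y_true = [0, 0, 1, 2, 3, 4, 5, 5]
y_pred = [0, 1, 1, 2, 3, 4, 5, 5]
scores = weighted_scores(y_true, y_pred)
print(scores)
```

Weighted precision and F1 do not share this identity, which is why they can differ slightly from the accuracy figure.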