---
model-index:
- name: layoutlmv3-base-finetuned-rvlcdip
results:
- task:
type: document-image-classification
name: document-image-classification
dataset:
name: rvl-cdip
type: amazon-ocr
metrics:
- type: evaluation_loss
value: 0.1856316477060318
name: Evaluation Loss
- type: accuracy
value: 0.9519237980949524
name: Evaluation Accuracy
- type: weighted_f1
value: 0.9518911690649716
name: Evaluation Weighted F1
- type: micro_f1
value: 0.9519237980949524
name: Evaluation Micro F1
- type: macro_f1
value: 0.9518042570370386
name: Evaluation Macro F1
- type: weighted_recall
value: 0.9519237980949524
name: Evaluation Weighted Recall
- type: micro_recall
value: 0.9519237980949524
name: Evaluation Micro Recall
- type: macro_recall
value: 0.9518171728908463
name: Evaluation Macro Recall
- type: weighted_precision
value: 0.9519094862975979
name: Evaluation Weighted Precision
- type: micro_precision
value: 0.9519237980949524
name: Evaluation Micro Precision
- type: macro_precision
value: 0.9518423447239385
name: Evaluation Macro Precision
- type: runtime
value: 514.7031
name: Evaluation Runtime (seconds)
- type: samples_per_second
value: 77.713
name: Evaluation Samples per Second
- type: steps_per_second
value: 1.214
name: Evaluation Steps per Second
---
# layoutlmv3-base-finetuned-rvlcdip
This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) on the [RVL-CDIP dataset](https://adamharley.com/rvl-cdip/), processed with Amazon OCR.
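Below is a minimal inference sketch (not part of the original card). It assumes the model is hosted at `gordonlim/layoutlmv3-base-finetuned-rvlcdip` (hypothetical repo id) and that you supply your own OCR words and word-level bounding boxes, normalized to a 0-1000 scale, since the model was trained on external (Amazon) OCR rather than the processor's built-in Tesseract OCR.

```python
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForSequenceClassification

# Hypothetical Hub path for this fine-tuned checkpoint; adjust as needed.
model_id = "gordonlim/layoutlmv3-base-finetuned-rvlcdip"

# Load the processor from the base checkpoint with apply_ocr=False so that
# externally produced OCR words and boxes can be passed in directly.
processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = LayoutLMv3ForSequenceClassification.from_pretrained(model_id)

image = Image.open("document.png").convert("RGB")
words = ["Invoice", "No.", "12345"]                                 # OCR tokens from your OCR engine
boxes = [[10, 10, 80, 30], [85, 10, 110, 30], [115, 10, 180, 30]]   # word boxes normalized to 0-1000

encoding = processor(image, words, boxes=boxes, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**encoding).logits

predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```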
The following metrics were computed on the evaluation set after the final optimization step:
* Evaluation Loss: 0.1856316477060318
* Evaluation Accuracy: 0.9519237980949524
* Evaluation Weighted F1: 0.9518911690649716
* Evaluation Micro F1: 0.9519237980949524
* Evaluation Macro F1: 0.9518042570370386
* Evaluation Weighted Recall: 0.9519237980949524
* Evaluation Micro Recall: 0.9519237980949524
* Evaluation Macro Recall: 0.9518171728908463
* Evaluation Weighted Precision: 0.9519094862975979
* Evaluation Micro Precision: 0.9519237980949524
* Evaluation Macro Precision: 0.9518423447239385
* Evaluation Runtime (seconds): 514.7031
* Evaluation Samples per Second: 77.713
* Evaluation Steps per Second: 1.214
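The weighted, micro, and macro variants above correspond to the standard scikit-learn averaging modes. As an illustration only (not necessarily the exact evaluation code used for this model), a `compute_metrics` function producing this set of metrics with the Hugging Face Trainer could look like:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair the Trainer passes at evaluation time.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    metrics = {"accuracy": accuracy_score(labels, preds)}
    for avg in ("weighted", "micro", "macro"):
        metrics[f"{avg}_f1"] = f1_score(labels, preds, average=avg)
        metrics[f"{avg}_recall"] = recall_score(labels, preds, average=avg)
        metrics[f"{avg}_precision"] = precision_score(labels, preds, average=avg)
    return metrics
```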
## Training logs
See wandb report: https://api.wandb.ai/links/gordon-lim/lokqu7ok
### Training arguments
The following arguments were provided to Trainer (see the sketch after this list):
- Output Directory: ./results
- Maximum Steps: 20000
- Per Device Train Batch Size: 32 (the paper uses an effective batch size of 64; due to CUDA memory constraints, training used 32 per device across 2 GPUs, giving an effective batch size of 32 * 2 = 64)
- Per Device Evaluation Batch Size: 32 (due to CUDA memory constraints)
- Warmup Steps: 0 (not specified in the paper for RVL-CDIP; a warmup ratio is only given for DocVQA, so the default was assumed)
- Weight Decay: 0 (not specified in the paper for RVL-CDIP; 0.05 is given for PubLayNet, so the default was assumed)
- Evaluation Strategy: steps
- Evaluation Steps: 1000
- Evaluate on Start: True
- Save Strategy: steps
- Save Steps: 1000
- Save Total Limit: 5
- Learning Rate: 2e-5
- Load Best Model at End: True
- Metric for Best Model: accuracy
- Greater is Better: True
- Report to: wandb (log to Weights & Biases)
- Logging Steps: 1000
- Logging First Step: True
- Learning Rate Scheduler Type: cosine (not mentioned in paper, but PubLayNet GitHub example uses 'cosine')
- FP16: True (due to CUDA memory constraints)
- Dataloader Number of Workers: 4 (number of subprocesses to use for data loading)
- DDP Find Unused Parameters: True
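As a rough illustration, the list above corresponds to a `TrainingArguments` configuration along these lines (a sketch reconstructed from the list, not the original training script; argument names follow the transformers 4.42 API):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    max_steps=20000,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=0,
    weight_decay=0.0,
    eval_strategy="steps",          # "evaluation_strategy" in older transformers versions
    eval_steps=1000,
    eval_on_start=True,             # available in recent transformers versions
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=5,
    learning_rate=2e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    report_to="wandb",
    logging_steps=1000,
    logging_first_step=True,
    lr_scheduler_type="cosine",
    fp16=True,
    dataloader_num_workers=4,
    ddp_find_unused_parameters=True,
)
```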
### Framework versions
- Transformers 4.42.3
- PyTorch 2.2.0+cu121
- Datasets 2.14.0
- Tokenizers 0.19.1