---
model-index:
- name: layoutlmv3-base-finetuned-rvlcdip
results:
- task:
type: document-image-classification
name: document-image-classification
dataset:
name: rvl-cdip
type: amazon-ocr
metrics:
- type: evaluation_loss
value: 0.1856316477060318
name: Evaluation Loss
- type: accuracy
value: 0.9519237980949524
name: Evaluation Accuracy
- type: weighted_f1
value: 0.9518911690649716
name: Evaluation Weighted F1
- type: micro_f1
value: 0.9519237980949524
name: Evaluation Micro F1
- type: macro_f1
value: 0.9518042570370386
name: Evaluation Macro F1
- type: weighted_recall
value: 0.9519237980949524
name: Evaluation Weighted Recall
- type: micro_recall
value: 0.9519237980949524
name: Evaluation Micro Recall
- type: macro_recall
value: 0.9518171728908463
name: Evaluation Macro Recall
- type: weighted_precision
value: 0.9519094862975979
name: Evaluation Weighted Precision
- type: micro_precision
value: 0.9519237980949524
name: Evaluation Micro Precision
- type: macro_precision
value: 0.9518423447239385
name: Evaluation Macro Precision
- type: runtime
value: 514.7031
name: Evaluation Runtime (seconds)
- type: samples_per_second
value: 77.713
name: Evaluation Samples per Second
- type: steps_per_second
value: 1.214
name: Evaluation Steps per Second
---
# layoutlmv3-base-finetuned-rvlcdip
This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) on the [RVL-CDIP dataset](https://adamharley.com/rvl-cdip/), processed with Amazon OCR.
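Below is a minimal inference sketch (not part of the original card). It assumes the model is hosted at `gordonlim/layoutlmv3-base-finetuned-rvlcdip` (hypothetical repo id) and that you supply your own OCR words and word-level bounding boxes, normalized to a 0-1000 scale, since the model was trained on external (Amazon) OCR rather than the processor's built-in Tesseract OCR.

```python
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForSequenceClassification

# Hypothetical Hub path for this fine-tuned checkpoint; adjust as needed.
model_id = "gordonlim/layoutlmv3-base-finetuned-rvlcdip"

# Load the processor from the base checkpoint with apply_ocr=False so that
# externally produced OCR words and boxes can be passed in directly.
processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = LayoutLMv3ForSequenceClassification.from_pretrained(model_id)

image = Image.open("document.png").convert("RGB")
words = ["Invoice", "No.", "12345"]                                 # OCR tokens from your OCR engine
boxes = [[10, 10, 80, 30], [85, 10, 110, 30], [115, 10, 180, 30]]   # word boxes normalized to 0-1000

encoding = processor(image, words, boxes=boxes, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**encoding).logits

predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```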
The following metrics were computed on the evaluation set after the final optimization step:
* Evaluation Loss: 0.1856316477060318
* Evaluation Accuracy: 0.9519237980949524
* Evaluation Weighted F1: 0.9518911690649716
* Evaluation Micro F1: 0.9519237980949524
* Evaluation Macro F1: 0.9518042570370386
* Evaluation Weighted Recall: 0.9519237980949524
* Evaluation Micro Recall: 0.9519237980949524
* Evaluation Macro Recall: 0.9518171728908463
* Evaluation Weighted Precision: 0.9519094862975979
* Evaluation Micro Precision: 0.9519237980949524
* Evaluation Macro Precision: 0.9518423447239385
* Evaluation Runtime (seconds): 514.7031
* Evaluation Samples per Second: 77.713
* Evaluation Steps per Second: 1.214
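The weighted, micro, and macro variants above correspond to the standard scikit-learn averaging modes. As an illustration only (not necessarily the exact evaluation code used for this model), a `compute_metrics` function producing this set of metrics with the Hugging Face Trainer could look like:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair the Trainer passes at evaluation time.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    metrics = {"accuracy": accuracy_score(labels, preds)}
    for avg in ("weighted", "micro", "macro"):
        metrics[f"{avg}_f1"] = f1_score(labels, preds, average=avg)
        metrics[f"{avg}_recall"] = recall_score(labels, preds, average=avg)
        metrics[f"{avg}_precision"] = precision_score(labels, preds, average=avg)
    return metrics
```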
## Training logs
See wandb report: https://api.wandb.ai/links/gordon-lim/lokqu7ok
### Training arguments
The following arguments were provided to Trainer (see the sketch after this list):
- Output Directory: ./results
- Maximum Steps: 20000
- Per Device Train Batch Size: 32 (the paper uses an effective batch size of 64; due to CUDA memory constraints, training used 32 per device across 2 GPUs, giving an effective batch size of 32 * 2 = 64)
- Per Device Evaluation Batch Size: 32 (due to CUDA memory constraints)
- Warmup Steps: 0 (not specified in the paper for RVL-CDIP; a warmup ratio is only given for DocVQA, so the default was assumed)
- Weight Decay: 0 (not specified in the paper for RVL-CDIP; 0.05 is given for PubLayNet, so the default was assumed)
- Evaluation Strategy: steps
- Evaluation Steps: 1000
- Evaluate on Start: True
- Save Strategy: steps
- Save Steps: 1000
- Save Total Limit: 5
- Learning Rate: 2e-5
- Load Best Model at End: True
- Metric for Best Model: accuracy
- Greater is Better: True
- Report to: wandb (log to Weights & Biases)
- Logging Steps: 1000
- Logging First Step: True
- Learning Rate Scheduler Type: cosine (not mentioned in paper, but PubLayNet GitHub example uses 'cosine')
- FP16: True (due to CUDA memory constraints)
- Dataloader Number of Workers: 4 (number of subprocesses to use for data loading)
- DDP Find Unused Parameters: True
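As a rough illustration, the list above corresponds to a `TrainingArguments` configuration along these lines (a sketch reconstructed from the list, not the original training script; argument names follow the transformers 4.42 API):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    max_steps=20000,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=0,
    weight_decay=0.0,
    eval_strategy="steps",          # "evaluation_strategy" in older transformers versions
    eval_steps=1000,
    eval_on_start=True,             # available in recent transformers versions
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=5,
    learning_rate=2e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    report_to="wandb",
    logging_steps=1000,
    logging_first_step=True,
    lr_scheduler_type="cosine",
    fp16=True,
    dataloader_num_workers=4,
    ddp_find_unused_parameters=True,
)
```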
### Framework versions
- Transformers 4.42.3
- PyTorch 2.2.0+cu121
- Datasets 2.14.0
- Tokenizers 0.19.1