---
library_name: transformers
datasets:
- ds4sd/DocLayNet-v1.2
base_model:
- microsoft/layoutlmv3-base
---

# Model Card for kbsooo/layoutlmv3_finetuned_doclaynet

## Model Details

### Model Description

This model is a fine-tuned version of [LayoutLMv3](https://huggingface.co/microsoft/layoutlmv3-base) for token classification on the DocLayNet dataset. It classifies each token in a document image using both textual and layout (bounding-box) information.

- **Developed by:** kbsooo
- **Model type:** LayoutLMv3ForTokenClassification
- **Language(s) (NLP):** Primarily English (the dominant language of DocLayNet documents)
- **License:** See the DocLayNet and LayoutLMv3 licenses
- **Finetuned from model:** microsoft/layoutlmv3-base

### Model Sources

- **Repository:** [Hugging Face Model Hub](https://huggingface.co/kbsooo/layoutlmv3_finetuned_doclaynet)
- **Paper:** [LayoutLMv3 Paper](https://arxiv.org/abs/2204.08387)

## Uses

### Direct Use

This model can be used for:

- Token classification in document images (e.g., identifying headings, paragraphs, tables, images, lists)
- Document understanding tasks where both layout and text information are important

### Downstream Use

- Can be integrated into pipelines for document information extraction
- Useful for document analysis applications such as invoice parsing and form processing

### Out-of-Scope Use

- Not intended for languages or layouts not represented in the DocLayNet dataset
- Not suitable for free-form text without document structure

## Bias, Risks, and Limitations

- The model may misclassify tokens if the document layout or language differs from the training data
- Biases may exist due to the composition of the DocLayNet dataset
- Limited to 10 classes of document layout elements

### Recommendations

- Preprocess documents the same way as during training (tokenization + bounding boxes + image)
- Verify predictions, especially in production or high-stakes scenarios

## How to Get Started with the Model

With `apply_ocr=False` the processor expects words and their bounding boxes (normalized to the 0–1000 range) from your own OCR step; with the default `apply_ocr=True` it runs Tesseract on the image and you pass the image alone.

```python
from transformers import LayoutLMv3ForTokenClassification, AutoProcessor
from PIL import Image
import torch

repo = "kbsooo/layoutlmv3_finetuned_doclaynet"
model = LayoutLMv3ForTokenClassification.from_pretrained(repo)
# apply_ocr=False: we supply words and boxes ourselves (e.g., from an OCR engine)
processor = AutoProcessor.from_pretrained(repo, apply_ocr=False)

image = Image.open("document.png").convert("RGB")  # replace with your document image
words = ["Sample", "document", "text"]             # words from your OCR step
boxes = [[50, 50, 150, 80], [160, 50, 300, 80], [310, 50, 380, 80]]  # 0-1000 normalized

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)

preds = outputs.logits.argmax(dim=-1).squeeze().tolist()
labels = [model.config.id2label[p] for p in preds]
print(labels)
```

## Training Details

### Training Data

- Dataset: DocLayNet-v1.2
- Train/validation split: 200/100 samples
- Columns: input_ids, attention_mask, bbox, labels, pixel_values, n_words_in, n_words_out

### Training Procedure

- Optimizer: AdamW
- Learning rate: 5e-5
- Epochs: 5
- Mixed precision: FP16 optional
- Loss: cross-entropy per token

A minimal sketch of this loop is shown after the Evaluation section below.

## Evaluation

- Sample metrics (from the validation set):
  - Avg train loss: 0.134
  - Avg validation loss: 0.458
- Token prediction accuracy should be checked against the DocLayNet labels (a sketch follows below)
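The following is a minimal sketch of the training loop described under Training Procedure above, not the exact script used to produce this checkpoint. It reuses `model` from the getting-started snippet; `train_dataset` is a hypothetical processed DocLayNet split (items padded to a fixed length so default collation works), and the batch size is an illustrative assumption.

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = AdamW(model.parameters(), lr=5e-5)  # lr from the card

# train_dataset (assumed) yields dicts with input_ids, attention_mask,
# bbox, pixel_values, and labels, as produced by the processor.
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

model.train()
for epoch in range(5):
    running_loss = 0.0
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        optimizer.zero_grad()
        outputs = model(**batch)  # passing labels yields per-token cross-entropy loss
        outputs.loss.backward()
        optimizer.step()
        running_loss += outputs.loss.item()
    print(f"epoch {epoch + 1}: avg train loss {running_loss / len(train_loader):.3f}")
```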
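And a sketch of the token-accuracy check mentioned in the Evaluation bullets, under the assumption that `val_loader` is built like `train_loader` above and that label tensors use -100 for positions to ignore (special tokens and unlabeled subwords, as the processor assigns when given word labels):

```python
model.eval()
correct, counted = 0, 0
with torch.no_grad():
    for batch in val_loader:  # assumed, built like train_loader above
        batch = {k: v.to(device) for k, v in batch.items()}
        preds = model(**batch).logits.argmax(dim=-1)
        mask = batch["labels"] != -100  # skip special tokens / padding
        correct += (preds[mask] == batch["labels"][mask]).sum().item()
        counted += mask.sum().item()
print(f"token accuracy: {correct / max(counted, 1):.3f}")
```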
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA A100
- **Hours used:** ~1 hour for 5 epochs (small dataset)

## Technical Specifications

### Model Architecture and Objective

- Base model: LayoutLMv3
- Task: Token classification for document layout elements
- Input: Tokenized text, bounding boxes, and document images
- Output: Token-wise logits for 10 classes

### Compute Infrastructure

- Training performed on Google Colab Pro (A100 GPU)
- Framework: PyTorch + Hugging Face Transformers

## Citation

**BibTeX:**

```bibtex
@article{huang2022layoutlmv3,
  title={LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking},
  author={Huang, Yupan and Lv, Tengchao and Cui, Lei and Lu, Yutong and Wei, Furu},
  journal={arXiv preprint arXiv:2204.08387},
  year={2022}
}
```

**APA:**

Huang, Y., Lv, T., Cui, L., Lu, Y., & Wei, F. (2022). LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. arXiv preprint arXiv:2204.08387.