Paper: RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv:1907.11692)
Research into bias in criminal court decisions needs the support of natural language processing tools.
Pre-trained language models have greatly improved the accuracy of text mining on general texts. At present, there is an urgent need for a pre-trained language model built specifically for the automatic processing of court decision texts.
We used text from the Bailii website as the training set. Building on the RoBERTa deep language model framework, we trained the bailii-roberta pre-trained language model with transformers/run_mlm.py and transformers/mlm_wwm.
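The masked language modelling objective with whole-word masking can be illustrated with a small, self-contained sketch. This is not the code from run_mlm.py or mlm_wwm; the function names, the 15% masking rate, and the use of the "Ġ" prefix to detect word starts (the convention of RoBERTa-style byte-level BPE tokens) are assumptions made for illustration. The sketch also skips the random-replacement step that full MLM training usually applies.

```python
import random

MASK = "<mask>"  # RoBERTa-style mask token (assumed here for illustration)


def group_into_words(tokens):
    """Group token indices into words.

    Assumes byte-level BPE tokens where a leading 'Ġ' marks the start of a
    new word; any other token continues the previous word.
    """
    words = []
    for i, tok in enumerate(tokens):
        if i == 0 or tok.startswith("Ġ"):
            words.append([i])
        else:
            words[-1].append(i)
    return words


def whole_word_mask(tokens, mask_prob=0.15, seed=None):
    """Mask whole words: every sub-token of a chosen word is replaced.

    Returns (masked_tokens, labels), where labels holds the original token
    at masked positions and None elsewhere.
    """
    words = group_into_words(tokens)
    masked = list(tokens)
    labels = [None] * len(tokens)
    if not words:
        return masked, labels

    rng = random.Random(seed)
    # Mask at least one word, never more than are available.
    n_to_mask = min(len(words), max(1, round(len(words) * mask_prob)))
    for word in rng.sample(words, n_to_mask):
        for i in word:
            labels[i] = tokens[i]
            masked[i] = MASK
    return masked, labels
```

Masking whole words rather than isolated sub-tokens prevents the model from recovering a masked piece simply by reading the rest of the same word, which matters for long legal terms that split into several sub-tokens.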
The bailii-roberta model can be loaded directly online with the from_pretrained method of Hugging Face Transformers:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("tsantosh7/bailii-roberta")
model = AutoModel.from_pretrained("tsantosh7/bailii-roberta")
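Because the model was trained with a masked language modelling objective, one quick way to try it is the fill-mask pipeline. The sketch below assumes the published checkpoint includes the masked-LM head; if it does not, Transformers will warn about newly initialised weights and the predictions will be meaningless. The helper names and the example sentence are hypothetical.

```python
def build_masked_sentence(template, mask_token):
    """Replace the '{mask}' placeholder with the tokenizer's mask token."""
    return template.replace("{mask}", mask_token)


def top_predictions(template, model_name="tsantosh7/bailii-roberta", k=5):
    """Return the k most likely fillers for the masked position.

    Downloads the model on first use, so it needs network access.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_name)
    text = build_masked_sentence(template, fill.tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in fill(text, top_k=k)]


# Example call (hypothetical sentence):
# for token, score in top_predictions("The appellant was convicted of {mask}."):
#     print(f"{token!r}: {score:.3f}")
```

Reading the mask token from the tokenizer rather than hard-coding it keeps the example valid even if the checkpoint's special tokens differ from the RoBERTa defaults.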