---
datasets:
- YYama0/CT-RATE-JPN
base_model:
- alabnii/jmedroberta-base-manbyo-wordpiece
---

# jmedroberta-base-manbyo-wordpiece

**jmedroberta-base-manbyo-wordpiece** is a Japanese RoBERTa model pretrained on medical text. This version has been fine-tuned on **CT-RATE-JPN**, a large-scale dataset of Japanese chest CT reports, for **multi-label classification** of 18 common thoracic CT findings.

The model leverages JMedRoBERTa's medical-domain vocabulary coverage and achieves strong, stable classification performance on Japanese radiology reports.

---

## Model Overview

* **Base model:** `alabnii/jmedroberta-base-manbyo-wordpiece`
* **Task:** Multi-label classification (18 abnormal findings)
* **Training data:** CT-RATE-JPN (Japanese translations of CT-RATE reports)
* **Input:** Japanese radiology reports
* **Output:** Independent per-finding probabilities in [0, 1] (one sigmoid per finding)
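
Because the task is multi-label, the 18 findings are scored independently (one sigmoid per logit) rather than competing in a softmax, so several findings can be positive in the same report. A minimal illustration of the output shape, using random logits:

```python
import torch

logits = torch.randn(2, 18)    # illustrative logits for a batch of 2 reports
probs = torch.sigmoid(logits)  # 18 independent probabilities per report
print(probs.shape)             # torch.Size([2, 18]); rows need not sum to 1
```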

---

## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# NOTE: loading the base checkpoint initializes the 18-label classification
# head randomly. To reproduce the reported results, point model_name at the
# fine-tuned checkpoint from this repository instead.
model_name = "alabnii/jmedroberta-base-manbyo-wordpiece"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=18,
    problem_type="multi_label_classification",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()  # disable dropout for deterministic inference

def infer(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.sigmoid(logits)  # (batch, 18) per-finding probabilities

# "Faint infiltrative shadows are noted in both lungs."
texts = ["両肺に淡い浸潤影を認めます。"]
probs = infer(texts)
```
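
To turn the probabilities into binary findings, apply a decision threshold. A minimal sketch, assuming a uniform 0.5 cutoff (an illustrative default; per-finding thresholds tuned on validation data usually work better):

```python
# Threshold the (batch, 18) probability tensor into binary predictions.
threshold = 0.5  # assumption: uniform cutoff, not a tuned value
preds = (probs >= threshold).int()

# Map positive indices to finding names if the checkpoint's config carries
# an id2label mapping; fall back to generic names otherwise.
id2label = getattr(model.config, "id2label", {}) or {}
positive = [id2label.get(i, f"finding_{i}") for i in range(preds.shape[1]) if preds[0, i]]
print(positive)
```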

---

## License

* Trained on **CT-RATE-JPN**, which is released under **CC BY-NC-SA**
* Model weights and outputs are for **non-commercial research use only**

---

## Citation

Please cite the following when using this model or the dataset:

```
@misc{yamagishi2024ctrep,
  title={Development of a Large-scale Dataset of Chest Computed Tomography Reports in Japanese and a High-performance Finding Classification Model},
  author={Yamagishi, Yosuke and others},
  year={2024},
  eprint={2412.15907},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{yamagishi2025modernbert,
  title={ModernBERT is More Efficient than Conventional BERT for Chest CT Findings Classification in Japanese Radiology Reports},
  author={Yamagishi, Yosuke and others},
  year={2025},
  eprint={2503.05060},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```