
Model card for OpenBioNER-tiny-v2

ESWA Paper   GitHub Repo   HF Demo

OpenBioNER-tiny-v2 is the smallest model in the OpenBioNER-v2 family, with only 15M parameters.

It is designed for ultra-fast biomedical NER in environments where latency, size, and memory footprint are top priorities. Despite its size, it retains the description-driven zero-shot design that powers the OpenBioNER series.
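To make the size claim concrete, here is a back-of-the-envelope estimate of the memory needed for the weights alone. It is only a sketch: it assumes 4 bytes per parameter in fp32 and 2 bytes in fp16, uses the rounded parameter counts listed in this card, and ignores activations, tokenizer files, and framework overhead. The helper name `weight_megabytes` is hypothetical.

```python
# Rough weight-memory estimate; parameter counts are the rounded figures from this card.
PARAM_COUNTS = {
    "openbioner-tiny-v2": 15_000_000,
    "openbioner-compact-v2": 65_000_000,
    "openbioner-base-v2": 110_000_000,
}


def weight_megabytes(n_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights alone, in MB (10^6 bytes)."""
    return n_params * bytes_per_param / 1_000_000


for name, n_params in PARAM_COUNTS.items():
    fp32 = weight_megabytes(n_params)      # assumption: 4 bytes per parameter
    fp16 = weight_megabytes(n_params, 2)   # assumption: 2 bytes per parameter
    print(f"{name}: ~{fp32:.0f} MB in fp32, ~{fp16:.0f} MB in fp16")
```

Under these assumptions the tiny model needs roughly 60 MB of weights in fp32, against roughly 440 MB for the base model.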

Available Models

Release Model Name Size Domain Language License
v1 openbioner-base 110M Biomedicine English MIT
v2 openbioner-base-v2 110M Biomedicine English MIT
v2 openbioner-compact-v2 65M Biomedicine English MIT
v2 openbioner-tiny-v2 15M Biomedicine English MIT
v2 openbioner-base-v2-deid 110M PHI de-identification English MIT

Model Details

This model still leverages the two-stage training pipeline of openbioner-base-v2, including large-scale ontology-based pretraining and refinement on high-quality biomedical annotations. It achieves strong zero-shot performance and remains fully open, MIT-licensed, and production-ready.

It is ideal for:

  • Mobile or edge deployment where GPU resources are limited.

  • Real-time inference on biomedical text streams.

  • Prototyping and lightweight applications with modest accuracy requirements.

Installation

To use this model, install the IBM Zshot library. The leading ! runs each command from a notebook cell; omit it in a regular shell:

!pip install -U zshot==0.0.11 datasets gliner
!python -m spacy download en_core_web_sm

Usage

import spacy

from zshot import PipelineConfig, displacy
from zshot.linker import LinkerSMXM
from zshot.utils.data_models import Entity

# define your list of candidate entity types
entities = [
     Entity(name='BACTERIUM', description='A bacterium refers to a type of microorganism that can exist as a single cell and may cause infections or play a role in various biological processes. Examples include species like Streptococcus pneumoniae and Streptomyces ahygroscopicus.', vocabulary=None),
]

nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerSMXM(model_name="disi-unibo-nlp/openbioner-tiny-v2"),
    entities=entities,
    device='cuda' # or 'cpu' if GPU not available
)
nlp.add_pipe("zshot", config=nlp_config, last=True)


sentence = "Impact of cofactor - binding loop mutations on thermotolerance and activity of E. coli transketolase"
doc = nlp(sentence)

displacy.render(doc, style="ent")

Run Evaluation

Let's evaluate the model on the test split of the NCBI dataset.

from datasets import load_dataset
ds = load_dataset('disi-unibo-nlp/ncbi', split='test')
print("Tokens:", ds['tokens'][0])
print("Tags:", ds['ner_tags'][0])
print("Unique labels:", set([t[2:] for tag in ds['ner_tags'] for t in tag if t != 'O']))

Tokens: ['Clustering', 'of', 'missense', 'mutations', 'in', 'the', 'ataxia', '-', 'telangiectasia', 'gene', 'in', 'a', 'sporadic', 'T', '-', 'cell', 'leukaemia', '.']
Tags: ['O', 'O', 'O', 'O', 'O', 'O', 'B-DISEASE', 'I-DISEASE', 'I-DISEASE', 'O', 'O', 'O', 'B-DISEASE', 'I-DISEASE', 'I-DISEASE', 'I-DISEASE', 'I-DISEASE', 'O']
Unique labels: {'DISEASE'}
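The tags above follow the BIO scheme: B- opens an entity, I- continues it, and O is outside any entity. As an illustration of how such tags map to the entity spans that span-based metrics compare, here is a minimal sketch. The helper `bio_to_spans` is hypothetical and not part of zshot; it treats a stray I- tag as the start of a new span, which is one common lenient convention.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a BIO tag sequence into labelled token spans."""
    spans: List[Span] = []
    label, start = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an I- tag carrying the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            label, start = None, None
        # Open a span on B-, or on a stray I- when nothing is open (lenient).
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    if label is not None:  # close a span that runs to the end of the sentence
        spans.append((label, start, len(tags)))
    return spans


tokens = ['Clustering', 'of', 'missense', 'mutations', 'in', 'the', 'ataxia', '-',
          'telangiectasia', 'gene', 'in', 'a', 'sporadic', 'T', '-', 'cell',
          'leukaemia', '.']
tags = ['O', 'O', 'O', 'O', 'O', 'O', 'B-DISEASE', 'I-DISEASE', 'I-DISEASE',
        'O', 'O', 'O', 'B-DISEASE', 'I-DISEASE', 'I-DISEASE', 'I-DISEASE',
        'I-DISEASE', 'O']

for label, start, end in bio_to_spans(tags):
    print(label, "->", " ".join(tokens[start:end]))
```

For the first test sentence this yields two DISEASE spans: "ataxia - telangiectasia" and "sporadic T - cell leukaemia".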

Evaluation with zshot library

import spacy

from zshot import PipelineConfig, displacy
from zshot.linker import LinkerSMXM
from zshot.evaluation.metrics._seqeval._seqeval import Seqeval
from zshot.utils.data_models import Entity
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report



entities = [
  Entity(name='DISEASE', description='A disease is a medical condition that disrupts normal bodily functions or structures, affecting various organs or systems, and leading to symptoms like muscle weakness, fatigue, stiffness, or cognitive impairment. Diseases can impact muscles, the nervous system, heart, eyes, and more, and may be chronic or acute, such as diabetes, cardiovascular or neurological disorders, and cancer-related conditions like lymphoblastic leukemia or lymphoma.', vocabulary=None),
]

nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerSMXM(model_name="disi-unibo-nlp/openbioner-tiny-v2"),
    entities=entities,
    device='cuda'
)

nlp.add_pipe("zshot", config=nlp_config, last=True)
print("Evaluating...")
evaluation = evaluate(nlp, ds, metric=Seqeval())
print("Done!")

print(prettify_evaluate_report(evaluation)[0])

Evaluating...
Done!
+------------------------------------+
|              linker -              |
|        General - span-based        |
+-------------------------+----------+
|          Metric         |          |
+-------------------------+----------+
| overall_precision_micro |  0.5387  |
|   overall_recall_micro  |  0.5116  |
|     overall_f1_micro    |  0.5248  |
| overall_precision_macro |  0.5387  |
|   overall_recall_macro  |  0.5116  |
|     overall_f1_macro    |  0.5248  |
|     overall_accuracy    |  0.9500  |
|  total_time_in_seconds  |  3.1618  |
|    samples_per_second   | 297.3021 |
|    latency_in_seconds   |  0.0034  |
+-------------------------+----------+
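The three timing rows in the report are related by simple arithmetic. The sketch below assumes the usual definitions (throughput = samples / total time, latency = total time / samples); the exact computation inside the evaluation library is not shown in this card, so treat this as a consistency check rather than its implementation.

```python
# Values copied from the report above.
total_time_in_seconds = 3.1618
samples_per_second = 297.3021

# Assumption: throughput = n_samples / total_time, so n_samples can be recovered.
n_samples = round(total_time_in_seconds * samples_per_second)

# Assumption: latency is the average time spent per sample.
latency_in_seconds = total_time_in_seconds / n_samples

print(n_samples)                      # 940
print(round(latency_in_seconds, 4))   # 0.0034
```

The recovered latency matches the reported 0.0034 seconds, which suggests the run covered about 940 test sentences at roughly 3.4 ms each.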

Performance

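OpenBioNER-tiny-v2 trades some accuracy for extreme efficiency. At the token level it averages 61.7 F1, ahead of several far larger GLiNER models (for example GLiNER-large-v2.5, 459M parameters, at 58.4) and of every LLM baseline evaluated. At the span level its 44.6 average trails all the encoder-based baselines but still beats five of the seven LLM baselines.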

Span-level F1 micro scores

Model Size AnatEM BC2GM BC4CHEMD BC5CDR BioRED JNLPBA NCBI MedMentions-R (Dev) MedMentions-R (Test) JNLPBA-R AVG
OpenBioNER-base-v2 110M 48.9 57.4 61.1 72.4 58.3 60.3 66.8 44.9 54.7 64.7 59.0
GLiNER-BioMed-bi-large-v1.0 459M 37.1 63.9 55.8 72.6 59.6 64.3 68.5 36.1 56.1 60.3 57.4
GLiNER-BioMed-bi-base-v1.0 209M 36.2 62.5 53.9 69.7 51.1 62.8 66.1 37.5 50.9 56.7 54.7
GLiNER-BioMed-bi-small-v1.0 166M 28.9 60.6 53.3 69.5 47.9 60.1 64.4 36.4 50.4 56.6 52.8
NuNER-Zero 459M 42.4 52.5 47.9 68.5 53.1 53.8 61.2 38.7 56.6 42.8 51.7
OpenBioNER-base-v1 110M 34.8 49.5 47.1 60.1 46.9 56.8 57.8 37.8 52.5 64.6 50.9
OpenBioNER-compact-v2 65M 39.9 55.0 60.6 71.3 42.6 55.3 59.5 27.9 45.2 47.7 50.4
GLiNER-BioMed-large-v1.0 459M 37.0 54.9 50.8 69.3 58.8 52.0 57.5 34.0 42.6 42.9 49.9
GLiNER-large-v2.5 459M 36.8 52.2 47.6 69.4 51.6 50.1 64.2 33.3 45.6 47.2 49.8
GLiNER-medium-v2.5 209M 42.0 56.7 45.8 67.0 52.1 56.0 65.2 33.2 43.5 38.3 49.9
GLiNER-Multitask-large-v0.5 459M 28.9 44.9 43.6 60.0 45.5 44.2 55.4 57.6 65.4 52.6 49.8
GLiNER-small-v2.5 166M 38.2 51.7 40.1 66.8 48.5 55.3 64.8 35.9 43.1 43.6 48.8
GLiNER-large-v2.1 459M 21.8 43.2 52.8 68.0 51.2 56.6 59.3 31.7 46.8 56.4 48.7
GLiNER-BioMed-base-v1.0 209M 32.4 54.0 46.8 68.1 51.2 50.9 55.8 29.5 39.3 37.5 46.5
GLiNER-BioMed-small-v1.0 166M 33.2 53.5 43.0 67.3 48.6 47.0 56.5 30.3 35.4 35.3 45.0
OpenBioNER-tiny-v2 15M 40.4 46.9 50.9 62.6 39.1 50.1 52.5 21.0 38.6 43.7 44.6
UniNER-7B-type 7B 25.1 46.2 47.9 68.0 62.4 48.1 60.4 32.8 53.4 50.2 49.5
LLaMA-3-Med42-8B 8B 36.3 47.3 53.8 65.8 52.2 47.1 60.9 21.8 34.8 41.5 46.2
Qwen3-8B 8B 35.4 40.3 44.9 64.9 58.3 37.1 40.5 25.1 31.2 21.2 39.9
LLaMA-3.1-8B-Instruct 8B 31.4 39.9 38.9 58.7 57.2 33.5 37.7 21.4 25.0 17.8 36.2
Qwen2.5-Aloe-Beta-7B 7B 31.7 32.6 42.5 48.1 28.8 39.6 46.9 15.9 28.3 25.5 34.0
MediPhi-Instruct 3.8B 25.0 28.0 23.2 49.3 43.9 25.9 28.6 17.4 19.6 23.4 28.4
BioMistral-7B-DARE 7B 18.4 28.9 25.3 44.1 36.5 24.5 28.7 14.4 15.4 17.3 25.4

Token-level F1 micro scores

Model Size AnatEM BC2GM BC4CHEMD BC5CDR BioRED JNLPBA NCBI MedMentions-R (Dev) MedMentions-R (Test) JNLPBA-R AVG
OpenBioNER-base-v2 110M 61.0 76.7 78.9 77.0 66.4 73.1 79.0 57.1 70.7 81.3 72.1
OpenBioNER-compact-v2 65M 55.5 74.2 78.4 78.4 57.1 67.5 76.0 43.7 59.8 68.8 65.9
GLiNER-BioMed-bi-large-v1.0 459M 48.1 70.4 59.0 76.3 66.4 71.7 77.2 44.3 68.0 74.5 65.6
OpenBioNER-base-v1 110M 49.8 63.4 67.7 68.1 56.2 70.1 74.3 52.0 68.5 82.5 65.4
NuNER-Zero 459M 48.8 64.6 64.6 74.0 62.1 65.8 74.9 49.7 69.9 62.7 63.7
GLiNER-BioMed-bi-base-v1.0 209M 45.3 71.5 56.1 73.8 57.4 69.7 75.9 46.4 64.3 71.9 63.2
GLiNER-Multitask-large-v0.5 459M 45.1 64.8 65.4 70.4 55.6 54.6 71.7 65.4 75.3 63.2 63.1
OpenBioNER-tiny-v2 15M 60.1 69.2 72.9 73.0 48.9 63.8 73.3 36.1 56.1 64.1 61.7
GLiNER-BioMed-bi-small-v1.0 166M 36.2 68.1 55.6 73.5 54.8 66.6 74.5 44.9 63.4 73.6 61.5
GLiNER-medium-v2.5 209M 57.4 70.5 52.9 71.5 61.0 63.5 77.5 40.7 53.3 56.5 60.5
GLiNER-small-v2.5 166M 51.7 66.5 48.2 70.9 58.0 63.3 75.1 43.5 52.1 59.8 58.9
GLiNER-large-v2.5 459M 47.1 65.7 54.7 73.2 59.1 58.2 74.0 39.0 54.4 59.3 58.4
GLiNER-BioMed-large-v1.0 459M 48.9 66.7 54.9 72.1 64.8 57.9 64.9 41.0 53.1 58.6 58.3
GLiNER-BioMed-base-v1.0 209M 49.1 67.7 52.3 71.8 59.6 58.3 67.5 35.8 49.1 55.6 56.6
GLiNER-large-v2.1 459M 29.9 48.0 56.5 72.4 58.7 61.4 69.0 42.5 60.0 66.0 56.4
GLiNER-BioMed-small-v1.0 166M 52.1 66.7 49.0 71.9 55.7 53.7 68.9 37.2 43.7 56.0 55.5
UniNER-7B-type 7B 37.2 55.6 60.9 72.8 67.0 60.2 75.9 37.6 60.8 76.7 60.5
LLaMA-3-Med42-8B 8B 49.6 52.8 48.7 64.3 55.2 52.7 68.3 27.9 43.6 59.3 52.2
Qwen3-8B 8B 48.8 53.8 54.6 66.4 59.8 47.1 52.2 32.1 38.4 46.8 50.0
LLaMA-3.1-8B-Instruct 8B 44.6 53.9 47.6 61.5 58.5 42.9 50.2 28.2 30.3 40.9 45.9
Qwen2.5-Aloe-Beta-7B 7B 46.5 43.7 51.4 52.7 39.5 46.3 59.1 18.0 34.1 47.0 43.8
MediPhi-Instruct 3.8B 39.4 39.6 29.5 49.6 49.4 34.6 39.7 23.0 23.1 40.6 36.9
BioMistral-7B-DARE 7B 32.0 36.7 27.8 43.7 37.6 30.2 37.0 18.5 17.2 34.3 31.5

⚠️ Note: All results above were computed using the zshot library (v0.0.11), which supports both GLiNER and OpenBioNER architectures. For all GLiNER models, evaluations were performed using lowercase type names and a threshold of 0.5. To reproduce the results, please refer to our GitHub repository.
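Token-level scores in the tables above are consistently higher than span-level ones. The toy example below sketches why, under simplified definitions that are assumptions and not necessarily the exact ones used by zshot: span-level F1 counts a prediction only when label and both boundaries match exactly, while token-level F1 gives credit for every correctly labelled token. The helper names are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (label, start index, end index exclusive)


def f1(n_correct: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing is correct."""
    if n_correct == 0:
        return 0.0
    precision = n_correct / n_pred
    recall = n_correct / n_gold
    return 2 * precision * recall / (precision + recall)


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    # Exact match: label, start, and end must all agree.
    return f1(len(gold & pred), len(pred), len(gold))


def to_tokens(spans: Set[Span]) -> Set[Tuple[str, int]]:
    # Expand each span into its labelled token positions.
    return {(label, i) for label, start, end in spans for i in range(start, end)}


def token_f1(gold: Set[Span], pred: Set[Span]) -> float:
    g, p = to_tokens(gold), to_tokens(pred)
    return f1(len(g & p), len(p), len(g))


# Gold spans from the NCBI sentence shown earlier; the second prediction
# misses the first token of "sporadic T - cell leukaemia".
gold = {("DISEASE", 6, 9), ("DISEASE", 12, 17)}
pred = {("DISEASE", 6, 9), ("DISEASE", 13, 17)}

print(round(span_f1(gold, pred), 3))   # 0.5
print(round(token_f1(gold, pred), 3))  # 0.933
```

A single boundary error costs a whole span at the span level but only one token at the token level, which is consistent with the gap between the two tables.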

Descriptions used to evaluate OpenBioNER-tiny-v2 for each dataset.


Negative Class

This is the description used as the NEG class (i.e., not an entity) for all datasets except MedMentions-Rare:

Coal, water, oil, etc. are normally used for traditional electricity generation. However using liquefied natural gas as fuel for joint circulatory electricity generation has advantages. The chief financial officer is the only one there taking the fall. It has a very talented team, eh. What will happen to the wildlife? I just tell them, you've got to change. They're here to stay. They have no insurance on their cars. What else would you like? Whether holding an international cultural event or setting the city's cultural policies, she always asks for the participation or input of other cities and counties.


AnatEM

Type Description
ANATOMY The anatomy refers to biological components at various scales, including cells, tissues, and organs. These entities can be identified by proper nouns referring to cell types (e.g., HeLa cells, neurospheres, NSCLC, SCC), body parts (e.g., serum, blood) or biological substances (e.g., vegetables, meats, cow milk) or tumors.

BC2GM

TYPE Description
GENE A gene is a hereditary DNA segment that encodes a functional product, often a protein or RNA, and can encompass coding sequences, operons, promoters, and regulatory regions. It is passed from parents to offspring and determines inherited traits. Genes are represented by alphanumeric identifiers of varying lengths, from short codes (e.g., trios, ABL, DNA-PK) to longer names (e.g., ERCC3Dm protein, PTPN6 transcript, HPV E6/E7).

BC4CHEMD

TYPE Description
CHEMICAL Chemicals are substances that are composed of one or more elements, typically consisting of atoms bonded together by chemical bonds. They can be naturally occurring, such as vitamins or sterols, or synthesized, like alkylcarbazoles or tetrachlorodibenzo-p-dioxins (TCDD). Chemicals can also be modified or combined to form new compounds, such as esters or polymers.

BC5CDR

TYPE Description
CHEMICAL Chemicals are substances that are composed of atoms, either bonded together in a molecule or as a mixture of different substances. This includes medications (e.g., nitroarginine methyl ester, nifedipine, prednisolone, methyldopa), compounds (e.g., potassium, calcium, ammonium), and other substances that can have various effects on the body.
DISEASE Diseases are any medical condition that affects the normal functioning of the body, resulting in symptoms, discomfort, or potentially life-threatening complications. This includes chronic and acute disorders, conditions affecting specific bodily systems, cancer-related conditions, and complications arising from medical treatments or external factors.

BioRED

Type Description
Disease Or Phenotypic Feature A disease is a medical condition indicating abnormal structure or function of the body. This includes phenotypic features like physical traits (e.g., facial dysmorphism, growth abnormalities, skin lesions), behavioral changes, or measurable clinical signs (e.g., abnormal lab results, imaging findings). Examples include diabetes, cardiovascular disease, neurological disorders, and cancer-related conditions.
Organism Taxon An organism taxon refers to a species, genus, or other taxonomic rank of a living organism, as well as general terms for living beings, especially humans. It helps identify the biological subject of a study or medical context. Examples include "human", "patient" and "Chinese hamster".
Chemical Entity Chemical substances or compounds, including drugs, elements, or other chemicals used in medical contexts. Chemicals may refer to a drug's effect (e.g., tetrodotoxin-sensitive) or the drug itself (e.g., lidocaine, mexiletine, actinomycin D). Includes therapeutic and toxic agents.
Sequence Variant Sequence variant represents alterations in the fundamental building blocks of genetic information. This includes changes that occur in either DNA nucleotide sequences or the resulting protein amino acid sequences. Examples include "G--A substitution at codon 1763", "V1764M", "A118G", and "N40D".
Cell Line A cell line refers to a permanently established line of cells grown in vitro, originating from a single cell or a small group of cells, often used in research. Examples include "tsA201", "BEP2D" and "Het-1A".

JNLPBA

TYPE Description
PROTEIN A protein is a large biomolecule composed of one or more chains of amino acids, essential for structure and function within cells. Proteins serve as enzymes, receptors, and signaling molecules, playing critical roles in hormone action, immune response, and cellular communication.
DNA DNA refers to a molecule that contains the genetic instructions used in the development and function of all living organisms. It is composed of two strands of nucleotides that are coiled together in a double helix structure.
CELL_TYPE A cell type refers to a specific category of cells defined by characteristic morphology, function, and molecular markers. Examples include lymphocytes, leukocytes, mononuclear cells, polymorphonuclear leukocytes, and B-lymphoblastoid cells.
CELL_LINE A cell line is a population of cells derived from a single cell, cultured in vitro or in vivo. It can be normal or transformed, with genetic changes like mutations. Cell lines are used in research to study cellular processes, model diseases, and develop treatments.
RNA RNA is a type of nucleic acid that plays a crucial role in the transmission of genetic information from DNA to proteins. It is a single-stranded molecule composed of nucleotides.

NCBI

TYPE Description
DISEASE A disease is a medical condition that disrupts normal bodily functions or structures, affecting various organs or systems, and leading to symptoms like muscle weakness, fatigue, stiffness, or cognitive impairment. Diseases can impact muscles, the nervous system, heart, eyes, and more, and may be chronic or acute, such as diabetes, cardiovascular or neurological disorders, and cancer-related conditions like lymphoblastic leukemia or lymphoma.

MedMentions-Rare (Dev)

TYPE Description
NEG In this study, we fabricated prevascularized synthetic device ports to help mitigate this limitation. Thus, the optimum range of pore size for prevascularization of these membranes was estimated to be 75 - 100 μm. A total of 51 patients were included, 16 in group I and 35 in group II."
Biomedical Occupation or Discipline (T091) A biomedical occupation or discipline is a professional field or area of study that applies biological and medical sciences to healthcare, research, and clinical practice. Biomedical occupations include roles like physicians, nurses, pharmacists, medical researchers, and biomedical engineers.
Clinical Attributes (T201) Clinical attributes refer to measurable or observable characteristics, findings, or parameters related to a patient's health status or medical condition. These attributes are often assessed during clinical examinations, diagnostic procedures, or reported by the patient. Examples include "BMI", "pain intensity", "ultrasonographical features", and "depth".
Injury or Poisoning (T037) This entity type refers to physical harm, damage, or adverse health effects caused by external factors or substances. It includes various types of bodily injuries, trauma, or toxic reactions. Examples include "hip fracture", "hanging", "toxic effects", and "harmful effect".
Organization (T092) An organization is a structured entity or institution that brings together people, resources, and systems to achieve specific goals or purposes. They range from hospitals, universities, and research institutes to corporations, government agencies, foundations, and professional associations. Examples include "General Surgery and Trauma of the Clinics Hospital", "Medical School", "University of Sao Paulo".
Virus (T005) This refers to a virus like "HIV", "hepatitis C virus", or "porcine reproductive and respiratory syndrome virus".

MedMentions-Rare (Test)

TYPE Description
NEG In this study, we fabricated prevascularized synthetic device ports to help mitigate this limitation. Thus, the optimum range of pore size for prevascularization of these membranes was estimated to be 75 - 100 μm. A total of 51 patients were included, 16 in group I and 35 in group II."
Bacterium (T007) A bacterium refers to a type of microorganism that can exist as a single cell and may cause infections or play a role in various biological processes.
Body Substance (T031) A body substance is any material produced by or found within the body, such as blood, serum, saliva, sweat, or gastric acid.
Food (T168) A food refers to any substance consumed to provide nutritional support for the body. This includes snacks, meat, dairy products, grains, and edible substances like carbohydrates, proteins, and fats.
Body System (T022) A body system consists of interconnected organs and tissues working together to carry out essential functions. Examples include the gastrointestinal tract, nervous system, hematological system, and endocrine system.
Professional or Occupational Group (T097) A professional refers to individuals who share the same profession, occupation, or role within a specific field. Examples include cardiologists, psychologists, assessors, hospice staff, and volunteers.

JNLPBA-Rare

TYPE Description
CELL_LINE A cell line is a population of cells derived from a single cell, cultured in vitro or in vivo. It can be normal or transformed, with genetic changes like mutations. Cell lines are used in research to study cellular processes, model diseases, and develop treatments.
RNA RNA is a type of nucleic acid that plays a crucial role in the transmission of genetic information from DNA to proteins. It is a single-stranded molecule composed of nucleotides.

🧬 How to Write Effective Entity Type Descriptions

Entity type descriptions are crucial for improving generalization in OpenBioNER. Well-written descriptions help models disambiguate types, handle rare classes, and align with real-world usage across diverse datasets.

✅ Best Practices

  • Start with a clear definition: Briefly explain what the entity type is.

  • Include functions or context: Add what it does, its purpose, or where it appears.

  • List 3–5 concrete examples: Use domain-relevant examples (e.g., real diseases, proteins, or food items).

  • Mention subtypes or synonyms (optional): Helps capture lexical variation and rare mentions.

  • Keep it concise: 1–3 well-structured sentences are ideal.

⚠️ Common Mistakes to Avoid

  • Vague or overly generic descriptions
  • No examples
  • Just a list of terms
  • Redundant or circular wording

🧪 Template (Recommended Format)

A [TYPE] refers to [concise definition]. It includes examples such as [example1], [example2], and [example3].
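The template can be filled in programmatically when many types need descriptions. The following is a minimal sketch: `build_description` is a hypothetical helper, the definition text is illustrative, and the examples are taken from the Virus description earlier in this card. The resulting string could then be passed as the `description` of an `Entity`, as in the Usage section.

```python
from typing import List


def build_description(type_name: str, definition: str, examples: List[str]) -> str:
    """Fill the recommended template with a definition and 3-5 examples."""
    if not 3 <= len(examples) <= 5:
        raise ValueError("list 3-5 concrete examples")
    # Join as "a, b, and c" to match the template wording.
    listed = ", ".join(examples[:-1]) + f", and {examples[-1]}"
    return f"A {type_name} refers to {definition}. It includes examples such as {listed}."


desc = build_description(
    "virus",
    "an infectious agent that replicates only inside the living cells of a host",
    ["HIV", "hepatitis C virus", "porcine reproductive and respiratory syndrome virus"],
)
print(desc)
```

Rejecting fewer than three examples enforces the "List 3–5 concrete examples" practice at construction time; the other best practices still need a human read.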

Authors

📬 Contacts

For questions, collaborations, or feedback, feel free to reach out:
