---
license: apache-2.0
language:
- en
library_name: gliner
pipeline_tag: token-classification
tags:
- entity linking
- GLiNER
- GLiNKER
- ner
- bi-encoder
base_model:
- microsoft/deberta-v3-base
- microsoft/deberta-v3-large
---

# GLiNER-Linker: Entity Disambiguation Model

![image](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F6405f62ba577649430be5124%2FzTJU_W307kOrJSwcQubxP.png)

Bi-encoder model for entity disambiguation - the neural component (Layer 3) of the [GLiNKER framework](https://github.com/Knowledgator/GLinker).

## Models

| Model | Base Encoder | Use Case |
|-------|--------------|----------|
| [gliner-linker-base-v1.0](https://huggingface.co/knowledgator/gliner-linker-base-v1.0) | deberta-v3-base | Balanced performance |
| [gliner-linker-large-v1.0](https://huggingface.co/knowledgator/gliner-linker-large-v1.0) | deberta-v3-large | Maximum accuracy |
| [gliner-linker-rerank-v1.0](https://huggingface.co/knowledgator/gliner-linker-rerank-v1.0) | ettin-encoder-68m | Reranking |

*Both text encoder and label encoder use the same DeBERTa model*

## Quick Start with GLiNKER

```bash
pip install git+https://github.com/Knowledgator/GLinker.git
```

```python
from glinker import ConfigBuilder, DAGExecutor

# Build pipeline
builder = ConfigBuilder(name="entity_linking")

# L1: Extract mentions
builder.l1.gliner(
    model="knowledgator/gliner-bi-base-v2.0",
    labels=["person", "organization", "location"]
)

# L2: Candidate retrieval
builder.l2.add("dict", priority=0)

# L3: Disambiguation with GLiNER-Linker
builder.l3.configure(
    model="knowledgator/gliner-linker-large-v1.0",
    use_precomputed_embeddings=True
)

# Execute
executor = DAGExecutor(builder.get_config())
executor.load_entities("entities.jsonl", target_layers=["dict"])

result = executor.execute({
    "texts": ["Apple announced new iPhone"]
})

# Get linked entities
l0_result = result.get("l0_result")
for entity in l0_result.entities:
    if entity.linked_entity:
        print(f"{entity.mention_text} → {entity.linked_entity.label}")
        print(f"  Score: {entity.linked_entity.score:.3f}")
```

## Precomputed Embeddings for Speed

Enable precomputed embeddings in L2 for 10-100× speedup:

```python
builder.l2.embeddings(
    enabled=True,
    model_name="knowledgator/gliner-linker-large-v1.0"
)

# Precompute embeddings when loading entities
executor.load_entities("entities.jsonl", target_layers=["dict"])
executor.precompute_embeddings(target_layers=["postgres"], batch_size=8)
```

## Set up Reranker

```python
builder = ConfigBuilder(name="reranked")
builder.l1.gliner(model="knowledgator/gliner-bi-base-v2.0", labels=["gene", "disease"])
builder.l3.configure(model="knowledgator/gliner-linker-base-v1.0")
builder.l4.configure(
    model="knowledgator/gliner-linker-rerank-v1.0",
    threshold=0.3,
    max_labels=5,
)
builder.save("config.yaml")  # Generates L1 → L2 → L3 → L4 → L0
```

## Entity Format (JSONL)

```json
{"entity_id": "Q312", "label": "Apple Inc.", "description": "American technology company", "entity_type": "organization"}
{"entity_id": "Q89", "label": "Apple", "description": "Edible fruit of apple tree", "entity_type": "food"}
```

## Resources

- **GLiNKER Framework**: [GitHub](https://github.com/Knowledgator/GLinker)
- **Documentation**: [GLiNKER Docs](https://github.com/Knowledgator/GLinker/blob/main/DOCUMENTATION.md)
- **Discord**: [Join Community](https://discord.gg/HbW9aNJ9)

## Citation

```bibtex
@misc{stepanov2024glinermultitask,
      title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
      author={Ihor Stepanov and Mykhailo Shtopko},
      year={2024},
      eprint={2406.12925},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```

---

Developed by [Knowledgator](https://knowledgator.com)
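
## Example: Building an Entity File

The entity format shown in this card is one JSON object per line. The sketch below writes such a file using only the Python standard library, so it can be produced before calling `executor.load_entities`. The helper name `write_entities` is hypothetical and not part of GLiNKER. Treating all four fields as required and rejecting duplicate `entity_id` values are assumptions made here for illustration; the framework itself may be more permissive.

```python
import json
from pathlib import Path

# Field names follow the Entity Format (JSONL) example in this card.
# Assumption: all four are treated as required here; GLiNKER may not enforce this.
REQUIRED_FIELDS = ("entity_id", "label", "description", "entity_type")


def write_entities(entities, path):
    """Validate entity records and write them as one JSON object per line."""
    seen_ids = set()
    lines = []
    for index, entity in enumerate(entities):
        missing = [name for name in REQUIRED_FIELDS if not entity.get(name)]
        if missing:
            raise ValueError(f"record {index} is missing fields: {missing}")
        # Assumption: entity_id should be unique within one file.
        if entity["entity_id"] in seen_ids:
            raise ValueError(f"duplicate entity_id: {entity['entity_id']}")
        seen_ids.add(entity["entity_id"])
        # ensure_ascii=False keeps non-ASCII labels readable in the file.
        lines.append(json.dumps(entity, ensure_ascii=False))
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


entities = [
    {
        "entity_id": "Q312",
        "label": "Apple Inc.",
        "description": "American technology company",
        "entity_type": "organization",
    },
    {
        "entity_id": "Q89",
        "label": "Apple",
        "description": "Edible fruit of apple tree",
        "entity_type": "food",
    },
]

count = write_entities(entities, "entities.jsonl")
print(f"wrote {count} entities to entities.jsonl")
```

Validating before writing surfaces malformed records early, rather than at load time inside the pipeline.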