---
language: en
license: mit
tags:
- clip
- multimodal
- contrastive-learning
- cultural-heritage
- reevaluate
- information-retrieval
datasets:
- xuemduan/reevaluate-image-text-pairs
model-index:
- name: REEVALUATE CLIP Fine-tuned Models
results:
- task:
type: image-text-retrieval
name: Image-Text Retrieval
dataset:
name: Cultural Heritage Hybrid Dataset
type: xuemduan/reevaluate-image-text-pairs
metrics:
- name: I2T R@1
type: recall@1
value: <TOBE_FILL_IN>
- name: I2T R@5
type: recall@5
value: <TOBE_FILL_IN>
- name: T2I R@1
type: recall@1
value: <TOBE_FILL_IN>
---
# Domain-Adaptive CLIP for Multimodal Retrieval
The fine-tuned CLIP (ViT-L/14) model used in **Knowledge-Enhanced Multimodal Retrieval**.
---
## 📦 Available Models
| Model | Description | Data Type |
|--------|--------------|-----------|
| `reevaluate-clip` | Fine-tuned on images, query texts, and description texts | Image+Text |
---
## 🧾 Dataset
The models were trained and evaluated on the **REEVALUATE Image-Text Pair Dataset**, which contains **43,500 image–text pairs** derived from Wikidata and Pilot Museums.
Each artefact is described by:
- `Image`: artefact image
- `Description text`: a BLIP-generated natural-language portion plus a metadata portion
- `Query text`: user-query-like text
Dataset: [xuemduan/reevaluate-image-text-pairs](https://huggingface.co/datasets/xuemduan/reevaluate-image-text-pairs)
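
A minimal sketch of loading the dataset with the 🤗 `datasets` library; the split and column names here are assumptions, so check the dataset card for the actual schema:

```python
from datasets import load_dataset

# Load the REEVALUATE image-text pairs from the Hub.
# The "train" split and the column names are assumptions; inspect ds.column_names.
ds = load_dataset("xuemduan/reevaluate-image-text-pairs", split="train")
print(ds)          # shows the available columns and number of rows
example = ds[0]    # one artefact: image, description text, query text
```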
---
## 🚀 Usage
```python
import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("xuemduan/reevaluate-clip")
processor = CLIPProcessor.from_pretrained("xuemduan/reevaluate-clip")

image = Image.open("artefact.jpg")
text = "yellow flower paintings"

# Encode the image and the text with the fine-tuned CLIP encoders
with torch.no_grad():
    image_embeds = model.get_image_features(**processor(images=image, return_tensors="pt"))
    text_embeds = model.get_text_features(**processor(text=[text], return_tensors="pt", padding=True))

# L2-normalise so the dot product equals cosine similarity
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

similarity = image_embeds @ text_embeds.T
print(similarity)
```
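
For text-to-image retrieval over a gallery, the same embeddings can be ranked by cosine similarity. A minimal sketch, where the file names and query are only illustrative:

```python
import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("xuemduan/reevaluate-clip")
processor = CLIPProcessor.from_pretrained("xuemduan/reevaluate-clip")

# Illustrative gallery; replace with your own artefact images.
paths = ["artefact1.jpg", "artefact2.jpg", "artefact3.jpg"]
images = [Image.open(p) for p in paths]

with torch.no_grad():
    image_embeds = model.get_image_features(**processor(images=images, return_tensors="pt"))
    text_embeds = model.get_text_features(
        **processor(text=["yellow flower paintings"], return_tensors="pt", padding=True)
    )

# L2-normalise and rank the gallery by similarity to the query
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
scores = (text_embeds @ image_embeds.T).squeeze(0)

for rank, idx in enumerate(scores.argsort(descending=True), start=1):
    print(f"{rank}. {paths[idx]} (score={scores[idx]:.3f})")
```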