|
|
---
language: en
license: mit
tags:
- clip
- multimodal
- contrastive-learning
- cultural-heritage
- reevaluate
- information-retrieval
datasets:
- xuemduan/reevaluate-image-text-pairs
model-index:
- name: REEVALUATE CLIP Fine-tuned Models
  results:
  - task:
      type: image-text-retrieval
      name: Image-Text Retrieval
    dataset:
      name: Cultural Heritage Hybrid Dataset
      type: xuemduan/reevaluate-image-text-pairs
    metrics:
    - name: I2T R@1
      type: recall@1
      value: <TOBE_FILL_IN>
    - name: I2T R@5
      type: recall@5
      value: <TOBE_FILL_IN>
    - name: T2I R@1
      type: recall@1
      value: <TOBE_FILL_IN>
---
|
|
|
|
|
|
|
|
# Domain-Adaptive CLIP for Multimodal Retrieval |
|
|
|
|
|
This repository provides the fine-tuned CLIP (ViT-L/14) model used in **Knowledge-Enhanced Multimodal Retrieval**.
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## 📦 Available Models |
|
|
|
|
|
| Model | Description | Data Type | |
|
|
|--------|--------------|-----------| |
|
|
| `reevaluate-clip` | Fine-tuned on images, query texts, and description texts | Image+Text | |
|
|
--- |
|
|
|
|
|
## 🧾 Dataset |
|
|
|
|
|
The models were trained and evaluated on the **REEVALUATE Image-Text Pair Dataset**, which contains **43,500 image–text pairs** derived from Wikidata and Pilot Museums.

Each artefact is described by:

- `Image`: artefact image
- `Description text`: a BLIP-generated natural-language portion plus a metadata portion
- `Query text`: text resembling a user query
|
|
|
|
|
Dataset: [xuemduan/reevaluate-image-text-pairs](https://huggingface.co/datasets/xuemduan/reevaluate-image-text-pairs) |
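
To inspect the pairs programmatically, the dataset can be loaded with the 🤗 `datasets` library. A minimal sketch; the `train` split is an assumption, and the exact column names should be checked against the dataset card:

```python
from datasets import load_dataset

# Assumed "train" split; see the dataset card for the actual splits and schema
ds = load_dataset("xuemduan/reevaluate-image-text-pairs", split="train")

print(ds.column_names)  # inspect the actual fields
print(ds[0])            # one image–text pair
```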
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 Usage |
|
|
|
|
|
```python
import torch
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

# Load the fine-tuned model and its processor
model = CLIPModel.from_pretrained("xuemduan/reevaluate-clip")
processor = CLIPProcessor.from_pretrained("xuemduan/reevaluate-clip")

image = Image.open("artefact.jpg")
text = "yellow flower paintings"

# Encode the image and the text (no gradients needed at inference time)
with torch.no_grad():
    image_embeds = model.get_image_features(**processor(images=image, return_tensors="pt"))
    text_embeds = model.get_text_features(**processor(text=[text], return_tensors="pt"))

# L2-normalize so the dot product is the cosine similarity
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

similarity = image_embeds @ text_embeds.T
print(similarity)
```
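
For retrieval over a pool of candidates (the setting measured by the I2T/T2I recall metrics above), rank all texts by cosine similarity against the image embedding. A minimal sketch; the image path and candidate texts below are placeholders:

```python
import torch
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

model = CLIPModel.from_pretrained("xuemduan/reevaluate-clip").eval()
processor = CLIPProcessor.from_pretrained("xuemduan/reevaluate-clip")

# Placeholder candidates; in practice these come from the dataset's query/description texts
texts = [
    "yellow flower paintings",
    "a bronze statue of a horse",
    "a medieval illuminated manuscript",
]
image = Image.open("artefact.jpg")

with torch.no_grad():
    img = model.get_image_features(**processor(images=image, return_tensors="pt"))
    txt = model.get_text_features(**processor(text=texts, padding=True, return_tensors="pt"))

# Normalize so the dot product is cosine similarity
img = img / img.norm(dim=-1, keepdim=True)
txt = txt / txt.norm(dim=-1, keepdim=True)

scores = (img @ txt.T).squeeze(0)          # one similarity score per candidate text
ranking = scores.argsort(descending=True)  # best match first
for i in ranking:
    print(f"{scores[i]:.3f}  {texts[i]}")
```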
|
|
|