---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6066
- loss:OnlineContrastiveLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
- source_sentence: Mitochondria, often called 'powerhouses of the cell,' generate
most of the cell's ATP through cellular respiration and have their own DNA.
sentences:
- Plate tectonics theory explains that Earth's lithosphere is divided into plates
that move, causing earthquakes, volcanoes, and mountain formation.
- The Titanic was intentionally sunk as part of an insurance scam by J.P. Morgan.
- Why can't you trust a statistician? They're always plotting something, and they
have a mean personality.
- source_sentence: Sharks have existed for about 400 million years, predating trees
(which appeared around 350 million years ago).
sentences:
- What is a physicist's favorite food? Fission chips.
- Venus has a surface temperature of ~465°C (870°F) due to a runaway greenhouse
effect from its dense CO2 atmosphere, making it hotter than Mercury.
- My therapist told me time heals all wounds. So I stabbed him. Now we wait. For
science!
- source_sentence: CRISPR-Cas9 is a gene-editing tool that uses a guide RNA to direct
the Cas9 enzyme to a specific DNA sequence for cutting.
sentences:
- Plate tectonics theory explains that Earth's lithosphere is divided into plates
that move, causing earthquakes, volcanoes, and mountain formation.
- Elvis Presley faked his death and is still alive, living in secret.
- Why don't skeletons fight each other? They don't have the guts.
- source_sentence: Venus has a surface temperature of ~465°C (870°F) due to a runaway
greenhouse effect from its dense CO2 atmosphere, making it hotter than Mercury.
sentences:
- JFK was assassinated by the CIA/Mafia/LBJ, not a lone gunman.
- Why do programmers prefer dark mode? Because light attracts bugs.
- Plate tectonics theory explains that Earth's lithosphere is divided into plates
that move, causing earthquakes, volcanoes, and mountain formation.
- source_sentence: Finland doesn't exist; it's a fabrication by Japan and Russia.
sentences:
- Why did the functions stop calling each other? Because they had constant arguments
and no common ground.
- What's a pirate's favorite programming language? Rrrrr! (or C, for the sea)
- The lost city of Atlantis is real and its advanced technology is hidden from us.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- cosine_mcc
model-index:
- name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
results:
- task:
type: binary-classification
name: Binary Classification
dataset:
name: meme dev binary
type: meme-dev-binary
metrics:
- type: cosine_accuracy
value: 1.0
name: Cosine Accuracy
- type: cosine_accuracy_threshold
value: 0.7174700498580933
name: Cosine Accuracy Threshold
- type: cosine_f1
value: 1.0
name: Cosine F1
- type: cosine_f1_threshold
value: 0.7174700498580933
name: Cosine F1 Threshold
- type: cosine_precision
value: 1.0
name: Cosine Precision
- type: cosine_recall
value: 1.0
name: Cosine Recall
- type: cosine_ap
value: 0.9999999999999999
name: Cosine Ap
- type: cosine_mcc
value: 1.0
name: Cosine Mcc
---
# SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2).
The main goal of this fine-tuned model is to assign memes to three clusters:
- Conspiracy
- Educational Science Humor
- Wordplay & Nerd Humor

Cluster assignment works by comparing a meme's embedding to the centroid of a few seed examples per cluster, as shown in the Usage section below.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
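For reference, the properties listed above can be checked directly once the model is loaded (see the Usage section below for installation); this is a minimal sketch using the model id from the usage example:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("PietroSaveri/meme-cluster-classifier")
print(model.max_seq_length)                      # 384
print(model.get_sentence_embedding_dimension())  # 768

# The final Normalize() module returns unit-length embeddings,
# so cosine similarity reduces to a dot product.
emb = model.encode("Why do programmers prefer dark mode? Because light attracts bugs.")
print(emb.shape)  # (768,)
```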
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
import numpy as np
from sentence_transformers import SentenceTransformer

# 1) Load the fine-tuned model
fine_tuned_model = SentenceTransformer("PietroSaveri/meme-cluster-classifier")

# 2) Seed texts per cluster (illustrative examples taken from the widget above;
#    swap in your own seeds if you have better ones)
seed_texts = {
    "Conspiracy": [
        "Finland doesn't exist; it's a fabrication by Japan and Russia.",
        "Elvis Presley faked his death and is still alive, living in secret.",
    ],
    "Educational Science Humor": [
        "Mitochondria, often called 'powerhouses of the cell,' generate most of the cell's ATP through cellular respiration and have their own DNA.",
        "CRISPR-Cas9 is a gene-editing tool that uses a guide RNA to direct the Cas9 enzyme to a specific DNA sequence for cutting.",
    ],
    "Wordplay & Nerd Humor": [
        "Why do programmers prefer dark mode? Because light attracts bugs.",
        "What is a physicist's favorite food? Fission chips.",
    ],
}

# 3) Compute centroids just once
seed_centroids = {}
for cat, texts in seed_texts.items():
    embs = fine_tuned_model.encode(texts, convert_to_numpy=True)
    seed_centroids[cat] = embs.mean(axis=0)

# 4) Define a tiny helper for cosine similarity
def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# 5) Wrap it all up in a function
def predict(text: str):
    vec = fine_tuned_model.encode(text, convert_to_numpy=True)
    sims = {cat: cosine_sim(vec, centroid) for cat, centroid in seed_centroids.items()}
    assigned = max(sims, key=sims.get)  # cluster with the highest similarity
    return sims, assigned

# --- USAGE ---
text = "Why did the biologist go broke? Because his cells were division!"
scores, assigned = predict(text)
print("Raw scores:")
for cat, score in scores.items():
    print(f"  {cat:25s}: {score:.3f}")

# Raw scores:
#   Conspiracy               : 0.700
#   Wordplay & Nerd Humor    : 0.907
#   Educational Science Humor: 0.903
```
## Evaluation
### Metrics
#### Binary Classification
* Dataset: `meme-dev-binary`
* Evaluated with [BinaryClassificationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
| Metric | Value |
|:--------------------------|:--------|
| cosine_accuracy | 1.0 |
| cosine_accuracy_threshold | 0.7175 |
| cosine_f1 | 1.0 |
| cosine_f1_threshold | 0.7175 |
| cosine_precision | 1.0 |
| cosine_recall | 1.0 |
| **cosine_ap** | **1.0** |
| cosine_mcc | 1.0 |
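
If you want to re-run this evaluation on your own dev pairs, a minimal sketch looks like the following. The two pairs and their labels below are made up for illustration (the actual `meme-dev-binary` split is not included in this card):
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("PietroSaveri/meme-cluster-classifier")

# Illustrative dev pairs: label 1 = same cluster, 0 = different cluster
sentences1 = [
    "Why don't skeletons fight each other? They don't have the guts.",
    "Finland doesn't exist; it's a fabrication by Japan and Russia.",
]
sentences2 = [
    "Why do programmers prefer dark mode? Because light attracts bugs.",
    "What is a physicist's favorite food? Fission chips.",
]
labels = [1, 0]

evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels, name="meme-dev-binary")
results = evaluator(model)
print(results)  # e.g. cosine_accuracy, cosine_f1, cosine_ap, cosine_mcc, ...
```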
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 6,066 training samples
* Columns: `sentence_0`, `sentence_1`, and `label`
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0 | sentence_1 | label |
  |:--------|:-----------|:-----------|:------|
  | type    | string     | string     | float |
* Samples:
  | sentence_0 | sentence_1 | label |
  |:-----------|:-----------|:------|
  | The cure for AIDS was discovered decades ago but suppressed to reduce world population. | Einstein’s theory of general relativity describes gravity not as a force, but as the curvature of spacetime caused by mass and energy. | 0.0 |
  | 5G towers are designed to activate nanoparticles from vaccines for population control. | The Mandela Effect proves we've shifted into an alternate reality. | 1.0 |
  | The Georgia Guidestones were a NWO manifesto, destroyed to hide the plans. | Elvis Presley faked his death and is still alive, living in secret. | 1.0 |
* Loss: [OnlineContrastiveLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
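
The exact training script is not included in this repository, but a minimal sketch of how such a run could be set up with `SentenceTransformerTrainer` and `OnlineContrastiveLoss` is shown below. The two pairs are the samples listed above (based on them, `label` 1.0 appears to mark pairs from the same cluster and 0.0 pairs from different clusters); the output directory is a placeholder, and the evaluation setup (`eval_strategy: steps` with the binary dev set) is omitted for brevity.
```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import OnlineContrastiveLoss

# Tiny stand-in for the real 6,066-pair dataset (rows copied from the samples above)
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "The cure for AIDS was discovered decades ago but suppressed to reduce world population.",
        "5G towers are designed to activate nanoparticles from vaccines for population control.",
    ],
    "sentence_1": [
        "Einstein’s theory of general relativity describes gravity not as a force, but as the curvature of spacetime caused by mass and energy.",
        "The Mandela Effect proves we've shifted into an alternate reality.",
    ],
    "label": [0.0, 1.0],
})

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
loss = OnlineContrastiveLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="meme-cluster-classifier",  # hypothetical output directory
    num_train_epochs=4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```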
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 4
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters