# LaBSE Persuasion Detection (Domain-Agnostic)

**Model ID:** `jjprietotorres/labse-persuasion-detection-agnostic`
**Part of:** the *Time to Trust AI* initiative
## 🧠 Model Overview
This is a SetFit model that can be used for text classification.
It implements a few-shot learning pipeline designed for persuasion and rhetorical intent detection.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer (LaBSE) with contrastive learning to improve semantic representation.
- Training a classification head on features from the fine-tuned Sentence Transformer to classify persuasive content (both components can be inspected after loading, as sketched below).
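For orientation, both pieces can be inspected after loading the published checkpoint. A minimal sketch (`model_body` and `model_head` are standard SetFit attributes):

```python
from setfit import SetFitModel

# Load the published checkpoint; a SetFit model bundles the two pieces above.
model = SetFitModel.from_pretrained("jjprietotorres/labse-persuasion-detection-agnostic")

print(model.model_body)  # the contrastively fine-tuned LaBSE sentence transformer
print(model.model_head)  # the classification head fitted on its embeddings
```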
The model identifies five rhetorical or persuasion-related cohorts:
- Supply Scarcity – framing scarcity or urgency
- Authority Endorsement – appealing to authority or credibility
- Misrepresentation – misleading or manipulative language
- Logical Appeal – reasoning-based persuasion
- Neutral – factual or non-persuasive statements
## ⚙️ Intended Use
This model detects persuasive or manipulative rhetoric across content types and domains.
It is domain-agnostic and suitable for use in:
- Chatbot or conversational AI safety layers
- AI content moderation pipelines
- Adversarial persuasion or prompt-injection detection
- Rhetorical bias auditing in generated or user content
The goal is to enhance trust, transparency, and explainability within AI systems.
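As a concrete example of the safety-layer use case, a minimal, hypothetical gate could flag any non-Neutral cohort scoring above a confidence threshold. The helper name and threshold are illustrative, not part of the model; the label order follows the Usage section below:

```python
from setfit import SetFitModel

LABELS = ["Supply Scarcity", "Authority Endorsement", "Misrepresentation", "Neutral", "Logical Appeal"]
model = SetFitModel.from_pretrained("jjprietotorres/labse-persuasion-detection-agnostic")

def flag_persuasion(text, threshold=0.5):
    """Illustrative helper: return non-Neutral cohorts scoring above `threshold`."""
    probas = model.predict_proba([text])[0]  # one probability per cohort
    return [
        {"label": label, "score": float(score)}
        for label, score in zip(LABELS, probas)
        if label != "Neutral" and float(score) >= threshold
    ]

print(flag_persuasion("Hurry, offer ends tonight!"))
```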
## 🚀 Usage
To use this model for inference, first install the SetFit library:
```bash
python -m pip install setfit
```
You can then run inference as follows:
```python
from setfit import SetFitModel

# Download the checkpoint from the Hub
model = SetFitModel.from_pretrained("jjprietotorres/labse-persuasion-detection-agnostic")

# Cohort names in the order used by the classification head
labels = ["Supply Scarcity", "Authority Endorsement", "Misrepresentation", "Neutral", "Logical Appeal"]

# Run inference
texts = [
    "Experts agree this is the only viable solution.",
    "Hurry, offer ends tonight!",
    "Data clearly supports our argument.",
]
probas = model.predict_proba(texts)  # shape: (num_texts, num_labels)

# For each cohort, keep its highest-scoring text and that probability
pooled = probas.max(axis=0)
pooled_probas = pooled.values.tolist()
idxs = pooled.indices.tolist()

result = zip(labels, pooled_probas, [texts[i] for i in idxs])
report = [
    {"text": text, "label": label, "score": score}
    for label, score, text in result
    if label != "Neutral"
]
print(report)
```
Expected output:
```python
[{'text': 'Hurry, offer ends tonight!', 'label': 'Supply Scarcity', 'score': 0.5646279805087063},
 {'text': 'Data clearly supports our argument.', 'label': 'Authority Endorsement', 'score': 0.25684773715338116},
 {'text': 'Experts agree this is the only viable solution.', 'label': 'Misrepresentation', 'score': 0.5366529547271727},
 {'text': 'Data clearly supports our argument.', 'label': 'Logical Appeal', 'score': 0.23385324245461217}]
```
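The pooling above reports, for each cohort, the single highest-scoring input. If you instead want the most likely cohort per text, the same probabilities can be read row-wise (a short sketch reusing `probas`, `labels`, and `texts` from the snippet above):

```python
# Highest-probability cohort for each input text
per_text = probas.max(axis=1)
for text, i, score in zip(texts, per_text.indices.tolist(), per_text.values.tolist()):
    print(f"{text!r} -> {labels[i]} ({score:.2f})")
```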
## 🧩 Training Setup
- Base model: LaBSE (Language-Agnostic BERT Sentence Embedding)
- Architecture: Sentence Transformer fine-tuned with contrastive learning
- Classifier: SetFit head trained on limited labeled samples
- Frameworks: SetFit, Sentence Transformers, PyTorch
- Training data: custom persuasion dataset covering the five rhetorical cohorts above
- Few-shot strategy: data-efficient fine-tuning from a handful of labeled examples per cohort (see the sketch below)
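A minimal, hypothetical reproduction of this recipe with the SetFit v1.x API might look as follows. The example texts, label ids, and hyperparameters are illustrative stand-ins; the actual custom persuasion dataset is not distributed with the model:

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Tiny illustrative few-shot set; label ids follow the cohort order above
# (0 = Supply Scarcity, 1 = Authority Endorsement, 3 = Neutral).
train_dataset = Dataset.from_dict({
    "text": [
        "Only three seats left at this price!",
        "Leading economists endorse this plan.",
        "The report was published in March.",
    ],
    "label": [0, 1, 3],
})

# Start from the LaBSE sentence-transformer checkpoint
model = SetFitModel.from_pretrained("sentence-transformers/LaBSE")

args = TrainingArguments(batch_size=16, num_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()  # contrastive fine-tuning of the body, then head fitting
```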
## 📊 Evaluation (example)
| Metric | Score |
|---|---|
| Accuracy | 0.86 |
| F1 Score | 0.84 |
| Precision | 0.83 |
| Recall | 0.85 |
*(Replace with your actual metrics once validated.)*
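To recompute such metrics on your own held-out split, a minimal sketch with scikit-learn could look like this (the evaluation texts and gold labels below are placeholders, not the actual validation data):

```python
from setfit import SetFitModel
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

model = SetFitModel.from_pretrained("jjprietotorres/labse-persuasion-detection-agnostic")
labels = ["Supply Scarcity", "Authority Endorsement", "Misrepresentation", "Neutral", "Logical Appeal"]

# Placeholder split; substitute a real labeled evaluation set
eval_texts = ["Stocks are limited, act now!", "The train departs at noon."]
gold = ["Supply Scarcity", "Neutral"]

pred_ids = model.predict_proba(eval_texts).max(axis=1).indices.tolist()
preds = [labels[i] for i in pred_ids]

acc = accuracy_score(gold, preds)
prec, rec, f1, _ = precision_recall_fscore_support(gold, preds, average="macro", zero_division=0)
print({"accuracy": acc, "precision": prec, "recall": rec, "f1": f1})
```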
## ⚖️ Ethical Considerations
- This model identifies persuasive tactics, not factual accuracy or moral intent.
- Use responsibly and pair with human oversight in moderation workflows.
- Model outputs may reflect training data biases.
- Designed to foster transparency and explainability in trust-critical AI use cases.
## 🌐 Integration within Time to Trust AI
This model is a trust-building component of the Time to Trust AI framework.
By publishing it openly, we aim to make trust operational – demonstrating transparency in model design, intent, and interpretability.
## 📚 Citation
If you use this model or methodology, please cite both the SetFit paper and this model:
```bibtex
@article{tunstall2022efficient,
  doi       = {10.48550/arXiv.2209.11055},
  url       = {https://arxiv.org/abs/2209.11055},
  author    = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  title     = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year      = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

@misc{prietotorres2025labsepersuasion,
  title        = {LaBSE Persuasion Detection (Domain-Agnostic)},
  author       = {Prieto-Torres, J.J.},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/jjprietotorres/labse-persuasion-detection-agnostic}},
}
```
## 🪪 License
Apache License 2.0 – you are free to use, modify, and distribute with attribution.
## 📬 Contact
- Author: J.J. Prieto
- Hugging Face: [jjprietotorres](https://huggingface.co/jjprietotorres)