all-minilm-l6-v2-civic

This is a sentence-transformers model fine-tuned using TSDAE (Transformer-based Denoising AutoEncoder) on government/civic meeting transcripts.

Model Details

  • Base Model: all-MiniLM-L6-v2 (sentence-transformers/all-MiniLM-L6-v2)
  • Training Method: TSDAE (unsupervised domain adaptation)
  • Original Source: ar9av/all-minilm-l6-v2-civic

Description

The base model was adapted with TSDAE on 10,000 civic/government meeting transcripts. TSDAE is unsupervised: it needs no labeled pairs, only raw in-domain text. The goal is to learn domain-specific representations while keeping the model usable as a general-purpose encoder for downstream tasks.
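
To make the training objective concrete: TSDAE corrupts each input sentence, encodes the corrupted version into a single vector, and trains a decoder to reconstruct the original sentence from that vector. The sketch below shows only the corruption step, using token deletion with a ratio of 0.6 (the default in the TSDAE paper). Whitespace tokenization and the helper names `delete_tokens` and `make_training_pairs` are assumptions for illustration; the actual preprocessing used for this model is not documented here.

```python
import random
from typing import List, Optional, Tuple


def delete_tokens(sentence: str, del_ratio: float = 0.6,
                  rng: Optional[random.Random] = None) -> str:
    """Return a noisy copy of `sentence` with roughly `del_ratio` of its tokens removed."""
    if rng is None:
        rng = random.Random()
    tokens = sentence.split()  # whitespace tokenization (a simplification)
    if not tokens:
        return sentence
    # Keep each token independently with probability (1 - del_ratio)
    kept = [tok for tok in tokens if rng.random() >= del_ratio]
    if not kept:
        # Keep one token so the encoder never receives an empty input
        kept = [rng.choice(tokens)]
    return " ".join(kept)


def make_training_pairs(sentences: List[str], del_ratio: float = 0.6,
                        seed: int = 0) -> List[Tuple[str, str]]:
    """Build (noisy_input, original_target) pairs for a denoising objective."""
    rng = random.Random(seed)
    return [(delete_tokens(s, del_ratio, rng), s) for s in sentences]


pairs = make_training_pairs([
    "The Board of Supervisors approved the zoning amendment.",
    "Public comment on the budget item is now closed.",
])
for noisy, original in pairs:
    print(f"{noisy!r} -> {original!r}")
```

In a real training run the noisy string is the encoder input and the original string is the decoder target; only the encoder is kept afterwards as the embedding model.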

Key Features

  • ✅ Learned domain-specific abbreviations (BOS = Board of Supervisors, etc.)
  • ✅ Enhanced understanding of government/civic terminology
  • ✅ General-purpose embeddings suitable for various downstream tasks

Usage

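from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub (downloaded on first use)
model = SentenceTransformer('TSG/all-minilm-l6-v2-civic')
# encode() takes a list of strings and returns one embedding per string
embeddings = model.encode(['Your text here'])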
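
A common next step after encoding is to rank candidate texts by cosine similarity to a query. The following is a minimal sketch of that pattern, not part of the model's own API: `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, and `encode` is any callable that maps a list of strings to a list of vectors (for example `model.encode`).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(encode, query, candidates):
    """Return (score, text) tuples for `candidates`, most similar to `query` first."""
    candidates = list(candidates)
    # Encode the query and all candidates in a single batch
    vectors = encode([query] + candidates)
    query_vec, candidate_vecs = vectors[0], vectors[1:]
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for vec, text in zip(candidate_vecs, candidates)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# With the real model (requires sentence-transformers and a model download):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer('TSG/all-minilm-l6-v2-civic')
#   ranked = rank_by_similarity(
#       model.encode, "BOS", ["Board of Supervisors", "parking permit renewal"]
#   )
```

Passing `encode` as a parameter keeps the ranking logic independent of any particular model, so the same helper can be run against the base model for comparison.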

Training Data

  • 10,000 government/civic meeting transcripts
  • Various meeting types: City Council, Planning Commission, Board of Education, etc.
  • Text cleaned and chunked for training
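
The exact cleaning and chunking procedure is not described. As one plausible illustration only, the sketch below strips bracketed annotations, normalizes whitespace, and groups sentences into chunks under a word budget. The function names, the regular expressions, and the `max_words` default are all assumptions, not the authors' pipeline.

```python
import re
from typing import List


def clean_transcript(text: str) -> str:
    """Remove bracketed annotations such as [inaudible] and collapse whitespace."""
    text = re.sub(r"\[[^\]]*\]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_sentences(text: str, max_words: int = 64) -> List[str]:
    """Group consecutive sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` is kept whole as its own chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if n_words == 0:
            continue
        # Start a new chunk if adding this sentence would exceed the budget
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


raw = "Chair:  Good evening. [inaudible]   The meeting is called to order. Item one is the budget."
cleaned = clean_transcript(raw)
for chunk in chunk_sentences(cleaned, max_words=8):
    print(chunk)
```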

Evaluation

Relative to the base model, the model is reported to improve in the following areas (no benchmark figures are given):

  • Domain-specific semantic similarity
  • Abbreviation understanding (BOS, CC, BOE, etc.)
  • Clustering quality for civic domain
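
One way to check the abbreviation claim yourself is to measure how close each abbreviation sits to its expansion, and compare that score between the base and fine-tuned models. The sketch below is a hypothetical harness, not the evaluation the authors ran. Only the BOS expansion comes from this card; the CC and BOE expansions are assumed from the meeting types listed under Training Data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pair_similarity(encode, pairs):
    """Average cosine similarity between the two sides of each (left, right) pair."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    left_vecs = encode([left for left, _ in pairs])
    right_vecs = encode([right for _, right in pairs])
    scores = [cosine(u, v) for u, v in zip(left_vecs, right_vecs)]
    return sum(scores) / len(scores)


ABBREVIATION_PAIRS = [
    ("BOS", "Board of Supervisors"),  # expansion stated in this model card
    ("CC", "City Council"),           # assumed expansion
    ("BOE", "Board of Education"),    # assumed expansion
]

# With real models (requires sentence-transformers and model downloads):
#   from sentence_transformers import SentenceTransformer
#   base = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
#   civic = SentenceTransformer('TSG/all-minilm-l6-v2-civic')
#   print(mean_pair_similarity(base.encode, ABBREVIATION_PAIRS))
#   print(mean_pair_similarity(civic.encode, ABBREVIATION_PAIRS))
```

A higher score for the fine-tuned model on these pairs would be consistent with the claim, though three pairs are far too few to draw a firm conclusion.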

Intended Use

This model is designed for general-purpose semantic similarity and embedding tasks, with enhanced understanding of government/civic domain language.

Original Model

This model was originally trained and hosted at: https://huggingface.co/ar9av/all-minilm-l6-v2-civic

Model size: 22.7M params (Safetensors, tensor type F32)