Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a Cross Encoder model finetuned from distilbert/distilroberta-base on the quora-duplicates dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("sentence_transformers_model_id")

# Get scores for pairs...
pairs = [
    ['What is the step by step guide to invest in share market in india?', 'What is the step by step guide to invest in share market?'],
    ['What is the story of Kohinoor (Koh-i-Noor) Diamond?', 'What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?'],
    ['How can I increase the speed of my internet connection while using a VPN?', 'How can Internet speed be increased by hacking through DNS?'],
    ['Why am I mentally very lonely? How can I solve it?', 'Find the remainder when [math]23^{24}[/math] is divided by 24,23?'],
    ['Which one dissolve in water quikly sugar, salt, methane and carbon di oxide?', 'Which fish would survive in salt water?'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# ... or rank different texts based on similarity to a single text
ranks = model.rank(
    'What is the step by step guide to invest in share market in india?',
    [
        'What is the step by step guide to invest in share market?',
        'What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?',
        'How can Internet speed be increased by hacking through DNS?',
        'Find the remainder when [math]23^{24}[/math] is divided by 24,23?',
        'Which fish would survive in salt water?',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
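Each entry in `ranks` holds a `corpus_id` (the index of the candidate in the list passed to `rank`) and a `score`. A minimal sketch, assuming the `ranks` variable from the snippet above, that prints the candidates from most to least similar:

```python
# Same candidate list that was passed to model.rank above.
candidates = [
    'What is the step by step guide to invest in share market?',
    'What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?',
    'How can Internet speed be increased by hacking through DNS?',
    'Find the remainder when [math]23^{24}[/math] is divided by 24,23?',
    'Which fish would survive in salt water?',
]
for entry in ranks:
    # Map each rank entry back to its candidate text.
    print(f"{entry['score']:.4f}\t{candidates[entry['corpus_id']]}")
```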
Classification metrics on quora-duplicates-dev and quora-duplicates-test, evaluated with CEClassificationEvaluator:

| Metric | quora-duplicates-dev | quora-duplicates-test |
|---|---|---|
| accuracy | 0.8938 | 0.8938 |
| accuracy_threshold | 0.5089 | 0.5091 |
| f1 | 0.8612 | 0.8612 |
| f1_threshold | 0.3856 | 0.3858 |
| precision | 0.8183 | 0.8183 |
| recall | 0.9089 | 0.9089 |
| average_precision | 0.9203 | 0.9203 |
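The `accuracy_threshold` and `f1_threshold` rows are cut-offs on the model's output score (which lies between 0 and 1 here, as the thresholds above indicate): a pair scoring above the threshold is predicted to be a duplicate. A minimal sketch, reusing the placeholder model id from the usage snippet and the dev-set accuracy threshold reported above; the example pairs are made up:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("sentence_transformers_model_id")  # placeholder id from the snippet above

ACCURACY_THRESHOLD = 0.5089  # dev-set accuracy_threshold from the table above

pairs = [
    # Hypothetical example pairs, not taken from the dataset.
    ("How do I learn Python quickly?", "What is the fastest way to learn Python?"),
    ("How do I learn Python quickly?", "Which fish would survive in salt water?"),
]
scores = model.predict(pairs)
is_duplicate = [score > ACCURACY_THRESHOLD for score in scores]
print(list(zip(scores.tolist(), is_duplicate)))
```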
Training Dataset: quora-duplicates

Columns: sentence1, sentence2, and label

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |

Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| What are the features of the Indian caste system? | What triggers you the most when you play video games? | 0 |
| What is the best place to learn Mandarin Chinese in Singapore? | What is the best place in Singapore for durian in December? | 0 |
| What will be Hillary Clinton's India policy if she wins the election? | How would the bilateral relationship between India and the USA be under Hillary Clinton's presidency? | 1 |

Loss: BinaryCrossEntropyLoss
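The samples follow the sentence1/sentence2/label schema of the Quora duplicate-question pairs. A minimal loading sketch, assuming the data comes from the `sentence-transformers/quora-duplicates` dataset on the Hugging Face Hub and its `pair-class` subset (both names are assumptions, not stated in this card):

```python
from datasets import load_dataset

# Assumed dataset repository and subset name; adjust if the actual source differs.
dataset = load_dataset("sentence-transformers/quora-duplicates", "pair-class", split="train")
print(dataset.column_names)  # expected: ['sentence1', 'sentence2', 'label']
print(dataset[0])
```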
Evaluation Dataset: quora-duplicates

Columns: sentence1, sentence2, and label

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |

Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| What is the step by step guide to invest in share market in india? | What is the step by step guide to invest in share market? | 0 |
| What is the story of Kohinoor (Koh-i-Noor) Diamond? | What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back? | 0 |
| How can I increase the speed of my internet connection while using a VPN? | How can Internet speed be increased by hacking through DNS? | 0 |

Loss: BinaryCrossEntropyLoss
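Both dataset splits are used with BinaryCrossEntropyLoss. The following is a minimal training sketch under the hyperparameters listed below, assuming the `CrossEncoderTrainer` API of recent sentence-transformers releases and the assumed dataset source from the loading sketch above; the output directory and split sizes are illustrative:

```python
from datasets import load_dataset
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder import CrossEncoderTrainer, CrossEncoderTrainingArguments
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# Start from the base checkpoint listed at the bottom of this card.
model = CrossEncoder("distilbert/distilroberta-base", num_labels=1)

# Assumed dataset source; the head of the split is held out for evaluation here.
dataset = load_dataset("sentence-transformers/quora-duplicates", "pair-class", split="train")
eval_dataset = dataset.select(range(10_000))
train_dataset = dataset.select(range(10_000, len(dataset)))

loss = BinaryCrossEntropyLoss(model)

args = CrossEncoderTrainingArguments(
    output_dir="reranker-distilroberta-base-quora-duplicates",  # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_ratio=0.1,
    bf16=True,
    eval_strategy="steps",
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```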
Training Hyperparameters

Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- num_train_epochs: 1
- warmup_ratio: 0.1
- bf16: True

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional

Training Logs

| Epoch | Step | Training Loss | Validation Loss | quora-duplicates-dev_average_precision | quora-duplicates-test_average_precision |
|---|---|---|---|---|---|
| -1 | -1 | - | - | 0.3711 | - |
| 0.0167 | 100 | 0.6574 | - | - | - |
| 0.0333 | 200 | 0.4804 | - | - | - |
| 0.0500 | 300 | 0.4406 | - | - | - |
| 0.0666 | 400 | 0.4208 | - | - | - |
| 0.0833 | 500 | 0.3929 | 0.3958 | 0.8210 | - |
| 0.0999 | 600 | 0.3986 | - | - | - |
| 0.1166 | 700 | 0.3743 | - | - | - |
| 0.1332 | 800 | 0.3938 | - | - | - |
| 0.1499 | 900 | 0.3602 | - | - | - |
| 0.1665 | 1000 | 0.3714 | 0.3437 | 0.8565 | - |
| 0.1832 | 1100 | 0.3486 | - | - | - |
| 0.1998 | 1200 | 0.3479 | - | - | - |
| 0.2165 | 1300 | 0.3417 | - | - | - |
| 0.2331 | 1400 | 0.3425 | - | - | - |
| 0.2498 | 1500 | 0.3353 | 0.3264 | 0.8742 | - |
| 0.2664 | 1600 | 0.3335 | - | - | - |
| 0.2831 | 1700 | 0.3274 | - | - | - |
| 0.2998 | 1800 | 0.3284 | - | - | - |
| 0.3164 | 1900 | 0.3118 | - | - | - |
| 0.3331 | 2000 | 0.3073 | 0.3282 | 0.8826 | - |
| 0.3497 | 2100 | 0.3233 | - | - | - |
| 0.3664 | 2200 | 0.3072 | - | - | - |
| 0.3830 | 2300 | 0.314 | - | - | - |
| 0.3997 | 2400 | 0.3065 | - | - | - |
| 0.4163 | 2500 | 0.3046 | 0.2877 | 0.8930 | - |
| 0.4330 | 2600 | 0.2857 | - | - | - |
| 0.4496 | 2700 | 0.285 | - | - | - |
| 0.4663 | 2800 | 0.2957 | - | - | - |
| 0.4829 | 2900 | 0.2965 | - | - | - |
| 0.4996 | 3000 | 0.2824 | 0.2842 | 0.8998 | - |
| 0.5162 | 3100 | 0.3019 | - | - | - |
| 0.5329 | 3200 | 0.2841 | - | - | - |
| 0.5495 | 3300 | 0.2981 | - | - | - |
| 0.5662 | 3400 | 0.2878 | - | - | - |
| 0.5828 | 3500 | 0.278 | 0.2803 | 0.9061 | - |
| 0.5995 | 3600 | 0.2841 | - | - | - |
| 0.6162 | 3700 | 0.2794 | - | - | - |
| 0.6328 | 3800 | 0.2808 | - | - | - |
| 0.6495 | 3900 | 0.27 | - | - | - |
| 0.6661 | 4000 | 0.2719 | 0.2697 | 0.9091 | - |
| 0.6828 | 4100 | 0.2792 | - | - | - |
| 0.6994 | 4200 | 0.2669 | - | - | - |
| 0.7161 | 4300 | 0.2696 | - | - | - |
| 0.7327 | 4400 | 0.2642 | - | - | - |
| 0.7494 | 4500 | 0.2684 | 0.2591 | 0.9140 | - |
| 0.7660 | 4600 | 0.2593 | - | - | - |
| 0.7827 | 4700 | 0.2756 | - | - | - |
| 0.7993 | 4800 | 0.2584 | - | - | - |
| 0.8160 | 4900 | 0.2525 | - | - | - |
| 0.8326 | 5000 | 0.267 | 0.2540 | 0.9168 | - |
| 0.8493 | 5100 | 0.2612 | - | - | - |
| 0.8659 | 5200 | 0.2607 | - | - | - |
| 0.8826 | 5300 | 0.2565 | - | - | - |
| 0.8993 | 5400 | 0.2432 | - | - | - |
| 0.9159 | 5500 | 0.2568 | 0.2489 | 0.9198 | - |
| 0.9326 | 5600 | 0.2572 | - | - | - |
| 0.9492 | 5700 | 0.2658 | - | - | - |
| 0.9659 | 5800 | 0.2568 | - | - | - |
| 0.9825 | 5900 | 0.2539 | - | - | - |
| 0.9992 | 6000 | 0.2458 | 0.2503 | 0.9203 | - |
| -1 | -1 | - | - | - | 0.9203 |
Carbon emissions were measured using CodeCarbon.
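For reference, a minimal sketch of wrapping a training run with the codecarbon tracker (the `run_training` function is a hypothetical stand-in for the actual training loop):

```python
from codecarbon import EmissionsTracker

def run_training():
    # Hypothetical stand-in for the actual fine-tuning code.
    pass

tracker = EmissionsTracker()  # writes an emissions.csv report by default
tracker.start()
try:
    run_training()
finally:
    emissions = tracker.stop()  # estimated emissions in kg of CO2-equivalent
    print(f"Estimated emissions: {emissions} kg CO2eq")
```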
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model: distilbert/distilroberta-base