Related paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084).
This is a sentence-transformers model finetuned from Qwen/Qwen3-Embedding-0.6B on the train dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
The full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
```
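The `Pooling` module above uses `pooling_mode_lasttoken`: each text is represented by the hidden state of its final non-padding token, and `Normalize()` then L2-normalizes that vector. A minimal sketch of those two steps (illustrative only, assuming a right-padded batch; this is not the library's internal code):

```python
import torch
import torch.nn.functional as F

def last_token_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Select the embedding of the last non-padding token of each sequence,
    then L2-normalize it (the Normalize() step)."""
    last_idx = attention_mask.sum(dim=1) - 1                # index of last attended token, shape [batch]
    batch_idx = torch.arange(token_embeddings.size(0))
    pooled = token_embeddings[batch_idx, last_idx]          # shape [batch, dim]
    return F.normalize(pooled, p=2, dim=1)

# Toy check: batch of 2 sequences of length 3, dim 4; the second has one pad token.
emb = torch.randn(2, 3, 4)
mask = torch.tensor([[1, 1, 1], [1, 1, 0]])
pooled = last_token_pool(emb, mask)
print(pooled.shape)  # torch.Size([2, 4])
```

Because of the final normalization, all embeddings are unit-length, so dot products and cosine similarities coincide.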
First install the Sentence Transformers library:

```shell
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("b00l26/Qwen3-Embedding-0.6B-finetune-news-final")

# Run inference
queries = [
    "Mỹ điều_tra tuân_thủ thoả_thuận thương_mại Trung_Quốc . Văn_phòng Đại_diện Thương_mại Mỹ ( USTR ) điều_tra tuân_thủ thoả_thuận thương_mại Trung_Quốc ký_kết 2020 .",
]
documents = [
    "Trung_Quốc đồng_ý đàm_phán thương_mại ' ' Mỹ . Trung_Quốc đồng_ý tiến_hành vòng đàm_phán thương_mại Mỹ .",
    "Thời_điểm đẹp Hà_Giang ngắm tam_giác mạch ? . Du_khách Hà_Nội ngắm hoa tam_giác mạch Hà_Giang , tư_vấn hoa nở_rộ , rực_rỡ .",
    "Bất_động_sản Nam trở_lại đường_đua sáp_nhập địa_giới hành_chính . ( Dân_trí ) - Trong cung lõi TPHCM khan_hiếm khu_vực lân_cận Bình_Dương , Long_An Đồng_Nai ( cũ ) dồi_dào , đa_dạng phân khúc .",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 1024] [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.6432, -0.1029, 0.0068]])
```
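Continuing the example above: to turn those scores into a ranking, sort the documents by cosine score (the model ends with `Normalize()`, so dot products equal cosine similarities). A small pure-Python sketch using the scores printed above:

```python
# Scores for the three documents from the run above (copied from the output).
scores = [0.6432, -0.1029, 0.0068]

# Rank document indices by score, best first.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1] -> the trade-talks document ranks first
```

As expected, the document about US-China trade talks scores far above the two unrelated news items.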
Evaluation with an `InformationRetrievalEvaluator`:

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.0006 |
| cosine_accuracy@3 | 0.3073 |
| cosine_accuracy@5 | 0.4208 |
| cosine_accuracy@10 | 0.5527 |
| cosine_precision@1 | 0.0006 |
| cosine_precision@3 | 0.1187 |
| cosine_precision@5 | 0.1135 |
| cosine_precision@10 | 0.0879 |
| cosine_recall@1 | 0.0002 |
| cosine_recall@3 | 0.1801 |
| cosine_recall@5 | 0.276 |
| cosine_recall@10 | 0.4051 |
| cosine_ndcg@10 | 0.223 |
| cosine_mrr@10 | 0.1793 |
| cosine_map@100 | 0.163 |
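For reference, `accuracy@k` counts a query as correct if any relevant document appears in its top-k results, while `recall@k` is the fraction of that query's relevant documents retrieved in the top-k; both are averaged over all queries. A small illustrative sketch for a single query (hypothetical document ids, not from the actual evaluation):

```python
def accuracy_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant document appears in the top-k results, else 0.0."""
    return 1.0 if any(d in relevant_ids for d in ranked_ids[:k]) else 0.0

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of this query's relevant documents found in the top-k."""
    return sum(d in relevant_ids for d in ranked_ids[:k]) / len(relevant_ids)

ranked = [4, 7, 1, 9, 3]   # doc ids sorted by cosine score, best first
relevant = {1, 3}          # ground-truth relevant docs for this query
print(accuracy_at_k(ranked, relevant, 3))  # 1.0 (doc 1 is in the top 3)
print(recall_at_k(ranked, relevant, 3))    # 0.5 (1 of 2 relevant docs found)
```

This explains why `accuracy@k` can be much higher than `recall@k` in the table: one hit suffices for accuracy, while recall requires finding all relevant documents.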
The train dataset consists of (anchor, positive) pairs; both columns are strings. Example pairs:

| anchor | positive |
|---|---|
| Nam_Định : Thuê thần đèn di_dời 500 đường điện . ( Dân_trí ) - Tiếc 2 tầng xây_dựng khả_năng dỡ , Thuế mua chi nửa tỷ đồng thuê thần đèn di_chuyển đất . | Mục_sở_thị mỏng Vương_quốc_Anh rao 1,3 triệu USD . Ngôi nằm ga tàu_điện_ngầm Goldhawk_Road tây London , rộng 6 mét , 5 tầng tổng diện_tích 1.034 mét_vuông . Tuy mỏng , rao giá 950.000 bảng Anh ( 1,3 triệu USD ) . |
| Trang_trí đám_cưới , đám_hỏi kiêng sen trắng cành trúc ? . Theo chuyên_gia văn_hoá , không_gian chùa_chiền , đám hiếu ngược đám_cưới đám_cưới . Tuy_nhiên , , quan_niệm dần . | Đại_gia TPHCM chi tiền tỷ tổ_chức đám_cưới bảo_mẫu . ( Dân_trí ) - Vì đem nữ bảo_mẫu gia_đình đám_cưới , nữ đại_gia TPHCM chi tiền tổ_chức tiệc cưới bảo_mẫu . |
| Đệ phu_nhân Pháp chỉ_trích miệt_thị biểu_tình . Những nổi_tiếng chính_trị_gia cánh tả Pháp bày_tỏ phẫn_nộ Brigitte_Macron cụm_từ lũ đàn_bà ngu_ngốc hoạt_động nữ_quyền . | Đệ phu_nhân Pháp ám_ảnh đồn giới_tính . Auziere , gái của Đệ nhất phu_nhân Pháp , nói rằng mẹ lo_âu sâu_sắc xoay quanh những đồn thất thiệt về giới tính . |
Loss: `MultipleNegativesRankingLoss` with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
```
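As a rough sketch of what this loss does (illustrative only, not the library's implementation): each `(anchor, positive)` pair in a batch treats every other positive as an in-batch negative, and the cosine-similarity matrix, scaled by 20, is scored with cross-entropy against the diagonal:

```python
import torch
import torch.nn.functional as F

def mnr_loss(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """Multiple-negatives ranking loss over one batch of (anchor, positive) pairs."""
    a = F.normalize(anchors, dim=1)
    p = F.normalize(positives, dim=1)
    scores = scale * (a @ p.T)             # [batch, batch] scaled cosine similarities
    labels = torch.arange(scores.size(0))  # each anchor's true positive is on the diagonal
    return F.cross_entropy(scores, labels)

# Random embeddings just to exercise the function.
loss = mnr_loss(torch.randn(8, 1024), torch.randn(8, 1024))
print(loss.item())
```

The `scale` of 20 sharpens the softmax so small cosine differences still produce a meaningful gradient; `cos_sim` corresponds to the normalization step in the sketch.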
Non-default training hyperparameters (all other `SentenceTransformerTrainingArguments` were left at their defaults):

- `eval_strategy`: steps
- `per_device_train_batch_size`: 512
- `per_device_eval_batch_size`: 64
- `gradient_accumulation_steps`: 4
- `num_train_epochs`: 50
- `warmup_ratio`: 0.1
- `bf16`: True
- `dataloader_num_workers`: 2
- `dataloader_pin_memory`: False
- `gradient_checkpointing`: True
- `batch_sampler`: no_duplicates

Training log:

| Epoch | Step | Training Loss | cosine_ndcg@10 |
|---|---|---|---|
| 0.0755 | 1 | 10.5083 | - |
| 0.7547 | 10 | 10.126 | 0.2090 |
| 1.4528 | 20 | 6.8463 | 0.2405 |
| 2.1509 | 30 | 4.8851 | 0.2416 |
| 2.9057 | 40 | 3.3232 | 0.2416 |
| 3.6038 | 50 | 1.6894 | 0.2347 |
| 4.3019 | 60 | 1.128 | 0.2451 |
| 5.0 | 70 | 0.803 | 0.2391 |
| 5.7547 | 80 | 0.8658 | 0.2290 |
| 6.4528 | 90 | 0.6837 | 0.2373 |
| 7.1509 | 100 | 0.6121 | 0.2424 |
| 7.9057 | 110 | 0.5992 | 0.2350 |
| 8.6038 | 120 | 0.5402 | 0.2353 |
| 9.3019 | 130 | 0.4957 | 0.2408 |
| 10.0 | 140 | 0.396 | 0.2426 |
| 10.7547 | 150 | 0.4763 | 0.2425 |
| 11.4528 | 160 | 0.3865 | 0.2370 |
| 12.1509 | 170 | 0.3367 | 0.2342 |
| 12.9057 | 180 | 0.3539 | 0.2374 |
| 13.6038 | 190 | 0.325 | 0.2371 |
| 14.3019 | 200 | 0.2942 | 0.2272 |
| 15.0 | 210 | 0.2531 | 0.2383 |
| 15.7547 | 220 | 0.299 | 0.2188 |
| 16.4528 | 230 | 0.2713 | 0.2439 |
| 17.1509 | 240 | 0.2414 | 0.2293 |
| 17.9057 | 250 | 0.2416 | 0.2373 |
| 18.6038 | 260 | 0.2264 | 0.2486 |
| 19.3019 | 270 | 0.2139 | 0.2228 |
| 20.0 | 280 | 0.1785 | 0.2499 |
| 20.7547 | 290 | 0.2092 | 0.2328 |
| 21.4528 | 300 | 0.1885 | 0.2449 |
| 22.1509 | 310 | 0.161 | 0.2370 |
| 22.9057 | 320 | 0.1744 | 0.2363 |
| 23.6038 | 330 | 0.1617 | 0.2435 |
| 24.3019 | 340 | 0.1642 | 0.2354 |
| 25.0 | 350 | 0.1314 | 0.2389 |
| 25.7547 | 360 | 0.1634 | 0.2337 |
| 26.4528 | 370 | 0.1451 | 0.2388 |
| 27.1509 | 380 | 0.1275 | 0.2300 |
| 27.9057 | 390 | 0.131 | 0.2342 |
| 28.6038 | 400 | 0.1251 | 0.2335 |
| 29.3019 | 410 | 0.1213 | 0.2368 |
| 30.0 | 420 | 0.0996 | 0.2305 |
| 30.7547 | 430 | 0.1239 | 0.2337 |
| 31.4528 | 440 | 0.1043 | 0.2324 |
| 32.1509 | 450 | 0.0975 | 0.2304 |
| 32.9057 | 460 | 0.1004 | 0.2293 |
| 33.6038 | 470 | 0.0967 | 0.2298 |
| 34.3019 | 480 | 0.0933 | 0.2288 |
| 35.0 | 490 | 0.0811 | 0.2273 |
| 35.7547 | 500 | 0.1014 | 0.2279 |
| 36.4528 | 510 | 0.0881 | 0.2272 |
| 37.1509 | 520 | 0.0843 | 0.2260 |
| 37.9057 | 530 | 0.0892 | 0.2260 |
| 38.6038 | 540 | 0.0864 | 0.2253 |
| 39.3019 | 550 | 0.0849 | 0.2251 |
| 40.0 | 560 | 0.0748 | 0.2248 |
| 40.7547 | 570 | 0.0936 | 0.2243 |
| 41.4528 | 580 | 0.0817 | 0.2238 |
| 42.1509 | 590 | 0.0788 | 0.2240 |
| 42.9057 | 600 | 0.0836 | 0.2235 |
| 43.6038 | 610 | 0.0809 | 0.2230 |
| 44.3019 | 620 | 0.0796 | 0.2230 |
| 45.0 | 630 | 0.0706 | 0.2233 |
| 45.7547 | 640 | 0.0881 | 0.2230 |
| 46.4528 | 650 | 0.0768 | 0.2230 |
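A quick arithmetic note on the schedule above (assuming single-GPU training; variable names are just for illustration): gradient accumulation multiplies the per-device batch into the effective optimizer batch, while `MultipleNegativesRankingLoss` draws its in-batch negatives only from each per-device forward batch (since `gather_across_devices` is false):

```python
# Effective optimizer batch size implied by the training arguments.
per_device_train_batch_size = 512
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 2048

# In-batch negatives seen by each anchor under MultipleNegativesRankingLoss:
# every other positive in the same per-device forward batch.
negatives_per_anchor = per_device_train_batch_size - 1
print(negatives_per_anchor)  # 511
```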
Citation (BibTeX):

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```