SentenceTransformer based on Qwen/Qwen3-Embedding-0.6B

This is a sentence-transformers model finetuned from Qwen/Qwen3-Embedding-0.6B on the train dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-Embedding-0.6B
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • train

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
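
The pipeline is: a Qwen3 transformer encoder truncated at 256 tokens, last-token pooling (`pooling_mode_lasttoken`), then L2 normalization, so dot products between embeddings equal cosine similarities. As a rough illustration only (the Sentence Transformers loader in the Usage section below is the supported path), here is a minimal sketch of the same three steps against 🤗 Transformers directly, assuming right-side padding:

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
tokenizer.padding_side = "right"  # so the attention mask locates the last real token
model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-0.6B")

batch = tokenizer(["example sentence"], padding=True, truncation=True,
                  max_length=256, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state          # (batch, seq_len, 1024)

# (1) Last-token pooling: hidden state of each sequence's final non-padding token
last = batch["attention_mask"].sum(dim=1) - 1          # index of the last real token
pooled = hidden[torch.arange(hidden.size(0)), last]    # (batch, 1024)

# (2) Normalize: unit-length vectors, so dot product == cosine similarity
embeddings = F.normalize(pooled, p=2, dim=1)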

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("b00l26/Qwen3-Embedding-0.6B-finetune-news-final")
# Run inference
queries = [
    # ≈ "The US investigates compliance with the China trade deal. The Office of the
    # US Trade Representative (USTR) investigates compliance with the trade deal
    # signed with China in 2020." (word-segmented Vietnamese)
    "Mỹ điều_tra tuân_thủ thoả_thuận thương_mại Trung_Quốc . Văn_phòng Đại_diện Thương_mại Mỹ ( USTR ) điều_tra tuân_thủ thoả_thuận thương_mại Trung_Quốc ký_kết 2020 .",
]
documents = [
    # Word-segmented Vietnamese news snippets; roughly:
    # 1) China agrees to a round of trade talks with the US.
    # 2) When is the best time to see buckwheat flowers in Hà Giang?
    # 3) Southern real estate re-enters the race after the administrative merger.
    "Trung_Quốc đồng_ý đàm_phán thương_mại ' ' Mỹ . Trung_Quốc đồng_ý tiến_hành vòng đàm_phán thương_mại Mỹ   .",
    'Thời_điểm đẹp Hà_Giang ngắm tam_giác mạch ? . Du_khách Hà_Nội ngắm hoa tam_giác mạch Hà_Giang , tư_vấn hoa nở_rộ , rực_rỡ .',
    'Bất_động_sản Nam trở_lại đường_đua sáp_nhập địa_giới hành_chính . ( Dân_trí ) - Trong cung lõi TPHCM khan_hiếm khu_vực lân_cận Bình_Dương , Long_An Đồng_Nai ( cũ ) dồi_dào , đa_dạng phân khúc .',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 1024] [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.6432, -0.1029,  0.0068]])
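
The similarity matrix can be used directly for retrieval. A small illustrative sketch, reusing the variables above, that ranks the documents for each query:

# Rank documents per query by descending cosine similarity (illustrative)
ranking = similarities.argsort(dim=1, descending=True)
for query, order in zip(queries, ranking):
    print(query[:50], "->", [documents[i][:50] for i in order.tolist()])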

Evaluation

Metrics

Information Retrieval

| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.0006 |
| cosine_accuracy@3   | 0.3073 |
| cosine_accuracy@5   | 0.4208 |
| cosine_accuracy@10  | 0.5527 |
| cosine_precision@1  | 0.0006 |
| cosine_precision@3  | 0.1187 |
| cosine_precision@5  | 0.1135 |
| cosine_precision@10 | 0.0879 |
| cosine_recall@1     | 0.0002 |
| cosine_recall@3     | 0.1801 |
| cosine_recall@5     | 0.276  |
| cosine_recall@10    | 0.4051 |
| cosine_ndcg@10      | 0.223  |
| cosine_mrr@10       | 0.1793 |
| cosine_map@100      | 0.163  |
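
These metric names match the output of sentence-transformers' InformationRetrievalEvaluator with cosine similarity. A sketch of how such metrics are produced; the queries, corpus, and relevance judgments below are placeholders, since the evaluation split itself is not included in this card:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder evaluation data: ids mapped to texts, plus query -> relevant doc ids
eval_queries = {"q1": "..."}
eval_corpus = {"d1": "...", "d2": "..."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(eval_queries, eval_corpus, relevant_docs, name="eval")
results = evaluator(model)
print(results["eval_cosine_ndcg@10"])  # keys follow the "{name}_cosine_<metric>" pattern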

Training Details

Training Dataset

train

  • Dataset: train
  • Size: 26,696 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:

    |         | anchor                                              | positive                                            |
    |:--------|:----------------------------------------------------|:----------------------------------------------------|
    | type    | string                                              | string                                              |
    | details | min: 21 tokens, mean: 66.23 tokens, max: 119 tokens | min: 26 tokens, mean: 67.17 tokens, max: 124 tokens |
  • Samples:

    | anchor | positive |
    |:-------|:---------|
    | Nam_Định : Thuê thần đèn di_dời 500 đường điện . ( Dân_trí ) - Tiếc 2 tầng xây_dựng khả_năng dỡ , Thuế mua chi nửa tỷ đồng thuê thần đèn di_chuyển đất . | Mục_sở_thị mỏng Vương_quốc_Anh rao 1,3 triệu USD . Ngôi nằm ga tàu_điện_ngầm Goldhawk_Road tây London , rộng 6 mét , 5 tầng tổng diện_tích 1.034 mét_vuông . Tuy mỏng , rao giá 950.000 bảng Anh ( 1,3 triệu USD ) . |
    | Trang_trí đám_cưới , đám_hỏi kiêng sen trắng cành trúc ? . Theo chuyên_gia văn_hoá , không_gian chùa_chiền , đám hiếu ngược đám_cưới đám_cưới . Tuy_nhiên , , quan_niệm dần . | Đại_gia TPHCM chi tiền tỷ tổ_chức đám_cưới bảo_mẫu . ( Dân_trí ) - Vì đem nữ bảo_mẫu gia_đình đám_cưới , nữ đại_gia TPHCM chi tiền tổ_chức tiệc cưới bảo_mẫu . |
    | Đệ phu_nhân Pháp chỉ_trích miệt_thị biểu_tình . Những nổi_tiếng chính_trị_gia cánh tả Pháp bày_tỏ phẫn_nộ Brigitte_Macron cụm_từ lũ đàn_bà ngu_ngốc hoạt_động nữ_quyền . | Đệ phu_nhân Pháp ám_ảnh đồn giới_tính . Auziere , gái của Đệ nhất phu_nhân Pháp , nói rằng mẹ lo_âu sâu_sắc xoay quanh những đồn thất thiệt về giới tính . |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
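
For reference, a minimal sketch of assembling a dataset with these columns and constructing this loss in sentence-transformers (the sample rows are placeholders):

from datasets import Dataset
from sentence_transformers import util
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Placeholder (anchor, positive) pairs in the same shape as the samples above
train_dataset = Dataset.from_dict({
    "anchor": ["word-segmented Vietnamese headline ..."],
    "positive": ["matching article snippet ..."],
})

# scale and similarity function match the parameters listed above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)

Within each batch, this loss treats an anchor's own positive as the relevant document and every other positive in the batch as an in-batch negative, which is why the no_duplicates batch sampler in the hyperparameters below matters.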
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 64
  • gradient_accumulation_steps: 4
  • num_train_epochs: 50
  • warmup_ratio: 0.1
  • bf16: True
  • dataloader_num_workers: 2
  • dataloader_pin_memory: False
  • gradient_checkpointing: True
  • batch_sampler: no_duplicates
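
A minimal sketch of a training run wiring in the non-default values above (the output directory is a placeholder; train_dataset and loss follow the sketch in the Training Dataset section):

from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
args = SentenceTransformerTrainingArguments(
    output_dir="qwen3-embedding-news",          # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=512,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,
    num_train_epochs=50,
    warmup_ratio=0.1,
    bf16=True,
    dataloader_num_workers=2,
    dataloader_pin_memory=False,
    gradient_checkpointing=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no repeated texts per batch,
                                                # important for in-batch negatives
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # (anchor, positive) pairs
    loss=loss,                    # MultipleNegativesRankingLoss as configured above
)
trainer.train()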

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 50
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 2
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

| Epoch   | Step | Training Loss | cosine_ndcg@10 |
|:--------|:----:|:-------------:|:--------------:|
| 0.0755  | 1    | 10.5083       | -              |
| 0.7547  | 10   | 10.126        | 0.2090         |
| 1.4528  | 20   | 6.8463        | 0.2405         |
| 2.1509  | 30   | 4.8851        | 0.2416         |
| 2.9057  | 40   | 3.3232        | 0.2416         |
| 3.6038  | 50   | 1.6894        | 0.2347         |
| 4.3019  | 60   | 1.128         | 0.2451         |
| 5.0     | 70   | 0.803         | 0.2391         |
| 5.7547  | 80   | 0.8658        | 0.2290         |
| 6.4528  | 90   | 0.6837        | 0.2373         |
| 7.1509  | 100  | 0.6121        | 0.2424         |
| 7.9057  | 110  | 0.5992        | 0.2350         |
| 8.6038  | 120  | 0.5402        | 0.2353         |
| 9.3019  | 130  | 0.4957        | 0.2408         |
| 10.0    | 140  | 0.396         | 0.2426         |
| 10.7547 | 150  | 0.4763        | 0.2425         |
| 11.4528 | 160  | 0.3865        | 0.2370         |
| 12.1509 | 170  | 0.3367        | 0.2342         |
| 12.9057 | 180  | 0.3539        | 0.2374         |
| 13.6038 | 190  | 0.325         | 0.2371         |
| 14.3019 | 200  | 0.2942        | 0.2272         |
| 15.0    | 210  | 0.2531        | 0.2383         |
| 15.7547 | 220  | 0.299         | 0.2188         |
| 16.4528 | 230  | 0.2713        | 0.2439         |
| 17.1509 | 240  | 0.2414        | 0.2293         |
| 17.9057 | 250  | 0.2416        | 0.2373         |
| 18.6038 | 260  | 0.2264        | 0.2486         |
| 19.3019 | 270  | 0.2139        | 0.2228         |
| 20.0    | 280  | 0.1785        | 0.2499         |
| 20.7547 | 290  | 0.2092        | 0.2328         |
| 21.4528 | 300  | 0.1885        | 0.2449         |
| 22.1509 | 310  | 0.161         | 0.2370         |
| 22.9057 | 320  | 0.1744        | 0.2363         |
| 23.6038 | 330  | 0.1617        | 0.2435         |
| 24.3019 | 340  | 0.1642        | 0.2354         |
| 25.0    | 350  | 0.1314        | 0.2389         |
| 25.7547 | 360  | 0.1634        | 0.2337         |
| 26.4528 | 370  | 0.1451        | 0.2388         |
| 27.1509 | 380  | 0.1275        | 0.2300         |
| 27.9057 | 390  | 0.131         | 0.2342         |
| 28.6038 | 400  | 0.1251        | 0.2335         |
| 29.3019 | 410  | 0.1213        | 0.2368         |
| 30.0    | 420  | 0.0996        | 0.2305         |
| 30.7547 | 430  | 0.1239        | 0.2337         |
| 31.4528 | 440  | 0.1043        | 0.2324         |
| 32.1509 | 450  | 0.0975        | 0.2304         |
| 32.9057 | 460  | 0.1004        | 0.2293         |
| 33.6038 | 470  | 0.0967        | 0.2298         |
| 34.3019 | 480  | 0.0933        | 0.2288         |
| 35.0    | 490  | 0.0811        | 0.2273         |
| 35.7547 | 500  | 0.1014        | 0.2279         |
| 36.4528 | 510  | 0.0881        | 0.2272         |
| 37.1509 | 520  | 0.0843        | 0.2260         |
| 37.9057 | 530  | 0.0892        | 0.2260         |
| 38.6038 | 540  | 0.0864        | 0.2253         |
| 39.3019 | 550  | 0.0849        | 0.2251         |
| 40.0    | 560  | 0.0748        | 0.2248         |
| 40.7547 | 570  | 0.0936        | 0.2243         |
| 41.4528 | 580  | 0.0817        | 0.2238         |
| 42.1509 | 590  | 0.0788        | 0.2240         |
| 42.9057 | 600  | 0.0836        | 0.2235         |
| 43.6038 | 610  | 0.0809        | 0.2230         |
| 44.3019 | 620  | 0.0796        | 0.2230         |
| 45.0    | 630  | 0.0706        | 0.2233         |
| 45.7547 | 640  | 0.0881        | 0.2230         |
| 46.4528 | 650  | 0.0768        | 0.2230         |

Framework Versions

  • Python: 3.10.19
  • Sentence Transformers: 5.2.0
  • Transformers: 4.51.0
  • PyTorch: 2.1.2+cu118
  • Accelerate: 0.34.2
  • Datasets: 3.3.2
  • Tokenizers: 0.21.4
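
To approximate this environment, the reported versions can be pinned (a sketch; the matching PyTorch 2.1.2+cu118 wheel must be installed separately for your platform and CUDA setup):

pip install sentence-transformers==5.2.0 transformers==4.51.0 accelerate==0.34.2 datasets==3.3.2 tokenizers==0.21.4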

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}