SentenceTransformer based on Qwen/Qwen3-Embedding-0.6B

This is a sentence-transformers model finetuned from Qwen/Qwen3-Embedding-0.6B on the train dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-Embedding-0.6B
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • train

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
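
The pipeline is: a Qwen3 transformer encoder truncated at 256 tokens, last-token pooling (`pooling_mode_lasttoken`), then L2 normalization, so dot products between embeddings equal cosine similarities. As a rough illustration only (the Sentence Transformers loader in the Usage section below is the supported path), here is a minimal sketch of the same three steps against 🤗 Transformers directly, assuming right-side padding:

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
tokenizer.padding_side = "right"  # so the attention mask locates the last real token
model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-0.6B")

batch = tokenizer(["example sentence"], padding=True, truncation=True,
                  max_length=256, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state          # (batch, seq_len, 1024)

# (1) Last-token pooling: hidden state of each sequence's final non-padding token
last = batch["attention_mask"].sum(dim=1) - 1          # index of the last real token
pooled = hidden[torch.arange(hidden.size(0)), last]    # (batch, 1024)

# (2) Normalize: unit-length vectors, so dot product == cosine similarity
embeddings = F.normalize(pooled, p=2, dim=1)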

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("b00l26/Qwen3-Embedding-0.6B-finetune-news-final")
# Run inference
queries = [
    # ≈ "The US investigates compliance with the China trade deal. The Office of the
    # US Trade Representative (USTR) investigates compliance with the trade deal
    # signed with China in 2020." (word-segmented Vietnamese)
    "Mỹ điều_tra tuân_thủ thoả_thuận thương_mại Trung_Quốc . Văn_phòng Đại_diện Thương_mại Mỹ ( USTR ) điều_tra tuân_thủ thoả_thuận thương_mại Trung_Quốc ký_kết 2020 .",
]
documents = [
    # Word-segmented Vietnamese news snippets; roughly:
    # 1) China agrees to a round of trade talks with the US.
    # 2) When is the best time to see buckwheat flowers in Hà Giang?
    # 3) Southern real estate re-enters the race after the administrative merger.
    "Trung_Quốc đồng_ý đàm_phán thương_mại ' ' Mỹ . Trung_Quốc đồng_ý tiến_hành vòng đàm_phán thương_mại Mỹ   .",
    'Thời_điểm đẹp Hà_Giang ngắm tam_giác mạch ? . Du_khách Hà_Nội ngắm hoa tam_giác mạch Hà_Giang , tư_vấn hoa nở_rộ , rực_rỡ .',
    'Bất_động_sản Nam trở_lại đường_đua sáp_nhập địa_giới hành_chính . ( Dân_trí ) - Trong cung lõi TPHCM khan_hiếm khu_vực lân_cận Bình_Dương , Long_An Đồng_Nai ( cũ ) dồi_dào , đa_dạng phân khúc .',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 1024] [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.6432, -0.1029,  0.0068]])
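
The similarity matrix can be used directly for retrieval. A small illustrative sketch, reusing the variables above, that ranks the documents for each query:

# Rank documents per query by descending cosine similarity (illustrative)
ranking = similarities.argsort(dim=1, descending=True)
for query, order in zip(queries, ranking):
    print(query[:50], "->", [documents[i][:50] for i in order.tolist()])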

Evaluation

Metrics

Information Retrieval

| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.0006 |
| cosine_accuracy@3   | 0.3073 |
| cosine_accuracy@5   | 0.4208 |
| cosine_accuracy@10  | 0.5527 |
| cosine_precision@1  | 0.0006 |
| cosine_precision@3  | 0.1187 |
| cosine_precision@5  | 0.1135 |
| cosine_precision@10 | 0.0879 |
| cosine_recall@1     | 0.0002 |
| cosine_recall@3     | 0.1801 |
| cosine_recall@5     | 0.276  |
| cosine_recall@10    | 0.4051 |
| cosine_ndcg@10      | 0.223  |
| cosine_mrr@10       | 0.1793 |
| cosine_map@100      | 0.163  |
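
These metric names match the output of sentence-transformers' InformationRetrievalEvaluator with cosine similarity. A sketch of how such metrics are produced; the queries, corpus, and relevance judgments below are placeholders, since the evaluation split itself is not included in this card:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder evaluation data: ids mapped to texts, plus query -> relevant doc ids
eval_queries = {"q1": "..."}
eval_corpus = {"d1": "...", "d2": "..."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(eval_queries, eval_corpus, relevant_docs, name="eval")
results = evaluator(model)
print(results["eval_cosine_ndcg@10"])  # keys follow the "{name}_cosine_<metric>" pattern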

Training Details

Training Dataset

train

  • Dataset: train
  • Size: 26,696 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:

    |         | anchor                                              | positive                                            |
    |:--------|:----------------------------------------------------|:----------------------------------------------------|
    | type    | string                                              | string                                              |
    | details | min: 21 tokens, mean: 66.23 tokens, max: 119 tokens | min: 26 tokens, mean: 67.17 tokens, max: 124 tokens |
  • Samples:

    | anchor | positive |
    |:-------|:---------|
    | Nam_Định : Thuê thần đèn di_dời 500 đường điện . ( Dân_trí ) - Tiếc 2 tầng xây_dựng khả_năng dỡ , Thuế mua chi nửa tỷ đồng thuê thần đèn di_chuyển đất . | Mục_sở_thị mỏng Vương_quốc_Anh rao 1,3 triệu USD . Ngôi nằm ga tàu_điện_ngầm Goldhawk_Road tây London , rộng 6 mét , 5 tầng tổng diện_tích 1.034 mét_vuông . Tuy mỏng , rao giá 950.000 bảng Anh ( 1,3 triệu USD ) . |
    | Trang_trí đám_cưới , đám_hỏi kiêng sen trắng cành trúc ? . Theo chuyên_gia văn_hoá , không_gian chùa_chiền , đám hiếu ngược đám_cưới đám_cưới . Tuy_nhiên , , quan_niệm dần . | Đại_gia TPHCM chi tiền tỷ tổ_chức đám_cưới bảo_mẫu . ( Dân_trí ) - Vì đem nữ bảo_mẫu gia_đình đám_cưới , nữ đại_gia TPHCM chi tiền tổ_chức tiệc cưới bảo_mẫu . |
    | Đệ phu_nhân Pháp chỉ_trích miệt_thị biểu_tình . Những nổi_tiếng chính_trị_gia cánh tả Pháp bày_tỏ phẫn_nộ Brigitte_Macron cụm_từ lũ đàn_bà ngu_ngốc hoạt_động nữ_quyền . | Đệ phu_nhân Pháp ám_ảnh đồn giới_tính . Auziere , gái của Đệ nhất phu_nhân Pháp , nói rằng mẹ lo_âu sâu_sắc xoay quanh những đồn thất thiệt về giới tính . |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
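
For reference, a minimal sketch of assembling a dataset with these columns and constructing this loss in sentence-transformers (the sample rows are placeholders):

from datasets import Dataset
from sentence_transformers import util
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Placeholder (anchor, positive) pairs in the same shape as the samples above
train_dataset = Dataset.from_dict({
    "anchor": ["word-segmented Vietnamese headline ..."],
    "positive": ["matching article snippet ..."],
})

# scale and similarity function match the parameters listed above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)

Within each batch, this loss treats an anchor's own positive as the relevant document and every other positive in the batch as an in-batch negative, which is why the no_duplicates batch sampler in the hyperparameters below matters.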
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 64
  • gradient_accumulation_steps: 4
  • num_train_epochs: 50
  • warmup_ratio: 0.1
  • bf16: True
  • dataloader_num_workers: 2
  • dataloader_pin_memory: False
  • gradient_checkpointing: True
  • batch_sampler: no_duplicates
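
A minimal sketch of a training run wiring in the non-default values above (the output directory is a placeholder; train_dataset and loss follow the sketch in the Training Dataset section):

from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
args = SentenceTransformerTrainingArguments(
    output_dir="qwen3-embedding-news",          # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=512,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,
    num_train_epochs=50,
    warmup_ratio=0.1,
    bf16=True,
    dataloader_num_workers=2,
    dataloader_pin_memory=False,
    gradient_checkpointing=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no repeated texts per batch,
                                                # important for in-batch negatives
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # (anchor, positive) pairs
    loss=loss,                    # MultipleNegativesRankingLoss as configured above
)
trainer.train()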

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 50
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 2
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

| Epoch   | Step | Training Loss | cosine_ndcg@10 |
|:--------|:----:|:-------------:|:--------------:|
| 0.0755  | 1    | 10.5083       | -              |
| 0.7547  | 10   | 10.126        | 0.2090         |
| 1.4528  | 20   | 6.8463        | 0.2405         |
| 2.1509  | 30   | 4.8851        | 0.2416         |
| 2.9057  | 40   | 3.3232        | 0.2416         |
| 3.6038  | 50   | 1.6894        | 0.2347         |
| 4.3019  | 60   | 1.128         | 0.2451         |
| 5.0     | 70   | 0.803         | 0.2391         |
| 5.7547  | 80   | 0.8658        | 0.2290         |
| 6.4528  | 90   | 0.6837        | 0.2373         |
| 7.1509  | 100  | 0.6121        | 0.2424         |
| 7.9057  | 110  | 0.5992        | 0.2350         |
| 8.6038  | 120  | 0.5402        | 0.2353         |
| 9.3019  | 130  | 0.4957        | 0.2408         |
| 10.0    | 140  | 0.396         | 0.2426         |
| 10.7547 | 150  | 0.4763        | 0.2425         |
| 11.4528 | 160  | 0.3865        | 0.2370         |
| 12.1509 | 170  | 0.3367        | 0.2342         |
| 12.9057 | 180  | 0.3539        | 0.2374         |
| 13.6038 | 190  | 0.325         | 0.2371         |
| 14.3019 | 200  | 0.2942        | 0.2272         |
| 15.0    | 210  | 0.2531        | 0.2383         |
| 15.7547 | 220  | 0.299         | 0.2188         |
| 16.4528 | 230  | 0.2713        | 0.2439         |
| 17.1509 | 240  | 0.2414        | 0.2293         |
| 17.9057 | 250  | 0.2416        | 0.2373         |
| 18.6038 | 260  | 0.2264        | 0.2486         |
| 19.3019 | 270  | 0.2139        | 0.2228         |
| 20.0    | 280  | 0.1785        | 0.2499         |
| 20.7547 | 290  | 0.2092        | 0.2328         |
| 21.4528 | 300  | 0.1885        | 0.2449         |
| 22.1509 | 310  | 0.161         | 0.2370         |
| 22.9057 | 320  | 0.1744        | 0.2363         |
| 23.6038 | 330  | 0.1617        | 0.2435         |
| 24.3019 | 340  | 0.1642        | 0.2354         |
| 25.0    | 350  | 0.1314        | 0.2389         |
| 25.7547 | 360  | 0.1634        | 0.2337         |
| 26.4528 | 370  | 0.1451        | 0.2388         |
| 27.1509 | 380  | 0.1275        | 0.2300         |
| 27.9057 | 390  | 0.131         | 0.2342         |
| 28.6038 | 400  | 0.1251        | 0.2335         |
| 29.3019 | 410  | 0.1213        | 0.2368         |
| 30.0    | 420  | 0.0996        | 0.2305         |
| 30.7547 | 430  | 0.1239        | 0.2337         |
| 31.4528 | 440  | 0.1043        | 0.2324         |
| 32.1509 | 450  | 0.0975        | 0.2304         |
| 32.9057 | 460  | 0.1004        | 0.2293         |
| 33.6038 | 470  | 0.0967        | 0.2298         |
| 34.3019 | 480  | 0.0933        | 0.2288         |
| 35.0    | 490  | 0.0811        | 0.2273         |
| 35.7547 | 500  | 0.1014        | 0.2279         |
| 36.4528 | 510  | 0.0881        | 0.2272         |
| 37.1509 | 520  | 0.0843        | 0.2260         |
| 37.9057 | 530  | 0.0892        | 0.2260         |
| 38.6038 | 540  | 0.0864        | 0.2253         |
| 39.3019 | 550  | 0.0849        | 0.2251         |
| 40.0    | 560  | 0.0748        | 0.2248         |
| 40.7547 | 570  | 0.0936        | 0.2243         |
| 41.4528 | 580  | 0.0817        | 0.2238         |
| 42.1509 | 590  | 0.0788        | 0.2240         |
| 42.9057 | 600  | 0.0836        | 0.2235         |
| 43.6038 | 610  | 0.0809        | 0.2230         |
| 44.3019 | 620  | 0.0796        | 0.2230         |
| 45.0    | 630  | 0.0706        | 0.2233         |
| 45.7547 | 640  | 0.0881        | 0.2230         |
| 46.4528 | 650  | 0.0768        | 0.2230         |

Framework Versions

  • Python: 3.10.19
  • Sentence Transformers: 5.2.0
  • Transformers: 4.51.0
  • PyTorch: 2.1.2+cu118
  • Accelerate: 0.34.2
  • Datasets: 3.3.2
  • Tokenizers: 0.21.4
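
To approximate this environment, the reported versions can be pinned (a sketch; the matching PyTorch 2.1.2+cu118 wheel must be installed separately for your platform and CUDA setup):

pip install sentence-transformers==5.2.0 transformers==4.51.0 accelerate==0.34.2 datasets==3.3.2 tokenizers==0.21.4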

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}