A technical report detailing our proposed `LEAF` training procedure is available.

* **State-of-the-Art Performance**: `mdbr-leaf-mt` achieves new state-of-the-art results for compact embedding models, **ranking #1** on the [public MTEB v2 (Eng) benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models with ≤30M parameters.
* **Flexible Architecture Support**: `mdbr-leaf-mt` supports asymmetric retrieval architectures, enabling even stronger retrieval results. [See below](#asymmetric-retrieval-setup) for more information.
* **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-mt` compress well when truncated (MRL) and can be stored using more efficient types such as `int8` and `binary`. [See below](#mrl-truncation) for more information.

## Benchmark Comparison
For a usage example based on the `transformers` library, see [this notebook](https://huggingface.co/MongoDB/mdbr-leaf-mt/blob/main/transformers_example_mt.ipynb).
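In outline, encoding with plain `transformers` typically follows the pattern sketched below. The mean-pooling strategy and the absence of a query prompt are assumptions made here for illustration; treat the linked notebook as the authoritative version:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MongoDB/mdbr-leaf-mt")
model = AutoModel.from_pretrained("MongoDB/mdbr-leaf-mt")

def embed(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    # Mean pooling over non-padding tokens (assumed pooling strategy)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return F.normalize(emb, dim=-1)
```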
## Asymmetric Retrieval Setup

> [!NOTE]
> A version of this asymmetric setup, conveniently packaged into a single model, is [available here](https://huggingface.co/MongoDB/mdbr-leaf-mt-asym).

`mdbr-leaf-mt` is *aligned* to [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1), the model it was distilled from. This alignment makes an asymmetric setup possible, in which documents are encoded with the larger model while queries are encoded with the compact one.
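A minimal sketch of this pattern with `sentence-transformers` (the sample `queries` and `documents` below are illustrative, not from the original card):

```python
from sentence_transformers import SentenceTransformer

# Compact model for queries, larger aligned model for documents
query_model = SentenceTransformer("MongoDB/mdbr-leaf-mt")
doc_model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

queries = ["What is machine learning?"]
documents = ["Machine learning is a field of AI that lets systems learn from data."]

query_embeds = query_model.encode(queries, prompt_name="query")
doc_embeds = doc_model.encode(documents)

# The embedding spaces are aligned, so cross-model similarity is meaningful
similarities = query_model.similarity(query_embeds, doc_embeds)
print(similarities)
```

Because only the compact model runs at query time, query latency stays low, while documents can be indexed offline with the larger encoder.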
Retrieval results from asymmetric mode are usually superior to the standard mode described above.

## MRL Truncation

Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:
```python
query_embeds = model.encode(queries, prompt_name="query", truncate_dim=256)
doc_embeds = model.encode(documents, truncate_dim=256)

similarities = model.similarity(query_embeds, doc_embeds)

print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities: \n\t{similarities}")

# After MRL:
# * Embeddings dimension: 256
# * Similarities:
# tensor([[0.9164, 0.7219],
#         [0.6682, 0.8393]], device='cuda:0')
```
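Alternatively, you can encode at full dimension and truncate the vectors yourself; in that case, re-normalize after slicing, since truncated vectors are no longer unit norm:

```python
import torch.nn.functional as F

# Encode at full dimension, then truncate and normalize according to MRL
query_embeds = model.encode(queries, prompt_name="query", convert_to_tensor=True)
doc_embeds = model.encode(documents, convert_to_tensor=True)

query_embeds = F.normalize(query_embeds[:, :256], dim=-1)
doc_embeds = F.normalize(doc_embeds[:, :256], dim=-1)
```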

Similarly, the embeddings can be stored using more compact types such as `int8` or `binary`:

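One way to obtain `int8` vectors is the `quantize_embeddings` helper from `sentence-transformers`. The sketch below is an assumption for illustration: it reuses the document embeddings as the calibration set, whereas a larger held-out corpus is preferable in practice, and it assumes `query_embeds` / `doc_embeds` are the float32 arrays produced by `model.encode(...)` above:

```python
import numpy as np
from sentence_transformers.quantization import quantize_embeddings

# Derive int8 value ranges from a calibration set, then quantize
calibration = np.asarray(doc_embeds, dtype=np.float32)
query_embeds = quantize_embeddings(query_embeds, precision="int8", calibration_embeddings=calibration)
doc_embeds = quantize_embeddings(calibration, precision="int8", calibration_embeddings=calibration)
```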
```python
# Cast to int to avoid int8 overflow in the dot products
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T

print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities: \n{similarities}")

# After quantization:
# * Embeddings type: int8
```