amaai-lab/merit

nielsr (HF Staff) committed 3 days ago

Commit a85df30 (1 parent: 2fc4641)

Add author information and metadata (#1)

- Add author information and metadata (4525e82cffaca382408adb8fcf55f9a4addc78cc)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)

README.md +22 -6

@@ -1,7 +1,8 @@
 ---
-license: mit
 language:
 - en
 tags:
 - audio
 - music
@@ -10,19 +11,34 @@ tags:
 - contrastive-learning
 - music-information-retrieval
 - disentangled-representations
-pipeline_tag: feature-extraction
 ---
 # MERIT — Disentangled Music Similarity Embeddings
 [![arXiv](https://img.shields.io/badge/arXiv-2605.27346-b31b1b.svg)](https://arxiv.org/abs/2605.27346)
-**MERIT** maps audio to three *disentangled* 128-dimensional unit vectors — one each for **melody**, **rhythm**, and **timbre** similarity. A single frozen [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) backbone feeds three small trained projection heads that each specialize in one musical factor.
 > Code & training pipeline → [github.com/AMAAI-Lab/MERIT](https://github.com/AMAAI-Lab/MERIT)
-- [Paper](https://arxiv.org/abs/2605.27346)
-- [Paper TLDR;](https://arxivexplained.com/papers/merit-learning-disentangled-music-representations-for-audio-similarity)
 ---
@@ -178,4 +194,4 @@ If you use this model, please cite the paper in which it was presented:
 ## License
-[MIT](https://github.com/AMAAI-Lab/MERIT/blob/main/LICENSE)

 ---
 language:
 - en
+license: mit
+pipeline_tag: feature-extraction
 tags:
 - audio
 - music
 - contrastive-learning
 - music-information-retrieval
 - disentangled-representations
+library_name: pytorch
+base_model: m-a-p/MERT-v1-330M
+datasets:
+- amaai-lab/merit
 ---
 # MERIT — Disentangled Music Similarity Embeddings
 [![arXiv](https://img.shields.io/badge/arXiv-2605.27346-b31b1b.svg)](https://arxiv.org/abs/2605.27346)
+**MERIT** (Multi-Factor Disentangled Music Similarity) maps audio to three *disentangled* 128-dimensional unit vectors — one each for **melody**, **rhythm**, and **timbre** similarity. A single frozen [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) backbone feeds three small trained projection heads that each specialize in one musical factor.
+This model was presented in the paper [MERIT: Learning Disentangled Music Representations for Audio Similarity](https://huggingface.co/papers/2605.27346) by Abhinaba Roy, Junyi Liang, and [Dorien Herremans](https://huggingface.co/dorienh).
 > Code & training pipeline → [github.com/AMAAI-Lab/MERIT](https://github.com/AMAAI-Lab/MERIT)
+---
+## What is MERIT?
+Given two audio clips, MERIT returns **three independent cosine similarities** — one per musical factor:
+| Score | Captures | Example query |
+|---|---|---|
+| `S_mel` | Melodic contour & pitch identity | *"Find songs with the same melody"* |
+| `S_rhy` | Rhythmic groove & beat pattern | *"Find songs with the same drum feel"* |
+| `S_tim` | Instrument timbre & sonic character | *"Find songs played on the same instrument"* |
+A solo piano cover of a rock anthem scores high on `S_mel` but low on `S_rhy` and `S_tim`. MERIT makes this distinction explicit and computable.
 ---
 ## License
+[MIT](https://github.com/AMAAI-Lab/MERIT/blob/main/LICENSE)
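
The three scores described in the added "What is MERIT?" section are cosine similarities between unit vectors, so each one reduces to a dot product. Below is a minimal sketch, assuming the three 128-dimensional embeddings per clip have already been extracted. The `factor_similarities` helper, the dict layout, and the random placeholder vectors are illustrative assumptions and are not part of the MERIT codebase.

```python
import numpy as np

FACTORS = ("mel", "rhy", "tim")  # melody, rhythm, timbre


def unit(v):
    """Scale a vector to unit L2 norm."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)


def factor_similarities(clip_a, clip_b):
    """Return one cosine similarity per factor (S_mel, S_rhy, S_tim).

    Each clip is a dict mapping a factor name to its 128-d embedding
    (hypothetical layout chosen for this sketch).
    """
    return {
        f"S_{f}": float(np.dot(unit(clip_a[f]), unit(clip_b[f])))
        for f in FACTORS
    }


# Placeholder embeddings standing in for real model output (assumption).
rng = np.random.default_rng(0)
original = {f: unit(rng.standard_normal(128)) for f in FACTORS}

# A hypothetical "cover": same melody vector, unrelated rhythm and timbre.
cover = {
    "mel": original["mel"],
    "rhy": unit(rng.standard_normal(128)),
    "tim": unit(rng.standard_normal(128)),
}

scores = factor_similarities(original, cover)
print(scores)
```

With these placeholders the melody score is 1 by construction, while the rhythm and timbre scores stay near 0, which mirrors the piano-cover example in the card.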