Commit · a85df30
1 Parent(s): 2fc4641
Add author information and metadata (#1)
- Add author information and metadata (4525e82cffaca382408adb8fcf55f9a4addc78cc)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED
@@ -1,7 +1,8 @@
 ---
-license: mit
 language:
 - en
+license: mit
+pipeline_tag: feature-extraction
 tags:
 - audio
 - music
@@ -10,19 +11,34 @@ tags:
 - contrastive-learning
 - music-information-retrieval
 - disentangled-representations
-
+library_name: pytorch
+base_model: m-a-p/MERT-v1-330M
+datasets:
+- amaai-lab/merit
 ---
 
 # MERIT — Disentangled Music Similarity Embeddings
 [](https://arxiv.org/abs/2605.27346)
 
-**MERIT** maps audio to three *disentangled* 128-dimensional unit vectors — one each for **melody**, **rhythm**, and **timbre** similarity. A single frozen [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) backbone feeds three small trained projection heads that each specialize in one musical factor.
+**MERIT** (Multi-Factor Disentangled Music Similarity) maps audio to three *disentangled* 128-dimensional unit vectors — one each for **melody**, **rhythm**, and **timbre** similarity. A single frozen [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) backbone feeds three small trained projection heads that each specialize in one musical factor.
+
+This model was presented in the paper [MERIT: Learning Disentangled Music Representations for Audio Similarity](https://huggingface.co/papers/2605.27346) by Abhinaba Roy, Junyi Liang, and [Dorien Herremans](https://huggingface.co/dorienh).
 
 > Code & training pipeline → [github.com/AMAAI-Lab/MERIT](https://github.com/AMAAI-Lab/MERIT)
 
--
+---
+
+## What is MERIT?
+
+Given two audio clips, MERIT returns **three independent cosine similarities** — one per musical factor:
+
+| Score | Captures | Example query |
+|---|---|---|
+| `S_mel` | Melodic contour & pitch identity | *"Find songs with the same melody"* |
+| `S_rhy` | Rhythmic groove & beat pattern | *"Find songs with the same drum feel"* |
+| `S_tim` | Instrument timbre & sonic character | *"Find songs played on the same instrument"* |
 
-
+A solo piano cover of a rock anthem scores high on `S_mel` but low on `S_rhy` and `S_tim`. MERIT makes this distinction explicit and computable.
 
 ---
 
@@ -178,4 +194,4 @@ If you use this model, please cite the paper in which it was presented:
 
 ## License
 
-[MIT](https://github.com/AMAAI-Lab/MERIT/blob/main/LICENSE)
+[MIT](https://github.com/AMAAI-Lab/MERIT/blob/main/LICENSE)
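
The README text added in this commit says MERIT returns three independent cosine similarities over per-factor unit vectors. A minimal Python sketch of that computation follows. It does not load the model or use the repository's API: the `factor_similarities` helper, the dict-of-vectors layout, and the toy 3-dimensional vectors (standing in for the 128-dimensional embeddings) are all assumptions made for illustration.

```python
import math

# Factor names as listed in the README text; the dict layout below is hypothetical.
FACTORS = ("melody", "rhythm", "timbre")


def l2_normalize(vec):
    """Scale a vector to unit length (the README describes unit vectors)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def factor_similarities(emb_a, emb_b):
    """Return one cosine similarity per factor.

    emb_a and emb_b are hypothetical dicts mapping factor name -> vector.
    This is an illustrative layout, not MERIT's actual output format.
    """
    sims = {}
    for factor in FACTORS:
        a = l2_normalize(emb_a[factor])
        b = l2_normalize(emb_b[factor])
        # For unit vectors, cosine similarity reduces to a dot product.
        sims[factor] = sum(x * y for x, y in zip(a, b))
    return sims


# Toy 3-dimensional stand-ins for the 128-dimensional embeddings.
clip_a = {"melody": [1.0, 0.0, 0.0], "rhythm": [1.0, 0.0, 0.0], "timbre": [0.0, 1.0, 0.0]}
clip_b = {"melody": [2.0, 0.0, 0.0], "rhythm": [0.0, 1.0, 0.0], "timbre": [0.0, -3.0, 0.0]}
sims = factor_similarities(clip_a, clip_b)
```

In this toy pair the melody vectors point the same way, the rhythm vectors are orthogonal, and the timbre vectors are opposed, so the three scores differ even though they describe the same two clips. That is the per-factor separation the README describes.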