elchico1990 nielsr HF Staff committed on
Commit a85df30 · 1 Parent(s): 2fc4641

Add author information and metadata (#1)

- Add author information and metadata (4525e82cffaca382408adb8fcf55f9a4addc78cc)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +22 -6
README.md CHANGED
@@ -1,7 +1,8 @@
  ---
- license: mit
  language:
  - en
+ license: mit
+ pipeline_tag: feature-extraction
  tags:
  - audio
  - music
@@ -10,19 +11,34 @@ tags:
  - contrastive-learning
  - music-information-retrieval
  - disentangled-representations
- pipeline_tag: feature-extraction
+ library_name: pytorch
+ base_model: m-a-p/MERT-v1-330M
+ datasets:
+ - amaai-lab/merit
  ---

  # MERIT — Disentangled Music Similarity Embeddings
  [![arXiv](https://img.shields.io/badge/arXiv-2605.27346-b31b1b.svg)](https://arxiv.org/abs/2605.27346)

- **MERIT** maps audio to three *disentangled* 128-dimensional unit vectors — one each for **melody**, **rhythm**, and **timbre** similarity. A single frozen [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) backbone feeds three small trained projection heads that each specialize in one musical factor.
+ **MERIT** (Multi-Factor Disentangled Music Similarity) maps audio to three *disentangled* 128-dimensional unit vectors — one each for **melody**, **rhythm**, and **timbre** similarity. A single frozen [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) backbone feeds three small trained projection heads that each specialize in one musical factor.
+
+ This model was presented in the paper [MERIT: Learning Disentangled Music Representations for Audio Similarity](https://huggingface.co/papers/2605.27346) by Abhinaba Roy, Junyi Liang, and [Dorien Herremans](https://huggingface.co/dorienh).

  > Code & training pipeline → [github.com/AMAAI-Lab/MERIT](https://github.com/AMAAI-Lab/MERIT)

- - [Paper](https://arxiv.org/abs/2605.27346)
+ ---
+
+ ## What is MERIT?
+
+ Given two audio clips, MERIT returns **three independent cosine similarities** — one per musical factor:
+
+ | Score | Captures | Example query |
+ |---|---|---|
+ | `S_mel` | Melodic contour & pitch identity | *"Find songs with the same melody"* |
+ | `S_rhy` | Rhythmic groove & beat pattern | *"Find songs with the same drum feel"* |
+ | `S_tim` | Instrument timbre & sonic character | *"Find songs played on the same instrument"* |

- - [Paper TLDR;](https://arxivexplained.com/papers/merit-learning-disentangled-music-representations-for-audio-similarity)
+ A solo piano cover of a rock anthem scores high on `S_mel` but low on `S_rhy` and `S_tim`. MERIT makes this distinction explicit and computable.

  ---

@@ -178,4 +194,4 @@ If you use this model, please cite the paper in which it was presented:

  ## License

- [MIT](https://github.com/AMAAI-Lab/MERIT/blob/main/LICENSE)
+ [MIT](https://github.com/AMAAI-Lab/MERIT/blob/main/LICENSE)
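
The "What is MERIT?" section added in this commit describes three independent cosine similarities, one per factor. The following is a minimal sketch of that scoring step only, assuming the melody, rhythm, and timbre embeddings have already been extracted. It does not load or call the MERIT model. The function names (`factor_similarities`, `rank_by_factor`), the dictionary keys, and the toy 3-dimensional vectors (standing in for the real 128-dimensional ones) are hypothetical and chosen for illustration.

```python
import math

FACTORS = ("mel", "rhy", "tim")  # hypothetical keys for melody, rhythm, timbre


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def factor_similarities(emb_a, emb_b):
    """Return one cosine score per factor: S_mel, S_rhy, S_tim."""
    return {f"S_{k}": cosine(emb_a[k], emb_b[k]) for k in FACTORS}


def rank_by_factor(query, library, factor):
    """Sort library items by similarity to the query on a single factor."""
    key = f"S_{factor}"
    scored = [(name, factor_similarities(query, emb)[key])
              for name, emb in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumption: real MERIT vectors are 128-dimensional).
rock_anthem = {"mel": [1.0, 0.0, 0.0], "rhy": [1.0, 0.0, 0.0], "tim": [1.0, 0.0, 0.0]}
piano_cover = {"mel": [1.0, 0.0, 0.0], "rhy": [0.0, 1.0, 0.0], "tim": [0.0, 0.0, 1.0]}
other_track = {"mel": [0.0, 1.0, 0.0], "rhy": [1.0, 0.0, 0.0], "tim": [1.0, 0.0, 0.0]}

scores = factor_similarities(rock_anthem, piano_cover)
ranking = rank_by_factor(
    rock_anthem,
    {"piano_cover": piano_cover, "other_track": other_track},
    "mel",
)
```

With these toy vectors the cover matches the original on melody only, so `scores` is high for `S_mel` and zero for the other two factors, and a melody query ranks `piano_cover` ahead of `other_track`. Ranking on one factor at a time corresponds to the "Example query" column in the README table.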