Update README.md
README.md CHANGED
@@ -4,7 +4,7 @@ library_name: transformers
 
 # tiny-mimo-v2-flash
 
-A ~1.85B-parameter tiny random-weight checkpoint of [XiaomiMiMo/MiMo-V2-Flash](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash), used for internal testing in Hugging Face `transformers` for the native HF implementation.
+A ~2.34B-parameter tiny random-weight checkpoint of [XiaomiMiMo/MiMo-V2-Flash](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash), used for internal testing in Hugging Face `transformers` for the native HF implementation.
 
 ## Configuration
 
@@ -21,5 +21,5 @@ A ~1.85B-parameter tiny random-weight checkpoint of [XiaomiMiMo/MiMo-V2-Flash](h
 | `num_attention_heads` / `num_key_value_heads` | 16 / 1 | 64 / 4 (**ratio 4.0**) |
 | `head_dim` / `v_head_dim` | 192 / 128 | 192 / 128 |
 | `n_routed_experts` / `num_experts_per_tok` | 64 / 2 | 256 / 8 (**ratio 4.0**) |
-| parameters |
+| parameters | 2.34B | 300B |
 
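A quick sanity check of the "(**ratio 4.0**)" annotations in the configuration table, reading them as the original-to-tiny downscale factor (an assumption; the README does not spell out what the ratio refers to). The numbers are copied from the table itself; nothing here touches the checkpoint.

```python
# Values quoted from the configuration table above (tiny vs. original config).
tiny = {
    "num_attention_heads": 16,
    "num_key_value_heads": 1,
    "n_routed_experts": 64,
    "num_experts_per_tok": 2,
}
original = {
    "num_attention_heads": 64,
    "num_key_value_heads": 4,
    "n_routed_experts": 256,
    "num_experts_per_tok": 8,
}

# Each annotated field in the original config is 4x its tiny counterpart;
# head_dim / v_head_dim (192 / 128) are left unchanged, so they carry no ratio.
ratios = {k: original[k] / tiny[k] for k in tiny}
print(ratios)  # every annotated field scales by 4.0
```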