Update README.md
README.md
CHANGED
@@ -250,7 +250,7 @@ The model is based on the Mamba architecture ([Gu et al., 2023](https://arxiv.or
 | `d_model` | 4096 | Hidden dimension |
 | `d_state` | 16 | The SSM state dimension |
 | Vocabulary | 65024 | Vocabulary Size |
-| Sequence length | 8192 | During
+| Sequence length | 8192 | During the last training stages |
 
 ## Compute Infrastructure
 