Safetensors
Basque
openelm
custom_code
GorkaUrbizu committed on
Commit 4cdf979 · verified · 1 Parent(s): bcf016c

Update README.md

Files changed (1)
  1. README.md +34 -2
README.md CHANGED
@@ -2,9 +2,41 @@
  license: apple-ascl
  datasets:
  - orai-nlp/ZelaiHandi
- - HuggingFaceFW/fineweb
  language:
  - eu
  base_model:
  - apple/OpenELM-270M
- ---
+ ---
+
+ # OpenELM-270M-eu continual
+
+ OpenELM 270M for Basque, continually pretrained on [ZelaiHandi-v1](https://huggingface.co/datasets/orai-nlp/ZelaiHandi) for 5 epochs.
+
+ 📝 Paper: [Sub-1B Language Models for Low-Resource Languages: Training Strategies and Insights for Basque](https://aclanthology.org/2025.mrl-main.35/), accepted at the [5TH MULTILINGUAL REPRESENTATION LEARNING (MRL) WORKSHOP 2025](https://sigtyp.github.io/ws2025-mrl.html) (EMNLP)
+
+
+ ## Acknowledgments
+
+ The creation of this dataset has been partially funded by the Basque Government (ICL4LANG project, grant no. KK-2023/00094) and the European Union (EFA 104/01-LINGUATEC IA project, INTERREG POCTEFA 2021-2027 program).
+ Pre-training and fine-tuning of SLMs were conducted using the Hyperion system at the Donostia International Physics Center (DIPC).
+ Finally, we thank Idoia Davila Uzkudun for her contributions to manual data curation and evaluation.
+
+ ## Citation
+
+ If you use this dataset, please cite the following paper:
+
+ ```bibtex
+ @inproceedings{urbizu2025sub,
+ title={Sub-1B Language Models for Low-Resource Languages: Training Strategies and Insights for {B}asque},
+ author={Urbizu, Gorka and Corral, Ander and Saralegi, Xabier and San Vicente, I{\~n}aki},
+ booktitle={Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)},
+ pages={519--530},
+ year={2025}
+ }
+
+ ```
+
+ ## Contact
+
+ - Gorka Urbizu ([email protected])
+ - Xabier Saralegi ([email protected])