Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora Paper • 2511.07080 • Published Nov 10 • 31
Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR Paper • 2509.18174 • Published Sep 17 • 128
Misraj Open Data Collection This collection contains open-source data collected and processed by the Misraj team • 3 items • Updated Jul 7 • 6
Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model Paper • 2505.17894 • Published May 23 • 220
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch +5 May 21 • 237
Article How to generate text: using different decoding methods for language generation with Transformers Mar 1, 2020 • 271
Sadeed: Advancing Arabic Diacritization Through Small Language Model Paper • 2504.21635 • Published Apr 30 • 59