📋 Eval Logs Collection Benchmark log generated with Twinkle Eval, recording the model's outputs for each prompt. • 3 items • Updated 5 days ago • 4
🏎️ Formosa-1 Series Collection A collection of Formosa-1 (F1) reasoning models and datasets focused on Traditional Chinese instruction-following and logic. • 4 items • Updated 5 days ago • 4
🧠 Traditional Chinese Reasoning Datasets Collection A curated collection of datasets designed to evaluate and train reasoning capabilities in Traditional Chinese across various domains. • 3 items • Updated 5 days ago • 9