Croc-Prog-HF's Collections
MultiLang-Texts HQ Datasets

updated 11 days ago

  • HuggingFaceFW/fineweb-edu

    Updated Jul 11, 2025 • 3.5B rows • 289k downloads • 939 likes

  • uonlp/CulturaX

    Updated Dec 16, 2024 • 7.18B rows • 40.1k downloads • 584 likes

  • yhavinga/mc4_nl_cleaned

    Updated Oct 10, 2025 • 165M rows • 5.98k downloads • 14 likes

  • BramVanroy/CommonCrawl-CreativeCommons-strict

    Updated Aug 28, 2025 • 32.8M rows • 704 downloads • 1 like

  • BramVanroy/CommonCrawl-CreativeCommons-fine

    Updated Aug 28, 2025 • 75.1M rows • 173 downloads • 2 likes

  • cis-lmu/GlotCC-V1

    Updated Nov 1, 2024 • 1.28B rows • 496 downloads • 56 likes

  • OpenAssistant/oasst1

    Updated May 2, 2023 • 88.8k rows • 10.8k downloads • 1.48k likes

  • PleIAs/common_corpus

    Updated Jun 10, 2025 • 470M rows • 33.1k downloads • 339 likes

  • CohereLabs/aya_dataset

    Updated Apr 15, 2025 • 206k rows • 3.97k downloads • 334 likes

  • OpenLLM-France/wikipedia

    Updated Jan 30, 2025 • 32.5M rows • 1.4k downloads • 5 likes

  • OpenLLM-France/Claire-Dialogue-French-0.1

    Updated Sep 2, 2025 • 37k rows • 169 downloads • 50 likes

  • HuggingFaceFW/fineweb-2

    Updated Oct 27, 2025 • 4.48B rows • 102k downloads • 745 likes

  • HuggingFaceFW/finepdfs

    Updated Jan 9 • 476M rows • 39.9k downloads • 816 likes

  • HuggingFaceH4/Multilingual-Thinking

    Updated Aug 7, 2025 • 1k rows • 12.8k downloads • 109 likes