🤗Datasets

Topic	Replies	Views	Activity
Add Convence/ParseEmbed as an official benchmark on the Hub (If possible)	7	83	June 29, 2026
[Concept] Instead of paying for data, we can trade data instead	2	34	June 29, 2026
[SEEKING] Indic Document Dataset (India) — Invoices, Receipts, Utility Bills, Payment Advices, Packing Lists, Commercial Invoices, Credit Notes	5	62	June 25, 2026
Dataset Viewer issue: ConfigNamesError	3	61	June 21, 2026
Follow-up: the detector reliability check, now with a second human rater + two LLMs (fresh scenes)	0	28	June 18, 2026
Welcome — questions, requests, and feedback Board of Veterans’ Appeals decisions 2019-present	3	25	June 10, 2026
Introducing BenSyc v1.1: A Benchmark for Conversational Sycophancy and Alignment in Bengali Social Contexts	0	38	June 9, 2026
New Dataset Released!	0	62	June 8, 2026
Documented our dataset's limits + ran a reliability check on its rule-based labels	0	27	June 8, 2026
Hosting Dataset in Europe due to Ethics Constraints	1	54	June 4, 2026
V7.2 update — Pattern F gap closed + Hard Negatives Batch 2	0	22	May 31, 2026
Objective Projection v7.1: Narrative Engineering Corpus targeting Summarization Bias	0	32	May 30, 2026
For researchers: What physical interaction scenarios are underrepresented in world model training data?	1	38	May 24, 2026
Integration of Benchmark Dataset for CHI-Bench	0	37	May 22, 2026
Legal data creation	1	67	May 16, 2026
[Dataset] CLI-1M: 975K NL→shell pairs — 13 languages, 6 shells, Apache-2.0	0	40	May 14, 2026
Synthetic Australian medical record PDF library (50-doc free sample) - feedback wanted on dataset	0	68	May 7, 2026
PiC/phrase_retrieval dataset (PR-pass & PR-page) is broken — does anyone have a local copy?	0	24	May 5, 2026
Anyone else fighting the “valid json, broken pipeline” problem in planner-executor stacks?	2	66	May 3, 2026
TikTok-10M Dataset	5	934	April 29, 2026
Dino Data Workflow Routing Preview: training models to route, structure, and prepare actions instead of only replying	0	26	April 29, 2026
Built a lane-based dataset bundle explorer for LLM training — would love feedback from the HF community	0	21	April 29, 2026
When Your “Labels” Aren’t Really Labels: Dealing with Entity-Based NLP Datasets	1	44	April 26, 2026
Made a Python failure dataset for DPO/RLHF — how do you source negative examples?	0	45	April 26, 2026
load_dataset() creates a duplicate in cache	1	76	April 25, 2026
Spanish Historical Web Corpus — unique categories (religion, folklore, conspiracies, BOE)	0	21	April 21, 2026
Dataset viewer broke after repo rename	4	88	April 20, 2026
Huggingface Dataset Download Stuck in Kaggle	7	419	April 14, 2026
Add new official benchmark on the Hub	3	83	April 13, 2026
Total AI beginner with a 25-year photography archive - is this useful for training?	0	22	April 10, 2026