Datasets documentation
You are viewing v2.17.1 version. A newer version v4.8.4 is available.
Overview
The how-to guides give a more comprehensive overview of the tools 🤗 Datasets provides and how to use them. They will help you tackle messier real-world datasets, where you may need to manipulate the structure or content of a dataset to get it ready for training.
The guides assume you are familiar and comfortable with the 🤗 Datasets basics. We recommend that new users check out our tutorials first.
Interested in learning more? Take a look at Chapter 5 of the Hugging Face course!
The guides are organized into six sections:
- General usage: Functions for general dataset loading and processing. The functions shown in this section are applicable across all dataset modalities.
- Audio: How to load, process, and share audio datasets.
- Vision: How to load, process, and share image datasets.
- Text: How to load, process, and share text datasets.
- Tabular: How to load, process, and share tabular datasets.
- Dataset repository: How to upload and share a dataset on the Hub.
If you have any questions about 🤗 Datasets, feel free to join and ask the community on our forum.