Deep Learning Research Datasets

List of Datasets for Artificial Intelligence, Data Science, Deep Learning and Machine Learning Projects.

This list was compiled for research and experimentation at the Compassionate AI Lab.

  • UCI Machine Learning Repository – Maintains 436 data sets as a service to the machine learning community.
  • MNIST – MNIST contains images for handwritten digit classification. It’s considered a great entry dataset for deep learning because it’s complex enough to warrant neural networks, while still being manageable on a single CPU.
  • CIFAR – The next step up in difficulty is the CIFAR-10 dataset, which contains 60,000 images broken into 10 different classes. For a bigger challenge, you can try the CIFAR-100 dataset, which has 100 different classes.
  • ImageNet – ImageNet hosts a computer vision competition every year, and many consider it to be the benchmark for modern computer vision performance. The current image dataset has 1000 different classes.
  • YouTube 8M – Ready to tackle videos, but can’t spare terabytes of storage? This dataset contains millions of YouTube video IDs and billions of audio and visual features that were pre-extracted using the latest deep learning models.
  • Kaggle Datasets – Open datasets contributed by the Kaggle community. Here, you’ll find a grab bag of topics. Plus, you can learn from the short tutorials and scripts that accompany the datasets.
  • r/datasets – Open datasets contributed by the Reddit community. This is another source of interesting and quirky datasets, but the datasets tend to be less refined.
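
For readers starting with MNIST, it may help to see how the raw files can be read. The sketch below assumes the original IDX binary distribution of MNIST (a big-endian header followed by unsigned-byte pixel values). The function name `parse_idx` and the tiny synthetic buffer are illustrative assumptions, not part of any dataset or library listed above.

```python
import struct


def parse_idx(data: bytes):
    """Parse an unsigned-byte IDX buffer into (shape, flat list of values)."""
    # Header: two zero bytes, a data-type code, and the number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("expected an unsigned-byte IDX buffer")
    # Each dimension size is stored as a 32-bit big-endian integer.
    shape = struct.unpack(f">{ndim}I", data[4:4 + 4 * ndim])
    count = 1
    for dim in shape:
        count *= dim
    payload = data[4 + 4 * ndim:]
    if len(payload) != count:
        raise ValueError("payload size does not match the header")
    return shape, list(payload)


# Synthetic stand-in for a real file: two 2x2 "images", not actual MNIST data.
header = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2)
sample = header + bytes(range(8))
shape, pixels = parse_idx(sample)
print(shape, pixels)
```

With a real MNIST image file, the same function would be given the decompressed file contents, and the shape would describe the number of images followed by their height and width.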
