List of Datasets for Artificial Intelligence, Data Science, Deep Learning and Machine Learning Projects.
This list was compiled to support research and experimentation at the Compassionate AI Lab.
- UCI Machine Learning Repository – Maintains 436 data sets as a service to the machine learning community.
- MNIST – MNIST contains images for handwritten digit classification. It’s considered a great entry dataset for deep learning because it’s complex enough to warrant neural networks, while still being manageable on a single CPU. (We also have a tutorial.)
- CIFAR – The next step up in difficulty is the CIFAR-10 dataset, which contains 60,000 images broken into 10 different classes. For a bigger challenge, you can try the CIFAR-100 dataset, which has 100 different classes.
- ImageNet – ImageNet hosts a computer vision competition every year, and many consider it to be the benchmark for modern performance. The current image dataset has 1,000 different classes.
- YouTube 8M – Ready to tackle videos, but can’t spare terabytes of storage? This dataset contains millions of YouTube video IDs and billions of audio and visual features that were pre-extracted using the latest deep learning models.
- Kaggle Datasets – Open datasets contributed by the Kaggle community. Here, you’ll find a grab bag of topics. Plus, you can learn from the short tutorials and scripts that accompany the datasets.
- r/datasets – Open datasets contributed by the Reddit community. This is another source of interesting and quirky datasets, but the datasets tend to be less refined.
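
As a starting point for the MNIST entry above, here is a minimal sketch of reading the raw files with only the Python standard library. It assumes the files follow the IDX layout used by the original MNIST distribution (a 4-byte header, big-endian dimension sizes, then unsigned-byte pixel or label values). The function name `parse_idx` and the tiny synthetic buffer are hypothetical, chosen only for illustration; the buffer stands in for a real download.

```python
import struct


def parse_idx(data: bytes):
    """Parse IDX-formatted bytes into (dims, flat list of unsigned-byte values)."""
    # Header: two zero bytes, a type code (0x08 = unsigned byte), number of dimensions.
    zero, dtype, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # Each dimension size is a big-endian 32-bit unsigned integer.
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    payload = data[4 + 4 * ndim:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match header")
    return dims, list(payload)


# Synthetic stand-in for an image file (assumption: two 2x2 "images").
fake_images = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
dims, pixels = parse_idx(fake_images)
print(dims)    # (2, 2, 2)
print(pixels)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The downloadable MNIST files are typically gzip-compressed, so in practice the bytes would likely come from something like `gzip.open(path, "rb").read()` before being passed to the parser.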