Websites to Cure Your Dataset Needs

Steven Kyle
3 min read · Oct 3, 2021
Photo by Mika Baumeister on Unsplash

Coming up with data science projects can be extremely difficult. Sometimes it’s easier to just grab a dataset, see what’s in it, and figure out what we can do with it. Luckily for us, data is everywhere; all we have to know is where to look. In this blog post I will introduce five websites that offer a large number of datasets to choose from.

1. Kaggle

The first website I will introduce is Kaggle. Kaggle is well known for hosting large data science competitions as well as for its large aggregated collection of datasets. Kaggle has been online for 11 years now and has accumulated a wide range of fun and useful datasets to look at. They currently have over 54,000 datasets. While browsing the datasets, make sure to check out any competitions currently running; they are free to enter and you might win.

2. Data.gov

The second website is Data.gov. This website contains the data that the U.S. government has decided to make public. There are currently 318,997 datasets available through the site. The website is very user friendly and makes it easy to navigate through the vast number of datasets. The datasets are well organized and clearly tagged with the respective organization (Federal, State, City, etc.). This website covers a wide range of categories since it gathers public U.S. government datasets in one place.
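If you would rather search Data.gov from code than from the browser, here is a minimal sketch. It assumes the catalog exposes the standard CKAN `package_search` action at the URL below and that each result carries a `title` and an `organization` field; neither detail comes from this post, so check the current API documentation before relying on it. The function names are my own and purely illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Data.gov's catalog serves the CKAN Action API at this address.
CATALOG_API = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Build a package_search URL for a keyword query."""
    return f"{CATALOG_API}?{urlencode({'q': query, 'rows': rows})}"


def summarize(response):
    """Pull (dataset title, organization title) pairs out of a search response."""
    results = response.get("result", {}).get("results", [])
    return [
        (pkg.get("title", ""), (pkg.get("organization") or {}).get("title", ""))
        for pkg in results
    ]


def search_datasets(query, rows=5):
    """Query the catalog over the network and return summarized results."""
    with urlopen(build_search_url(query, rows), timeout=30) as resp:
        return summarize(json.load(resp))


# Example call (needs network access):
# for title, org in search_datasets("air quality"):
#     print(f"{title} ({org})")
```

Keeping the URL building and the response parsing separate from the network call makes it easy to try the parsing on a saved response first.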

3. Google Dataset Search

The third website is Google’s Dataset Search. Google is a hard company not to mention when talking about aggregating datasets. Google released its Dataset Search engine in 2018. It works like the Google search engine we all know and love, except that it is strictly for datasets. Type in any keyword or subject and the search engine will find datasets that match. If you have a hard time finding a specific dataset, a useful blog post that can help fine-tune your search can be found here.

4. FBI CDE

The fourth website is the FBI’s Crime Data Explorer. If you’re a fan of true crime or crime data in general, this is the goldmine for you. The FBI’s Crime Data Explorer has a plethora of data, and it also has an API that you can use. The datasets cover both violent crimes and property crimes. The website also displays visuals that show the features of the datasets so that you can check the data before downloading.

5. OpenML

The last website is OpenML. OpenML is an open science online platform for machine learning. They currently have 21,417 datasets that can be looked through. Not only do they have datasets, but they also have open algorithms and tasks. This is a great free website where you can find datasets and also learn by completing tasks and flows. If you are interested in OpenML and learning about tasks and flows, I suggest reading the getting started guide that can be found here.
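OpenML's catalog can also be browsed from a script. The sketch below assumes OpenML's public REST API serves JSON under the base URL shown, takes filters as path segments, and returns datasets under `data` → `dataset` with `did` and `name` fields. Those details are assumptions on my part and not something this post verifies, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: OpenML's public REST API serves JSON under this base URL.
OPENML_API = "https://www.openml.org/api/v1/json"


def list_url(limit=5, name=None):
    """Build a dataset-listing URL; filters are appended as path segments."""
    parts = [OPENML_API, "data", "list"]
    if name:
        parts += ["data_name", name]
    parts += ["limit", str(limit)]
    return "/".join(parts)


def dataset_names(response):
    """Extract (dataset id, name) pairs from a dataset-listing response."""
    datasets = response.get("data", {}).get("dataset", [])
    return [(int(d["did"]), d["name"]) for d in datasets]


def list_datasets(limit=5, name=None):
    """Fetch a dataset listing over the network and return (id, name) pairs."""
    with urlopen(list_url(limit, name), timeout=30) as resp:
        return dataset_names(json.load(resp))


# Example call (needs network access):
# print(list_datasets(limit=3, name="iris"))
```

If you work in scikit-learn, `sklearn.datasets.fetch_openml` is another way to pull an OpenML dataset by name or id.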

Steven Kyle

25-year-old Texan in the midst of a career change into data science.