The rise of open data has been critical to improving access to data for analysts working on their own projects, government officials crafting policy, and academics conducting cutting-edge research across a vast array of fields.
Because anyone with a computer and some programming skills can download and access these high-quality datasets, open data represents a radical shift towards the democratization of knowledge.
In this article, we’ll explain what differentiates open data from other data sources and why it’s worth considering for your personal or work projects, then take you through 15 high-quality, well-regarded open data sources you can explore when you need some inspiration.
If you’d like to start working with open data right away, why not try this free 5-day data analytics course for beginners?
1. What is open data?
Open data simply means that the data can be used by anyone for any purpose. This allows anyone to transform, augment, share, and build both non-commercial and commercial applications on it.
Open data emerged alongside a broader drive in tech towards open source software and hardware. Many companies, academic institutions, think tanks, non-profits, and individual researchers have come together to share their data freely.
2. Why are open data sources important?
It’s important to use data that you have the right to use and publish, especially if you are making your work public or creating something for commercial use. Whether you’re writing for a business, academic, or non-expert audience, your readers will want to know where your data came from and how it was collected.
Most proprietary datasets prohibit commercial use, which means you can’t sell something based on that data without first obtaining express permission from the creator. As obtaining permission can take quite a long time, it’s almost always better to go with one of the many open datasets available online.
In fact, there has been a push in recent years for governments and non-profit entities to publish their datasets online to increase transparency and accountability. In the U.S., for example, the OPEN Government Data Act was enacted to encourage more evidence-based policy making.
3. The best free open data sources
Open data sources: Journalism and research
1. FiveThirtyEight
FiveThirtyEight is a news site well known for memorable visualizations in its signature style and formatting.
The site has published some of the data and code behind its graphics. These are hosted on GitHub and are ideal for beginners to work with, as the datasets have been cleaned for easier analysis.
The datasets range from sports (NFL predictions) to politics (political donations) to culture (the Bechdel test applied to movies).
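Because these datasets are published as cleaned CSV files, a first analysis can be very short. The sketch below counts the values in one column using only Python's standard library. The sample text, its column names, and the function name `count_by_column` are made up for illustration and are not real FiveThirtyEight data; with a downloaded file you would pass in the file's contents instead.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for a downloaded CSV file. The column names
# and rows are illustrative assumptions, not real FiveThirtyEight data.
SAMPLE_CSV = """year,title,result
2011,Example Film A,PASS
2011,Example Film B,FAIL
2012,Example Film C,PASS
2012,Example Film D,PASS
"""


def count_by_column(csv_text, column):
    """Count how often each value appears in one column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    for value, count in count_by_column(SAMPLE_CSV, "result").most_common():
        print(value, count)
```

Reading rows as dictionaries keyed by the header keeps the code readable when a file has many columns, which is common in these datasets.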
2. The New York Times
As one of the most popular news sites in the world, the New York Times needs no introduction. Its developer portal makes it easy to work with one of ten APIs, which let you access article metadata, best sellers lists, top stories, and more. Responses are returned as JSON, so you’ll need a decent grasp of programming fundamentals before trying this out.
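To give a feel for what working with a JSON API involves, here is a minimal sketch that builds a request URL and pulls headlines out of a response body. The base URL, the `api-key` query parameter, and the `results`, `section`, and `title` fields are assumptions about the Top Stories API that you should check against the developer portal. The sample response is made up, and no network request is sent.

```python
import json
from urllib.parse import urlencode

# Assumed base URL for the Top Stories API; check the developer portal
# for the endpoints and versions that are currently available.
BASE_URL = "https://api.nytimes.com/svc/topstories/v2"


def build_top_stories_url(section, api_key):
    """Build a request URL for one section, e.g. "home" or "science"."""
    query = urlencode({"api-key": api_key})
    return f"{BASE_URL}/{section}.json?{query}"


def extract_headlines(response_text):
    """Return (section, title) pairs from a JSON response body."""
    payload = json.loads(response_text)
    return [
        (item.get("section", ""), item.get("title", ""))
        for item in payload.get("results", [])
    ]


# Made-up response body that mimics the assumed shape; not real NYT data.
SAMPLE_RESPONSE = """
{
  "status": "OK",
  "results": [
    {"section": "science", "title": "Placeholder headline one"},
    {"section": "world", "title": "Placeholder headline two"}
  ]
}
"""

if __name__ == "__main__":
    print(build_top_stories_url("home", "YOUR_API_KEY"))
    for section, title in extract_headlines(SAMPLE_RESPONSE):
        print(f"[{section}] {title}")
```

Using `.get()` with defaults means a record with a missing field yields an empty string rather than raising an error, which is a forgiving choice for exploratory work.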
3. The Pew Research Center
The Pew Research Center is a well-regarded think tank that regularly runs public opinion polls, alongside other data-driven research held to rigorous methodological standards. It works on a broad range of topics and often goes beyond American analysis. For example, it conducts cross-national studies through the international Global Attitudes survey, and it created Data Labs to establish new ways of obtaining data to improve its current collection.
Open data sources: Government
4. The U.S. Government
The U.S. Government has published more than 335,000 datasets, which you can filter by format, geospatial boundaries, categories, and organizations. The datasets span a broad range of categories, including agriculture, climate, energy, local government, maritime, ocean, and older adults’ health. One dataset highlighted at the time of writing covers rivers included in the Inland Electronic Navigation Chart (IENC) program, which spans thousands of miles of navigable waterways.
5. Ontario
The Canadian province of Ontario wants data to be “open by default”; this means you have access to a rich source of more than 2,700 listed datasets across categories like justice and public safety, environment and natural resources, and infrastructure and transportation. Although not all of them are ready for public access yet, it’s worth bookmarking the site to keep an eye on new dataset releases.
6. India’s Open Government
India’s Open Government Data Portal contains 4,738 items in its catalog of datasets. You can explore datasets by sector (Census, Water and Sanitation, Finance, Animal Husbandry), group, state, or API. If you’re not sure where to get started, the homepage offers some useful highlights that can inspire your next project. Under the visualization carousel, you can take a look at the most viewed visualizations. Or, you can check out what the “high value dataset” currently is.
7. Singaporean Open Datasets
The Singaporean open dataset homepage looks like a dashboard because it is partially one: you can examine visualizations under “Singapore at a glance” to look at national statistics, which might give you an idea for your project. More advanced analysts will appreciate their developer resources page which explains how you can get access to one of their fourteen real-time datasets, including taxi availability, ultraviolet index, the weather forecast, and the pollutant standards index.
8. City of London
The City of London in the United Kingdom has published 1,101 datasets on topics ranging from sport to planning to art and culture. These can be downloaded in a wide range of formats, and can be filtered by geographical boundary level (e.g. local authority, borough, or ward) and by source publisher. A particularly interesting dataset tracks daily reservoir levels in London from 1989 to the present day.
Open data sources: Science and technology
9. Open Science Data Cloud
If you’re interested in using the same data that researchers work with across fields and disciplines, head on over to the Open Science Data Cloud. This platform enables the scientific community to share extremely large datasets (think terabytes and petabytes). Working with data at this scale requires more advanced programming knowledge, both to handle the datasets and to train models on them.
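One common way to handle files that are too big to load at once is to process them in chunks, keeping only a running total in memory. The sketch below shows that general technique with Python's standard library. It is not specific to the Open Science Data Cloud, and the column name `reading` and the tiny in-memory file are made up for illustration.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(file_obj, column, chunk_size=1000):
    """Compute the mean of a numeric CSV column without loading every row."""
    reader = csv.DictReader(file_obj)
    total = 0.0
    count = 0
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else None


if __name__ == "__main__":
    # Tiny in-memory stand-in for a very large file (hypothetical data).
    tiny_file = io.StringIO("reading\n1.0\n2.0\n3.0\n4.0\n5.0\n")
    print(streaming_mean(tiny_file, "reading", chunk_size=2))
```

Only one chunk of rows is held in memory at a time, so the same function works whether the file has five rows or five hundred million. For truly petabyte-scale data you would normally move to a distributed processing framework instead.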
10. NASA
NASA publishes its datasets from its science missions; you can check out the handy visualization here for an overview of what you can access, including everything from national geospatial data assets, to ocean chemistry, to snowmelt timing maps.
There are also two other NASA data sites worth checking out: the Planetary Data System and the Earth Observing System Data and Information System (EOSDIS). These are great datasets for any project with an environmental focus.
11. CERN
The European Organization for Nuclear Research, more commonly known as CERN, has published more than three petabytes of particle physics research data. Their Open Data portal contains data from the Large Hadron Collider (LHC), the world’s largest and most powerful particle accelerator. For instance, you can use data from ATLAS, one of the particle physics experiments running at the LHC.
12. International Energy Agency
What if you have a project involving energy production and consumption? The International Energy Agency (IEA) hosts the Atlas of Energy site to share time series of energy statistics from 1973 to the present day.
This is part of a wider IEA ecosystem of analytics tools, including country-level data, databases, and an energy balance flow visualized as a Sankey diagram. There are many datasets available: you can obtain data on global CO2 emissions per capita, renewables production, and electricity generation.
Open data sources: International organizations
13. European Commission
European public sector datasets have been collected and published via the European Commission’s data.europa site. The portal lists more than 1.5 million datasets across 36 countries, making it one of the largest data repositories online.
You can verify the quality of a certain dataset by checking its metadata quality, which helpfully grades data based on indicators such as interoperability, reusability, and contextuality.
14. World Bank
The World Bank publishes open datasets on global development, which you can browse by country or by indicator (for example, GDP or population). Their site goes beyond providing a catalog of datasets. Check out their DataBank tool, a web application that lets you perform quick analysis and easy visualization using their time-series data, right on their site, and export or share the charts and tables you create.
Some of their interesting datasets include debt flows for 120 developing countries, the World Bank’s own financial reporting, and the Living Standards Measurement Study that collects microdata on households to better quantify household behavior.
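If you would rather pull a country's indicator series into your own code than use the website, the World Bank also offers a web API. The sketch below builds a request URL and turns a response into a year-to-value mapping. The URL pattern and the two-element response shape (page metadata followed by a list of records) are assumptions to verify against the API documentation. The sample values are made up, and no network request is sent.

```python
import json
from urllib.parse import urlencode


def build_indicator_url(country, indicator, per_page=100):
    """Build a URL for one country and indicator, e.g. "BR", "SP.POP.TOTL"."""
    query = urlencode({"format": "json", "per_page": per_page})
    return (
        "https://api.worldbank.org/v2/"
        f"country/{country}/indicator/{indicator}?{query}"
    )


def parse_series(response_text):
    """Return {year: value} from a response, skipping missing values."""
    payload = json.loads(response_text)
    # Assumed shape: [page_metadata, list_of_records].
    if len(payload) < 2 or payload[1] is None:
        return {}
    return {
        int(record["date"]): record["value"]
        for record in payload[1]
        if record.get("value") is not None
    }


# Made-up response mimicking the assumed shape; the values are not real.
SAMPLE_RESPONSE = """
[
  {"page": 1, "pages": 1, "per_page": 100, "total": 3},
  [
    {"date": "2022", "value": null},
    {"date": "2021", "value": 200.5},
    {"date": "2020", "value": 100.0}
  ]
]
"""

if __name__ == "__main__":
    print(build_indicator_url("BR", "SP.POP.TOTL"))
    print(parse_series(SAMPLE_RESPONSE))
```

Years without a reported value are dropped rather than stored as `None`, so the result can go straight into a chart or a growth calculation.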
15. The WHO
The World Health Organization invites you to use its Global Health Observatory, which features a comprehensive range of health data for many countries. Their data is grouped into a long list of themes, including: assistive technology, immunization, neglected tropical diseases, and tobacco control.
If you’re looking for inspiration, check out one of their featured dashboards at the bottom of the page. The Triple Billion dashboard, for example, tracks progress towards improving the health of billions of people by 2025, based on a few key indicators.
4. Summary
We’ve taken you through a tour of some of the best open data sources available for use, for free, right now. Let’s quickly review what you need to know when you’re embarking on your next project in search of a dataset to use:
- Consider using an open data source: It’s important to use data that you have the right to use and publish, and open data generally comes with a license you can cite in your own work. A growing number of free and open datasets is also available online, which can only enrich your analysis.
- Obtain high-quality data from credible organizations: In our article, we’ve provided a helpful guide to fifteen of the best sources you can use in journalism, research, the public sector, science and technology, and international trends.
Has this piqued your interest in learning more about analytics roles and the field of data analytics in general? Why not try out this free, self-paced data analytics course?