Finding projects for your data analytics portfolio can be tricky, especially when you’re new to the field. You might also think that your data projects need to be especially complex or showy, but that’s not the case. The most important thing is to demonstrate your skills, ideally using a dataset that interests you. And the good news? Data is everywhere—you just need to know where to find it and what to do with it.
In this post, we’ll highlight the key elements that your data analytics portfolio should demonstrate. We’ll then share nine project ideas that will help you build your portfolio from scratch, focusing on three key areas: Data scraping, exploratory analysis, and data visualization.
- What should you include in your data analytics portfolio?
- Data scraping project ideas
- Exploratory data analysis project ideas
- Data visualization project ideas
- What’s next?
Ready to get inspired? Let’s go!
1. What should you include in your data analytics portfolio?
Data analytics is all about finding insights that inform decision-making. But that’s just the end goal. As any experienced data analyst will tell you, the insights we see as consumers are the result of a great deal of work. In fact, about 80% of all data analytics tasks involve preparing data for analysis. This makes sense when you think about it—after all, our insights are only as good as the quality of our data.
Yes, your portfolio needs to show that you can carry out different types of data analysis. But it also needs to show that you can collect data, clean it, and report your findings in a clear, visual manner. As your skills improve, your portfolio will grow in complexity. As a beginner though, you’ll need to show that you can:
- Scrape the web for data
- Carry out exploratory analyses
- Clean untidy datasets
- Communicate your results using visualizations
If you’re inexperienced, it can help to present each item as a mini-project of its own. This makes life easier since you can learn the individual skills in a controlled way. With that in mind, we’ll keep it nice and simple with some basic ideas, and a few tools you might want to explore to help you along the way.
2. Data scraping project ideas for your portfolio
What is data scraping?
Data scraping is the first step in any data analytics project. It involves pulling data (usually from the web) and compiling it into a usable format. While there’s no shortage of great data repositories available online, scraping and cleaning data yourself is a great way to show off your skills.
The process of web scraping can be automated using tools like Parsehub, ScraperAPI, or Octoparse (for non-coders) or by using libraries like Beautiful Soup or Scrapy (for developers). Whichever tool you use, the important thing is to show that you understand how it works and can apply it effectively.
Before scraping a website, be sure that you have permission to do so. If you’re not certain, you can always search for a dataset on a repository site like Kaggle. If it exists there, it’s a good bet you can go straight to the source and scrape it yourself. Bear in mind though—data scraping can be challenging if you’re mining complex, dynamic websites. We recommend starting with something easy—a mostly-static site. Here are some ideas to get you started.
Data scraping project ideas
The Internet Movie Database
A good beginner’s project is to extract data from IMDb. You can collect details about popular TV shows, movie reviews and trivia, the heights and weights of various actors, and so on. Data on IMDb is stored in a consistent format across all its pages, making the task a lot easier. There’s also a lot of potential here for further analysis.
Many beginners like scraping data from job portals since they often contain standard data types. You can also find lots of online tutorials explaining how to proceed. To keep it interesting, why not focus on your local area? Collect job titles, companies, salaries, locations, required skills, and so on. This offers great potential for later visualization, such as graphing skillsets against salaries.
Another popular one is to scrape product and pricing data from e-commerce sites. For instance, extract product information about Bluetooth speakers on Amazon, or collect reviews and prices on various tablets and laptops. Once again, this is relatively straightforward to do, and it is scalable. This means you can start with a product that has a small number of reviews, and then upscale once you’re comfortable using the algorithms.
For something a bit less conventional, another option is to scrape a site like Reddit. You could search for particular keywords, upvotes, user data, and more. Reddit is a very static website, making the task nice and straightforward. Later, you can carry out interesting exploratory analyses, for instance, to see if there are any correlations between popular posts and particular keywords. Which brings us to our next section.
3. Exploratory data analysis project ideas
What is exploratory data analysis?
The next step in any data analyst’s skillset is the ability to carry out an exploratory data analysis (EDA). An EDA looks at the structure of data, allowing you to determine their patterns and characteristics. They also help you to clean your data. You can extract important variables, detect outliers and anomalies, and generally test your underlying assumptions.
While this process is one of the most time-consuming tasks for a data analyst, it can also be one of the most rewarding. Later modeling focuses on generating answers to specific questions. An EDA, meanwhile, helps you do one of the most exciting bits—generating those questions in the first place.
Languages like R and Python are often used to carry out these tasks. They have many pre-existing algorithms that you can use to carry out the work for you. The real skill lies in presenting your project and its results. How you decide to do this is up to you, but one popular method is to use an interactive documentation tool like Jupyter Notebook. This lets you capture elements of code, along with explanatory text and visualizations, all in one place. Here are some ideas for your portfolio.
Exploratory data analysis project ideas
Global suicide rates
This global suicide rates dataset covers suicide rates in various countries, with additional data including year, gender, age, population, GDP, and more. When carrying out your EDA, ask yourself: What patterns can you see? Are suicides rates climbing or falling in various countries? What variables (such as gender or age) can you find that might correlate to suicide rates?
World Happiness Report
On the other end of the scale, the World Happiness Report tracks six factors to measure happiness across the world’s citizens: life expectancy, economics, social support, absence of corruption, freedom, and generosity. So, which country is the happiest? Which continent? Which factor appears to have the greatest (or smallest) impact on a nation’s happiness? Overall, is happiness increasing or decreasing?
Aside from the two ideas above, you could also use your own datasets. After all, if you’ve already scraped your own data, why not use them? For instance, if you scraped a job portal, which locations or regions offer the best-paid jobs? Which offer the least well-paid ones? Why might that be? Equally, with e-commerce data, you could look at which prices and products offer the best value for money.
Ultimately, whichever dataset you’re using, it should grab your attention. If the data are too complex or don’t interest you, you’re likely to run out of steam before you get very far. Keep in mind what further probing you can do to spot interesting trends or patterns, and to extract the insights you need.
4. Data visualization project ideas
What is data visualization?
Scraping, tidying, and analyzing data is one thing. Communicating your findings is another. Our brains don’t like looking at numbers and figures, but they love visuals. This is where the ability to create effective data visualizations comes in. Good visualizations—whether static or interactive—make a great addition to any data analytics portfolio. Showing that you can create visualizations that are both effective and visually appealing will go a long way towards impressing a potential employer.
Some free visualization tools include Google Charts, Canva Graph Maker (free), and Tableau Public. Meanwhile, if you want to show off your coding abilities, use a Python library such as Seaborn, or flex your R skills with Shiny. Needless to say, there are many tools available to help you. The one you choose depends on what you’re looking to achieve. Here’s a bit of inspiration…
Data visualization project ideas
Topical subject matter looks great on any portfolio, and the pandemic is nothing if not topical! What’s more, sites like Kaggle already have thousands of Covid-19 data sets available. How can you represent the data? Could you use a global heatmap to show where cases have spiked, versus where there are very few? Perhaps you could create two overlapping bar charts to show known infections versus predicted infections. Here’s a handy tutorial to help you visualize Covid-19 data using R, Shiny, and Plotly.
Most followed on Instagram
Whether you’re interested in social media, or celebrity and brand culture, this dataset of the most-followed people on Instagram has great potential for visualization. You could create an interactive bar chart that tracks changes in the most followed accounts over time. Or you could explore whether brand or celebrity accounts are more effective at influencer marketing. Otherwise, why not find another social media dataset to create a visualization? For instance, this map of the USA by data scientist Greg Rafferty nicely highlights the geographical source of trending topics on Instagram.
Another topic that lends itself well to visualization is transport data. There are great step-by-step tutorials out there for how to visualize travel data, flight data being a prime example. For instance, check out data scientist Spencer J Fox’s flight data visualizations using ggplot2, a data visualization package for R.
5. What’s next?
In this post, we’ve explored which skills every beginner needs to demonstrate in their data analytics portfolio. Regardless of the dataset you’re using, you should be able to demonstrate the following abilities:
- Web scraping—using tools like Parsehub, Beautiful Soup, or Scrapy to extract data from websites (remember: static ones are easier!)
- Exploratory data analysis and data cleaning—manipulating data with tools like R and Python, before drawing some initial insights.
- Data visualization—utilizing tools like Tableau, Shiny, or Plotly to create crisp, compelling dashboards, and visualizations.
Once you’ve mastered the basics, you can start getting more ambitious with your data analytics projects. For example, why not introduce some machine learning projects, like sentiment analysis or predictive analysis? The key thing is to start simple and to remember that a good data analytics portfolio needn’t be flashy, just competent.
To further develop your skills, there are loads of online courses designed to set you on the right track. To start with, why not try our free, five-day data analytics short course?
And, if you’d like to learn more about becoming a data analyst and building your portfolio, check out the following: