Everyone remotely connected to the data world is abuzz about machine learning. Instead of just knowing what machine learning is, why not go one better and start working with it?
If you’re looking to dive into the world of ML but don’t know where to start, I’ve curated 12 hands-on machine learning projects to help kickstart your journey. These projects range from simpler classification tasks to more complex challenges in natural language processing and image recognition.
Whether you’re a new entrant to data analytics looking for a portfolio project idea, or an intermediate coder interested in machine learning implementations, there’s something here for you.
While you’re at it, check out our other explainer articles on machine learning models, career paths, and essential machine learning tools.
We’ll cover:
12 great machine learning projects
Classification projects
Classification involves sorting data into predefined classes or labels based on its features. Classification projects are a great way into machine learning for newcomers, as they help you understand foundational concepts through real-world problems. We’ll take a look at three such machine learning projects.
1. Predicting wine quality
Oenology, or the science of wine, offers a fun introduction to the art of classification. UCI’s wine quality dataset includes samples of both red and white wines, so a simple first task is to distinguish between the two.
You can use libraries such as scikit-learn or TensorFlow to predict the quality of wines based on 11 physicochemical properties, such as acidity, sulfates, and sugar. You’ll learn how to interpret feature importance and handle imbalanced datasets using real-world data. While the main task is classification, you can also extend the project with regression analysis to predict wine quality scores.
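To make the ideas of feature importance and class imbalance more concrete, here is a minimal sketch using scikit-learn. It does not load the UCI data: it assumes a synthetic stand-in with 11 numeric features and a rare positive class, and the random forest, the `class_weight="balanced"` setting, and the balanced accuracy metric are my choices for illustration, not the only way to approach the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in (an assumption, not the real wine data):
# 11 numeric features, with only ~15% of samples in the positive class.
X, y = make_classification(
    n_samples=1000,
    n_features=11,
    n_informative=5,
    n_redundant=2,
    weights=[0.85, 0.15],
    random_state=0,
)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights samples inversely to class frequency.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# Balanced accuracy averages recall per class, so the majority class
# cannot dominate the score.
score = balanced_accuracy_score(y_test, model.predict(X_test))

# Rank features by the forest's impurity-based importances.
importances = model.feature_importances_
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
for index, value in ranked[:3]:
    print(f"feature {index}: importance {value:.3f}")
print(f"balanced accuracy: {score:.3f}")
```

With the real dataset, you would replace the synthetic `X` and `y` with the 11 measured properties and a label derived from the quality score.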
2. Tree species identification
UCI’s leaf dataset is a great way to understand image preprocessing and feature extraction. This project looks at how to best classify tree species based on the shape and texture of their leaves.
For extra credit, you could augment the original dataset with leaves from your own neighborhood, or extract your own set of features. The learning curve is slightly steeper than most beginner classification projects, but you’ll learn how to create more robust machine learning models.
3. Exoplanet discovery
For the more extraterrestrially inclined, Caltech’s Kepler Exoplanet dataset offers a unique classification challenge. Learn how to detect exoplanets (planets outside our solar system) by analyzing light curves observed through NASA’s space telescopes.
You’ll be challenged to improve the accuracy of your model by applying time-series analysis to sequential data, and you’ll learn how to handle rare events through anomaly detection. While the project requires more domain research than the others, it’s also an opportunity to demonstrate your ability to dive deep into a new subject area and come up with a useful model.
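As a rough illustration of treating transits as rare events, the sketch below flags unusually dim points in a light curve using a robust z-score. Everything here is an assumption for demonstration: the light curve is synthetic, the dips are simple box shapes, and the threshold of 5 is arbitrary. Real Kepler light curves need detrending and far more care.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic light curve: constant brightness plus small Gaussian noise.
n_points = 1000
flux = 1.0 + rng.normal(0.0, 0.001, n_points)

# Inject three box-shaped dips of 1% depth to mimic a planet
# passing in front of its star (positions are made up).
transit_idx = np.concatenate([np.arange(s, s + 10) for s in (200, 500, 800)])
flux[transit_idx] -= 0.01

# Robust z-score: the median and MAD are barely affected by the rare dips,
# unlike the mean and standard deviation.
median = np.median(flux)
mad = np.median(np.abs(flux - median))
robust_z = (flux - median) / (1.4826 * mad)

# Flag points that are much dimmer than the typical brightness.
flagged = np.flatnonzero(robust_z < -5.0)
print(f"flagged {len(flagged)} of {n_points} points")
```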
Natural language processing projects
Natural language processing stands at the intersection of linguistics and machine learning. It’s a great subject area for coders interested in teaching machines how to understand, interpret, and generate human language in new and useful ways. I’ve listed three compelling NLP projects that are applicable across industries.
1. Spam detection
Emails would be unusable without spam detection algorithms working in the background to filter out unsolicited messages. UCI’s SMS spam collection dataset contains 5,574 messages labeled as either spam or ham (not spam).
You can use Python’s Natural Language Toolkit (NLTK) or scikit-learn to learn how to preprocess the text, extract features, and run your prediction model. This is an excellent project for learning how to work with imbalanced datasets, which are common in real-world business scenarios, since spam messages are far outnumbered by genuine ones.
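The preprocess, extract, and predict steps can be sketched in a few lines with scikit-learn. This is a toy example under stated assumptions: the eight messages are invented rather than taken from the SMS spam collection, and TF-IDF features with a multinomial Naive Bayes classifier are one common starting point, not the required approach.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus standing in for the real dataset.
messages = [
    "WIN a free prize now, claim your reward",
    "Free entry: text WIN to claim cash",
    "Urgent! Your account won a cash prize",
    "Claim your free ringtone today",
    "Are we still meeting for lunch tomorrow?",
    "I'll call you after the meeting",
    "Can you pick up milk on the way home?",
    "See you at the gym tonight",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# TfidfVectorizer lowercases and tokenizes the text, then weights each
# word by how distinctive it is; MultinomialNB learns word/label statistics.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(messages, labels)

new_messages = ["claim your free cash prize", "see you at lunch"]
predictions = classifier.predict(new_messages)
for text, label in zip(new_messages, predictions):
    print(f"{label}: {text}")
```

On the real dataset you would also hold out a test set and look at precision and recall for the spam class rather than plain accuracy.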
2. Sentiment analysis
Every day, millions of people express their opinions online, whether through product reviews or social media posts.
The sentiment140 dataset, released by a group of Stanford researchers, contains 1.6 million tweets for you to train your machine learning model on. It labels tweets as positive, negative, or neutral, which makes it a relatively easy introduction to the art of sentiment analysis. Learn how to interpret human emotions in text, handle slang, and make decisions on ambiguous or nuanced statements.
You’ll learn how to use techniques such as word embeddings and recurrent neural networks, which feature in many NLP applications. You can brush up on your skills in our full guide to sentiment analysis.
3. Fake news detector
Google “fake news detection” and you may not be surprised to learn how many active research projects exist to solve this problem. Identifying bias and misinformation is especially difficult given the speed of content creation.
There are a number of fake news datasets where you can apply popular machine learning libraries like TensorFlow to handle large text data, understand what embeddings are, and grapple with the challenge of creating a model that can reliably understand nuance at scale.
Recommendation projects
Recommendation systems, or RecSys in industry speak, power many of the most popular services we use on a day-to-day basis. These systems try to optimize and personalize our experiences based on our preferences and interactions. These machine learning projects are challenging because it can be hard to predict what users like from noisy data.
1. Movies
The MovieLens dataset spans 25 million ratings across 62,000 movies. Learn how to use Python libraries such as Surprise or LightFM to build your own recommender system.
You’ll come away with a strong foundation in collaborative filtering, a technique to make recommendations based on user-user or item-item similarities. It’s also a great way to understand concepts like matrix factorization and user embeddings.
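To show what item-item collaborative filtering looks like at its simplest, here is a sketch in plain NumPy rather than Surprise or LightFM. The rating matrix is invented, and treating unrated cells as zeros when computing cosine similarity is a simplification I am assuming for brevity.

```python
import numpy as np

# Toy user x movie rating matrix (rows = users, columns = movies).
# 0 means "not rated"; all values are made up.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def item_similarity(r):
    """Cosine similarity between every pair of item columns."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for unrated items
    unit = r / norms
    return unit.T @ unit

def predict(r, sim, user, item):
    """Estimate a rating as a similarity-weighted average of the
    ratings this user gave to other items."""
    rated = r[user] > 0
    weights = sim[item, rated]
    if weights.sum() == 0:
        return float(r[r > 0].mean())  # fall back to the global mean
    return float(weights @ r[user, rated] / weights.sum())

sim = item_similarity(ratings)
estimate = predict(ratings, sim, user=0, item=2)
print(f"estimated rating for user 0 on movie 2: {estimate:.2f}")
```

User 0 rated the first two movies highly and the last one poorly. Movie 2 is most similar to the last movie, so the estimate lands near the low end of the scale.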
2. Building Goodreads
Avid readers will particularly enjoy diving into this project: UCSD’s Goodreads dataset is a rich repository of user-book interactions, with more than 2 million books rated by nearly 1 million users.
Building a reading recommendation system will introduce you to concepts like content-based filtering, where recommendations are based on items’ features and users’ preferences.
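Content-based filtering can be sketched without any interaction data from other users. In the example below, the book titles, the four genre flags, and the reader’s liked list are all hypothetical; a real project would derive features from the dataset’s metadata or review text.

```python
import numpy as np

# Hypothetical books described by genre flags:
# [fantasy, mystery, romance, sci-fi]
books = {
    "Book A": [1, 0, 0, 1],
    "Book B": [0, 1, 0, 0],
    "Book C": [1, 0, 1, 0],
    "Book D": [0, 0, 0, 1],
    "Book E": [0, 1, 1, 0],
}
liked = ["Book A", "Book D"]  # what this reader enjoyed

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# The reader's profile is the average feature vector of the liked books.
profile = np.mean([books[title] for title in liked], axis=0)

# Rank the books the reader has not seen by similarity to the profile.
candidates = [title for title in books if title not in liked]
ranked = sorted(
    candidates,
    key=lambda title: cosine(profile, np.array(books[title], dtype=float)),
    reverse=True,
)
print(ranked)
```

Because the reader liked fantasy and sci-fi titles, the only unseen book that shares one of those genres ranks first.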
3. Restaurant ratings
Yelp’s dataset of almost 7 million restaurant reviews can be used to build a model that recommends restaurants based on reviews and restaurant features.
Learn how to handle text data, extract features, and solve thorny “cold start” problems in ML, where your model needs to make accurate recommendations for new users with little or no interaction history.
Computer vision projects
Teaching machines to “see” offers a glimpse into a future where machines can interpret and analyze visual data. If you’re interested in more advanced projects involving the analysis of pixels and patterns, here are three projects with real-world applications.
1. Facial expressions
Google’s facial expression dataset contains 156,000 images annotated by human raters to classify them according to various emotions.
You’ll learn how to build a model that can recognize and correctly classify facial expressions using Python libraries such as OpenCV for face detection and Keras for deep learning. You can take this project in many different directions; one good use case would be to learn how to use convolutional neural networks (CNNs) to enable real-time video processing of human emotions.
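If CNNs are new to you, it helps to see what a single convolutional filter does before reaching for Keras. The sketch below is a hand-rolled illustration in NumPy, not how Keras implements it: the 4x4 "image" and the 2x2 edge-detecting kernel are made up, and in a real CNN the kernel values are learned during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    sum the element-wise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Dark left half, bright right half: one vertical edge in the middle.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# This kernel responds where brightness increases from left to right.
kernel = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is zero everywhere except the middle column, which is where the edge sits. A CNN stacks many such filters so that later layers can respond to shapes like eyes or mouths.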
2. Traffic analysis
The promise of self-driving cars has dominated the ML news cycle for a while now. Have you ever wondered about the algorithms powering them?
If so, check out the University at Albany’s UA-DETRAC dataset with 10 hours of traffic video from China. Use Python tools like TensorFlow’s Object Detection API or YOLO’s state-of-the-art object detection to build an ML system that can detect and analyze vehicles in real time. While this project will immerse you in challenging concepts like object detection techniques and tracking algorithms, you’ll come away with an impressive portfolio project at the intersection of computer vision and urban planning.
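One building block you will meet in both detection and tracking is intersection over union (IoU), which measures how much two bounding boxes overlap. The article does not prescribe it, so treat this as an illustrative sketch; the `(x_min, y_min, x_max, y_max)` box format is an assumption, and libraries differ on this convention.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as
    (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(f"IoU: {iou(predicted, ground_truth):.3f}")
```

A detection is often counted as correct when its IoU with the labeled box exceeds a chosen threshold, and simple trackers use the same score to match boxes between consecutive frames.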
3. Human activity
The University of Central Florida has released an action recognition dataset with more than 13,000 videos across 101 action categories, including knitting, diving, basketball and surfing.
You’re tasked with building a model that can correctly label these activities, and more, through temporal sequence analysis. You’ll learn how to process video data and work with 3D convolutional neural networks. This is a great project for more advanced learners looking to understand the specific complexities of identifying human motion through machine learning.
Wrap-up
From classifying datasets and deciphering nuanced textual sentiments in NLP, to creating accurate recommendation systems and grappling with the challenges of computer vision, these 12 machine learning projects offer many ways for you to learn new techniques and come away with a strong portfolio project to showcase your skills.
If the world of data analytics interests you but you don’t know where to start, why not explore our Machine Learning with Python Course? It covers the basics of machine learning as a field through hands-on projects with an expert mentor, and will give you a good idea of whether or not it’s a career path you’re interested in pursuing further.