New to data analytics? You’ll no doubt have discovered that the terminology can be a bit of a minefield. While any new topic of study comes with new terms to learn, it’s particularly tricky in a complex field like data analytics. From big data to machine learning, algorithms and artificial intelligence, data analytics is awash with buzzwords. But what do they all mean?
In this post, we’ll look at 17 data analytics buzzwords you’re likely to come across and what they really mean. Let’s dive in.
1. Big data
Big data. You’ll find it everywhere. But what is it? The term is used to describe repositories of data so huge that they defy traditional processing techniques. More broadly, the term ‘big data’ is also used to describe the methods relating to the management of these types of datasets. The sources of big data include everything from credit cards to surveillance, social media, and electronic communications. Big data is often categorized into three types: social data (from social media platforms), machine data (generated by computers and devices), and transactional data (where data is exchanged between two parties). Learn about big data in detail in this post.
2. Structured and unstructured data
All data is either ‘structured’ or ‘unstructured.’ Most data starts life in an unstructured format: disorganized, text-heavy, and without any underlying configuration. Big data, in particular, is usually unstructured (at least to begin with). This makes it hard to navigate or use. To get a dataset into a useful format that can be analyzed, we must structure it. Structured data has been organized into databases, spreadsheets, or content management systems. It is often ordered into rows and columns, too, making it much easier to navigate. Structuring data is a large part of a data analyst’s job. Fun fact: it is commonly estimated that around 80% of an organization’s data is unstructured. Think of all that untapped potential!
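To make the idea of structuring data a little more concrete, here is a minimal Python sketch that turns two free-text notes into tidy rows with named columns. The sample notes, the column names, and the `structure_note` helper are all invented for illustration; real-world text is far messier than this.

```python
import re

# Hypothetical unstructured input: free-text notes with no fixed layout.
raw_notes = [
    "Order 1041 from Ana on 2023-03-02, total $25.50",
    "Ben placed order 1042 (2023-03-05) worth $7.00",
]

def structure_note(note):
    """Pull an order ID, a date, and an amount out of one free-text note."""
    order_id = re.search(r"[Oo]rder (\d+)", note)
    date = re.search(r"\d{4}-\d{2}-\d{2}", note)
    amount = re.search(r"\$(\d+\.\d{2})", note)
    return {
        "order_id": int(order_id.group(1)),
        "date": date.group(0),
        "amount": float(amount.group(1)),
    }

# Each note becomes one row with the same three columns.
rows = [structure_note(note) for note in raw_notes]
for row in rows:
    print(row)
```

Once every record shares the same columns, it can be loaded into a spreadsheet or database and analyzed.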
3. Algorithms
In data analytics, you’ll continuously hear about algorithms. From machine learning to decision trees and random forests, a whole bunch of analytical techniques rely on them. But what is an algorithm? Simply put, it is a carefully defined, step-by-step process that is used to solve logical or mathematical problems. In data analytics, these algorithms are carried out by computers. In day-to-day life, any task you run on your phone or laptop will be carried out by an algorithm. Within data analytics, algorithms are used to streamline tasks that computers can carry out much faster and more accurately than humans. This makes them perfect for sorting, parsing, or analyzing big datasets.
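As a small illustration of a ‘carefully defined, step-by-step process,’ here is a sketch of binary search, a classic algorithm for finding a value in a sorted list. It is just one well-known example chosen for this post; the `ages` list is made up.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_values[middle] == target:
            return middle            # found it
        if sorted_values[middle] < target:
            low = middle + 1         # discard the lower half
        else:
            high = middle - 1        # discard the upper half
    return -1                        # the target is not in the list

ages = [18, 23, 31, 42, 57, 64]
print(binary_search(ages, 42))
print(binary_search(ages, 50))
```

Each pass through the loop halves the portion of the list left to check, which is why this kind of procedure scales well to large datasets.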
4. Predictive and prescriptive analytics
Two core branches of data analytics are predictive and prescriptive analytics. Predictive analytics uses algorithms to predict (or make informed guesses about) what is likely to happen in the future, based on existing data. Meanwhile, prescriptive analytics recommends a course of action based on these predictions. Courses of action may be intended to shape future outcomes or to simply take advantage of existing ones. Businesses commonly use predictive and prescriptive analytics to drive profit, improve customer experiences, and stay ahead of the competition. You can learn more about different types of analysis here.
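Here is a rough sketch of how the two fit together. The predictive step fits a straight line to some made-up monthly sales figures and forecasts the next month; the prescriptive step turns that forecast into a recommendation. The data, the `fit_line` helper, and the reorder rule are all assumptions for illustration, not a real forecasting method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales history: units sold in months 1 to 5.
months = [1, 2, 3, 4, 5]
units_sold = [100, 110, 120, 130, 140]

# Predictive: estimate what is likely to happen in month 6.
slope, intercept = fit_line(months, units_sold)
forecast = slope * 6 + intercept

# Prescriptive: recommend an action based on that prediction.
units_in_stock = 120
if forecast > units_in_stock:
    recommendation = f"Order {round(forecast - units_in_stock)} more units"
else:
    recommendation = "No reorder needed"

print(forecast)
print(recommendation)
```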
5. Regression and classification
Regression and classification are two types of models commonly used in predictive analytics. Classification is used in machine learning (see number seven!) to predict or identify discrete categories of data. Meanwhile, regression models are used to predict continuous values. Without getting into the details, the broad takeaway here is that classification is about predicting labels (e.g. red, blue, or green), while regression is about predicting quantities (e.g. probabilities, temperatures, or sums).
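The difference is easy to see in code. The sketch below uses one simple technique (looking at the nearest known examples) in two ways: once to predict a label, and once to predict a quantity. The fruit measurements are made up, and nearest-neighbor prediction is only one of many possible approaches.

```python
from collections import Counter

# Made-up training examples: (diameter_cm, label, weight_grams)
fruits = [
    (2.0, "cherry", 8.0),
    (2.5, "cherry", 10.0),
    (7.0, "apple", 150.0),
    (7.5, "apple", 170.0),
    (8.0, "apple", 190.0),
]

def nearest(examples, diameter, k=3):
    """Return the k examples whose diameter is closest to the query."""
    return sorted(examples, key=lambda e: abs(e[0] - diameter))[:k]

def classify(examples, diameter, k=3):
    """Classification: predict a label by majority vote of the neighbors."""
    labels = [label for _, label, _ in nearest(examples, diameter, k)]
    return Counter(labels).most_common(1)[0][0]

def regress(examples, diameter, k=3):
    """Regression: predict a quantity by averaging the neighbors' weights."""
    weights = [weight for _, _, weight in nearest(examples, diameter, k)]
    return sum(weights) / len(weights)

print(classify(fruits, 7.2))  # a label
print(regress(fruits, 7.2))   # a quantity
```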
6. Artificial intelligence (AI)
In short, AI is the pursuit of creating machines that can learn. Artificial intelligence uses algorithms designed to mimic the way humans think. While the ultimate aim of AI is to create machines that can think (and even feel) the way humans do, we are not yet close to this goal. So don’t fear—artificial intelligence is not the world-ending technology that many think! Nevertheless, it does have a powerful range of applications in areas as diverse as environmental management and vaccine development. It’s also used in everyday technologies, from smartphones to self-driving cars.
7. Machine learning
Machine learning (ML) algorithms learn without being explicitly programmed to do so. Machine learning first evolved in response to the wider pursuit of artificial intelligence. It has fast become a specialized field in its own right. Since ML algorithms ingest and learn from large amounts of data, they’re popular for predictive analytics and for making sense of big data. Machine learning is also commonly used to carry out tasks that would be impractical or time-consuming for people to carry out (for instance, managing search engine results).
8. Supervised and unsupervised learning
Supervised and unsupervised learning are two types of machine learning. The first, supervised learning, uses labeled training data. This involves an algorithm ingesting many labeled examples of input variables, along with their corresponding outputs. As a result, the algorithm learns to classify new (or unlabeled) data. An unsupervised learning algorithm, however, ingests unlabeled data. It then models the underlying structure and patterns of the data on its own, i.e. it is not given examples of how input and output data relate to each other. This is a fairly complex topic, but you can learn more in this guide to machine learning and deep learning.
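To see the contrast, here is a toy Python sketch. The supervised half learns from labeled examples and then labels a new value; the unsupervised half is handed similar numbers with no labels and groups them by itself. The study-hours data and both helper functions are invented for illustration, and the clustering routine is a stripped-down version of k-means with two groups.

```python
# Supervised: labeled examples of (hours_studied, outcome).
labeled = [(1.0, "fail"), (2.0, "fail"), (8.0, "pass"), (9.0, "pass")]

def train_centroids(examples):
    """Learn the average input value for each label."""
    groups = {}
    for value, label in examples:
        groups.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}

def predict(centroids, value):
    """Assign the label whose learned average is closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

centroids = train_centroids(labeled)
print(predict(centroids, 7.0))

# Unsupervised: similar values, but no labels at all.
unlabeled = [1.0, 2.0, 8.0, 9.0, 1.5, 8.5]

def two_means(values, iterations=10):
    """Tiny 1-D k-means with k=2; assumes at least two distinct values."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            closest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[closest].append(v)
        centers = [sum(c) / len(c) for c in clusters]
    return clusters

clusters = two_means(unlabeled)
print(clusters)
```

Note that the unsupervised half finds two groups but has no idea what to call them; naming the groups is still up to the analyst.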
9. Deep learning
Deep learning is a subset of machine learning. It can use supervised or unsupervised learning techniques, in the form of algorithms loosely inspired by the workings of the human brain. As a result, it can solve highly complex tasks in a short amount of time. Deep learning is the closest thing we currently have to human-like artificial intelligence. Deep learning tends to improve with greater amounts of data. As such, whereas less complex approaches rely on data that is mathematically straightforward to process, deep learning can find patterns in data that even humans cannot spot.
10. Neural networks
A neural network is a type of system within deep learning that is used to solve highly complex problems. It can be used for many things, like complex image processing or solving protein folding problems. Neural networks apply a layered network of algorithms that aim to recognize the underlying connections between different data. In this respect, they are loosely analogous to the neurons in the human brain. However, neural networks do not mimic the human brain exactly. We’re not quite at the point of creating sentient androids just yet!
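For a feel of what ‘layered’ means, here is a tiny sketch of a two-layer network that computes XOR (output 1 when exactly one input is 1). The weights are picked by hand purely for illustration; a real neural network learns its weights from data and has many more neurons.

```python
def step(x):
    """Activation function: the neuron 'fires' (1) only if its input is positive."""
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def xor_network(a, b):
    # Hidden layer: two neurons look at the raw inputs.
    h_or = neuron([a, b], [1, 1], -0.5)    # fires if a OR b is 1
    h_and = neuron([a, b], [1, 1], -1.5)   # fires only if a AND b are 1
    # Output layer: fires if OR fired but AND did not.
    return neuron([h_or, h_and], [1, -1], -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

No single neuron of this kind can compute XOR by itself; it takes the hidden layer feeding the output layer, which is the point of stacking layers.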
11. Natural language processing
Natural language processing (NLP) is a subset of AI that concerns itself with how computers analyze linguistics and spoken or written language. NLP aims to create computer programs that understand the complexities and nuances of human speech. In this context, the term ‘natural language’ really means ‘human language,’ and is used to distinguish it from computer programming languages. NLP algorithms can be applied to both text and spoken word. They’re commonly used for things like email filtering, voice recognition in smart assistants, predictive text, and many other data analytics tasks, such as sentiment analysis.
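As a very rough taste of sentiment analysis, here is a sketch that scores a sentence by counting positive and negative words. The word lists and the `sentiment` function are made up for this post, and real NLP systems are far more sophisticated (this one cannot handle negation or sarcasm, for instance).

```python
import re

# Hypothetical mini-lexicons; real ones contain thousands of words.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "slow", "terrible", "broken"}

def sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this phone, the camera is excellent!"))
print(sentiment("Delivery was slow and the box arrived broken."))
print(sentiment("It arrived on Tuesday."))
```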
12. Internet of Things
The Internet of Things (IoT) is what we call the growing network of physical objects embedded with sensors, software, and other tech. These objects collect and exchange data via the internet. Fitbits, smartphones, self-driving cars, e-scooters, industrial machinery, smart speakers, doorbells, and even refrigerators (‘You need milk!’) are all objects that belong to the IoT. The IoT has the potential to make our lives much more convenient and efficient. It’s also hoped that IoT can help solve many of the problems that humanity faces, from air pollution to food supply issues. It also comes with potential privacy and cybersecurity problems, though, so there are a few kinks to iron out before we get there.
13. Decision tree and random forest
A decision tree is a type of algorithm used to support decision-making. It uses a tree-shaped model of decisions, incorporating potential outcomes and chance events. In data analytics, decision trees are commonly used to solve classification and regression problems. Random forest, meanwhile, is a more complex and robust variation of the same idea. By using multiple decision trees, it reduces margins of error.
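Here is a simplified sketch of both ideas. A decision tree is essentially a set of nested questions, and a forest combines the answers of several trees by majority vote. The loan-approval rules below are invented and written by hand; a real random forest instead learns many trees automatically from random samples of the training data.

```python
from collections import Counter

def decision_tree(applicant):
    """A small hand-written tree: each 'if' is a branch, each return a leaf."""
    if applicant["income"] >= 40000:
        if applicant["debt"] < 10000:
            return "approve"
        return "decline"
    if applicant["missed_payments"] == 0:
        return "approve"
    return "decline"

def tree_by_debt(applicant):
    return "approve" if applicant["debt"] < 10000 else "decline"

def tree_by_history(applicant):
    return "approve" if applicant["missed_payments"] == 0 else "decline"

def forest_vote(trees, applicant):
    """Forest-style prediction: ask every tree, then take the majority answer."""
    votes = Counter(tree(applicant) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical applicant.
applicant = {"income": 45000, "debt": 8000, "missed_payments": 2}
forest = [decision_tree, tree_by_debt, tree_by_history]

print(decision_tree(applicant))
print(forest_vote(forest, applicant))
```

Because the final answer depends on several trees rather than one, a single tree's mistake is less likely to decide the outcome.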
14. Python
Python is all abuzz right now… It’s a popular programming language used by data analysts. Easy to learn, with many applications, Python has vast libraries suitable for various data analytics tasks. These include data visualization, regression modeling, data cleaning, and more. At the time of writing, according to the TIOBE index (which measures the popularity of programming languages), Python is the third most popular language in the world, and its popularity is increasing year by year. One to add to your data analytics armory, then, if you haven’t already! You can learn more about Python and how to learn it here.
15. Data wrangling
Data wrangling is the process of collecting raw data, cleaning it, mapping it, and storing it in a useful format. Although it is a common buzzword, there is often confusion around the term, since it is also used as a catch-all to describe other stages in the data analytics process. These include planning what data to collect, creating algorithms to collect these data, carrying out exploratory analysis, quality control, creating data structures, and so on.
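Here is a minimal sketch of the cleaning part of data wrangling: trimming stray whitespace, standardizing capitalization, converting text to numbers, and dropping incomplete or duplicate rows. The sample records and the `clean` helper are made up, and the rule of discarding rows with a missing age is just one possible choice.

```python
# Hypothetical raw records, as they might arrive from a web form.
raw_rows = [
    {"name": "  Ana ", "city": "lisbon", "age": "34"},
    {"name": "Ben", "city": "BERLIN", "age": ""},        # missing age
    {"name": "  Ana ", "city": "lisbon", "age": "34"},   # duplicate
    {"name": "chloe", "city": " Paris", "age": "29"},
]

def clean(row):
    """Tidy one record, or return None if it is missing an age."""
    if not row["age"].strip():
        return None
    return {
        "name": row["name"].strip().title(),
        "city": row["city"].strip().title(),
        "age": int(row["age"]),
    }

cleaned = []
seen = set()
for row in raw_rows:
    tidy = clean(row)
    if tidy is None:
        continue                      # drop incomplete rows
    key = (tidy["name"], tidy["city"], tidy["age"])
    if key in seen:
        continue                      # drop exact duplicates
    seen.add(key)
    cleaned.append(tidy)

print(cleaned)
```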
16. Data mining
Data mining is the process of discovering previously unknown patterns in large datasets. It involves storing data in a structure (or database) for future use. After data wrangling, data mining is a common next step in the data analytics process. The main difference is that data wrangling is a form of data housekeeping, whereas data mining is all about knowledge discovery. As a buzzword, the term is commonly misused to describe any form of large-scale data manipulation. This confusion is probably due to the fact that it sits at the intersection of many interrelated disciplines, including data analytics, computer science, statistics, and machine learning.
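One classic flavor of pattern discovery is finding items that tend to appear together. The sketch below counts how often each pair of products shows up in the same shopping basket. The baskets are made up, and simple pair counting is only a small first step towards full data mining techniques.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: the set of items in each shopping basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

# Count every pair of items that appears together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)
```

Nobody told the program that bread and butter go together; the pattern emerges from the data, which is the ‘knowledge discovery’ part.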
17. Business analytics
Business analytics is data analytics that focuses specifically on business-related systems and procedures. Its goal is to obtain practical, data-driven insights that can be used to streamline or improve these business processes. The terms ‘business analyst’ and ‘data analyst’ are often used interchangeably, with good reason: they are very similar roles. However, data analysts tend to have a ‘raw data’ (or statistical) focus, whereas business analysts are usually more concerned with an organization’s structure, its operational model, and wider industry expertise. If you want to learn more about the differences between a data analyst and a business analyst, check out this post.
Wrap up and further reading
So there you have it! 17 of the most common data analytics buzzwords, and what they really mean. No doubt you’ll have more questions, so be sure to check out the rest of our blog for in-depth explanations of these concepts, and more.
If you want to get hands-on with data, check out our free five-day data analytics short course, or see our other posts for more data analytics topics.