Big data is everywhere—but what exactly is it, and how is it different from “ordinary” data?
Every action you take leaves a trail of data, both online and offline. In fact, humans have been recording information for millennia, from the tallies used by ancient civilizations to keep track of inventories through to one of the earliest known censuses, commonly dated to around 3800 BCE. And, with the rapid digitalisation of the last thirty years, it is now easier than ever to capture all kinds of data.
So where does big data come into it? Well, if you consider the online world alone, there are four-and-a-half billion active internet users worldwide, each leaving their own complex data trail. That’s what the term “big data” refers to: the huge volumes of largely unstructured data that are generated every minute of every day. Unlike “small” data, big data comes with some major challenges but, when leveraged correctly, it can provide enormous value.
If the big data buzzword has got your head in a spin, you’ve come to the right place. In this guide, we’ll explain everything you need to know about big data, including where it comes from and why it’s so important.
We’ll answer the following questions:
- What is big data?
- Where does big data come from?
- What are the different types of big data?
- What is the value of big data?
- What are some examples of big data?
At the end of this post, you’ll find a summary of all the key takeaways and links to further reading. So, what is big data? Let’s take a look.
1. What is big data?
John Mashey, a computer scientist, is thought to have coined the term “big data” in the early 90s, but the concept really started to gain traction in the early 2000s. There are several reasons for this, such as the advent of the Internet of Things, our penchant for smart, connected devices, and the ubiquity of social media. As it stands, we’re generating huge volumes of data all the time, and fast! This is essentially what big data is: large volumes of data that are generated exceptionally fast and are usually too complex to be processed using traditional methods.
But just how big is “big data”? And what exactly do we mean when we say “fast”? In 2001, Douglas Laney, an analyst at META Group (a research firm later acquired by Gartner), came up with the three Vs of big data: Volume, Velocity, and Variety. These characteristics are used to define big data and distinguish it from “ordinary” data.
The three Vs of big data
Volume: The first thing you should know about big data is that it is huge—and growing constantly. At the beginning of 2020, the digital universe was estimated to consist of a whopping 44 zettabytes of data. To put that into perspective, one zettabyte is roughly equal to a trillion gigabytes. And, by 2025, it’s thought that approximately 463 exabytes of data will be created every 24 hours worldwide. An exabyte is equivalent to one billion gigabytes. So, when we talk about big data, we’re dealing with massive, almost incomprehensible volumes of the stuff.
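To make those unit conversions concrete, here is a minimal Python sketch of the arithmetic. It assumes decimal (SI) units, where one gigabyte is 10^9 bytes; the constant and function names are our own, chosen purely for illustration.

```python
# Decimal (SI) storage units expressed in bytes (an assumption; binary
# units such as gibibytes would give slightly different figures).
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21


def to_gigabytes(amount: int, unit_bytes: int) -> int:
    """Convert an amount of a larger unit into whole gigabytes."""
    return amount * unit_bytes // GIGABYTE


# Figures quoted in the text above.
digital_universe_gb = to_gigabytes(44, ZETTABYTE)  # estimate for early 2020
daily_creation_gb = to_gigabytes(463, EXABYTE)     # projection for 2025

print(f"Digital universe: {digital_universe_gb:,} GB")
print(f"Created per day:  {daily_creation_gb:,} GB")
```

In other words, 44 zettabytes works out to 44 trillion gigabytes, and 463 exabytes per day to 463 billion gigabytes per day.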
Velocity: Our ever-connected world means that companies are inundated with data. Every single person who uses a connected device, surfs the internet, or uses social media is generating their own stream of data and, from a business perspective, it’s in companies’ best interests to capture and utilize this data. So, the velocity of big data simply refers to the sheer speed at which data is generated and gathered.
Variety: Not only is big data huge in volume and mind-bogglingly fast, it also comes from myriad different sources. This is, in part, what makes big data so complex; it comes in many different forms—from video, text, and image data to audio data, real-time data, and beyond—and therefore requires different types of processing and analysis. We’ll explore where big data comes from and the different types of big data in sections two and three.
So what’s the difference between big data and “small” or “normal” data?
The best way to distinguish between data and big data is to consider the challenges that big data presents. “Ordinary” data is essentially structured data which fits neatly in a database, and can be gathered and analyzed using traditional tools and software (such as Excel). By contrast, big data is so huge in volume, so varied and unstructured in format, and so fast in its accumulation that traditional tools are simply not sufficient when it comes to processing and understanding the data. In that respect, the term “big data” refers not only to the three Vs; it also encompasses the complex tools and techniques that are needed to draw meaning from the data.
2. Where does big data come from?
Now we have a working definition of big data, let’s explore where this voluminous and complex data comes from. There are three main sources of big data:
- Social data
- Machine data
- Transactional data
Let’s take a closer look at each of these.
Social data is any and all data that comes from social media platforms like Facebook, Twitter, and Instagram. It includes likes, Tweets, comments, images, links, location check-ins, and pins on Pinterest: essentially anything that might be shared publicly on social media. Companies use social data to create targeted advertising campaigns (like those scarily accurate ads you get on Instagram).
Machine data is generated by computers, applications, devices, and gadgets; in other words, any kind of machinery that can be programmed. It is generated automatically without the active involvement of a human; for example, through sensors in medical devices, speed cameras installed on the road, smart cars, financial transactions, and satellites. Data that is derived by analyzing an existing dataset, such as predictions or secondary calculations, is also considered machine-generated.
Transactional data can best be described as information which documents a transaction between two parties, whether those parties are organizations or individuals. In this case, a transaction doesn’t necessarily have to be financial; it’s any kind of exchange, agreement, or transfer that takes place. Some common examples of transactional data include a receipt or invoice for a purchase, a record stating that a customer has returned an item, withdrawal of money from a bank account, a reservation for a hotel room, or a customer subscribing to an email list. It’s important to note that transactional data always contains a time-based element (e.g. a date), so it becomes less relevant over time.
3. What are the different types of big data?
We mentioned earlier that big data is extremely varied, coming from a range of different sources and taking different formats (or structures). The structure of the data is important as it determines how the data will be gathered, processed, analyzed, and stored—essential for turning the raw data into something useful.
Big data can be classified into one of three categories:
- Structured data
- Unstructured data
- Semi-structured data
Let’s explore each of these big data types in more detail.
In simple terms, structured data is the kind of data that is already stored in databases. It can be processed, stored, and retrieved in a fixed format, and it’s the easiest type of big data to work with as it doesn’t require much preparation before it can be analyzed. There are two main sources of structured data: it can be generated automatically by machines, or entered by a human (for example, a user entering their name, age, and location when signing up as a new customer). You can think of structured data in terms of the neat rows and columns you might see in an Excel spreadsheet. Structured data accounts for just a small proportion of available big data (around 20%).
As the name suggests, unstructured data is the opposite of structured data; completely unorganized, with no clear format. You can think of unstructured data as data that doesn’t mean anything if it’s not put into context. For example, in data terms, a tweet posted on Twitter is just a string of words—there is no meaning or sentiment to it (before analysis, that is). The same goes for an image you share or a telephone call you make; these are all examples of unstructured data that need to be placed into some kind of external, real-world context in order to make them meaningful. Working with unstructured data is much more labor-intensive, involving complex algorithms such as those used in machine learning, AI, and natural language processing. Around 80% of the world’s big data comprises unstructured data.
Semi-structured data is essentially unstructured data which has some organizational properties, making it easier to process than purely unstructured data. Semi-structured data often has metadata attached to it (data that describes or gives information about another piece of data). For example, if you took a selfie on your smartphone, it might attach a timestamp to the photo and log the device ID. The image itself is unstructured data, but these additional details provide some context. Similarly, if you send an email to a friend, the content itself would be considered unstructured data, but there would be some “clues” attached, like the IP address and the email address the email came from.
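The email example can be sketched in a few lines of Python using the standard library’s email module. The message below is entirely made up for illustration, and we’ve used the sender, recipient, and timestamp headers as the “clues”; a real email would typically carry many more headers.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A hypothetical message, invented for this example.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Weekend plans
Date: Mon, 06 Jan 2020 09:30:00 +0000

Hi Bob, fancy a hike on Saturday if the weather holds?
"""

message = message_from_string(RAW_EMAIL)

# The headers are the organized part: predictable field names that can be
# read directly, much like columns in a table.
metadata = {
    "sender": message["From"],
    "recipient": message["To"],
    "sent": parsedate_to_datetime(message["Date"]),
}

# The body is the unstructured part: free text with no fixed format.
body = message.get_payload().strip()

print(metadata["sender"], metadata["sent"].isoformat())
print(body)
```

The metadata can be filtered and sorted straight away, whereas drawing meaning from the body would take further analysis, such as natural language processing.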
4. What is the value of big data?
So we’ve got volume, velocity, and variety—but how does that translate into real-world value? What does big data actually mean for businesses and organizations?
Big data alone is not valuable, but it does hold huge potential. When we talk about the value of big data, we’re really talking about the value of big data analytics. Big data analytics is a set of technologies and techniques that essentially turn your big data into something meaningful; using predictive models and statistical algorithms, big data analytics uncovers patterns, trends, and, ultimately, insights that give you an idea of what’s working well, what’s not working so well, and what might work well in the future.
Big data, and big data analytics, can help businesses to:
- Achieve more precise audience segmentation, allowing you to offer more customized products and services. This translates to better customer service, increased customer satisfaction, and more effective marketing.
- Automate certain processes, improving operational efficiency and therefore saving time, money, and resources.
- Uncover new potential revenue streams, for example by identifying a need for a particular product or service.
- Cultivate transparency and break down silos between different departments. For example, using data to understand your target audience would help to align marketing, sales, and customer service.
- Accurately forecast demand for certain resources and allocate them where they are needed. For example, big data analytics might show that student enrollments peak in August but are almost zero in March, allowing an education provider to plan accordingly.
Big data is essentially the driving force behind smart business decisions—as long as it’s leveraged effectively through big data analytics. It’s important to bear in mind that the true value of big data depends on how it is analyzed, where the insights are applied, and on what scale.
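To give a flavor of what “uncovering patterns” means in practice, here is a deliberately tiny Python sketch of the enrollment example from the list above. The dates are invented for illustration, and a real analysis would run over far larger datasets with dedicated tooling; the underlying idea of grouping records and counting them is the same.

```python
from collections import Counter
from datetime import date

# Hypothetical enrollment dates, made up for this example.
enrollments = [
    date(2019, 8, 3), date(2019, 8, 14), date(2019, 8, 27),
    date(2019, 9, 2), date(2019, 11, 19), date(2020, 1, 8),
    date(2020, 8, 5), date(2020, 8, 21),
]

# Count enrollments per calendar month across all years.
per_month = Counter(d.month for d in enrollments)

# The busiest month, and the months with no enrollments at all.
peak_month, peak_count = per_month.most_common(1)[0]
quiet_months = [m for m in range(1, 13) if per_month[m] == 0]

print(f"Peak month: {peak_month} ({peak_count} enrollments)")
print(f"Months with no enrollments: {quiet_months}")
```

With this sample data, August comes out as the peak month and March appears among the quiet ones, which is the kind of pattern an education provider could plan around.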
5. What are some examples of big data?
The best way to understand what big data is and how it’s used is to look at some real-world examples. Below, we’ll briefly consider some of the main industries which are using big data and how they are doing so.
Big data in healthcare
Huge volumes of big data are collected by medical devices, electronic health records, medical imaging, and clinical research, to name just a few. As a result, big data and big data analytics are transforming the way healthcare providers treat patients. Some crucial applications include analyzing genetic, lifestyle, and social factors in order to predict and prevent the risk of certain diseases; using data to issue highly personalized medication and treatment plans (precision medicine); detecting medication errors and identifying potentially dangerous adverse reactions; and monitoring disease trends in order to come up with effective health strategies. More recently, governments and healthcare providers have been exploring the idea of a track-and-trace system in order to limit the spread of COVID-19.
Big data in marketing and advertising
The more a company knows about its customers, the better equipped it is to tailor its products and services accordingly. It’s no wonder, then, that big data analytics has a major role to play in the marketing and advertising sectors. Marketers and advertisers use big data to paint a detailed picture of their users and how they behave, sometimes in real time; this enables them to quickly identify if a customer is about to “drop off” and to provide them with an offer or an incentive at just the right moment. At the same time, data also allows for more precise audience segmentation; instead of offering a one-size-fits-all product or service, companies can tailor almost everything to the individual. Again, the most common example that springs to mind is those all-too-accurate ads that pop up when you’re scrolling through Instagram. Marketers can even use data analytics to manage brand reputation and uncover what people are saying about their products and services, for example through sentiment analysis.
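To illustrate the basic idea behind sentiment analysis, here is a toy Python sketch that scores text by counting positive and negative words. The word lists and reviews are invented for this example, and this simple word-counting approach is only a stand-in; production systems generally rely on much richer language models and far more data.

```python
import re

# Tiny hand-made word lists, invented for illustration only.
POSITIVE = {"love", "great", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "rude", "disappointing"}


def sentiment_score(text: str) -> int:
    """Return the number of positive words minus the number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


# Hypothetical customer comments.
reviews = [
    "Love the new app, checkout is fast and support was helpful",
    "Delivery was slow and the box arrived broken",
    "It arrived on Tuesday",
]
scores = [sentiment_score(review) for review in reviews]

for review, score in zip(reviews, scores):
    print(f"{score:+d}  {review}")
```

A score above zero suggests a positive comment, below zero a negative one, and zero a neutral one. Even this crude version shows how a string of words can be turned into something a marketer can count and track.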
Big data in travel, transport, and logistics
Think of how often you use Google Maps (or a similar app) to navigate your way from A to B, or how you impatiently track that package you’ve just ordered from Amazon. These are just some of the small pieces in the big data jigsaw of the travel industry. On a larger scale, big data is used to monitor and predict traffic on busy routes, to track goods in transit for highly accurate delivery times, to monitor weather conditions in real-time in order to mitigate risk (for aeroplane travel, for example), and to make calculations to improve fuel efficiency. Big data also has a pivotal role to play in the development of self-driving cars.
Big data in education
Traditionally, education has taken a standardized, one-size-fits-all approach. However, with the rise of big data and analytics, educators are increasingly able to tailor their educational models and the overall learning experience to suit the individual needs of the student. For example, online learning providers use big data analytics to track student progress and to identify common obstacles and when they tend to occur. In doing so, they can provide extra support when it’s needed and increase the student’s chance of success. Traditional on-campus colleges are also using big data to reduce dropout rates; for example, Georgia State University used their masses of student data to identify eight hundred different behaviors that correlated with dropping out. Based on these insights, they were able to redesign various aspects of the student journey, improving their overall graduation rate by twenty-two percentage points.
We’ve touched on just a few examples of big data; it is used across many other industries, too.
In this post, we’ve defined what big data is and considered how it can be leveraged to optimize different areas of business. Below is a summary of the key takeaways:
- Big data can be defined by the “three Vs”: Volume, velocity, and variety.
- The main difference between big data and “small” data is that analyzing big data requires more complex tools and techniques.
- There are three main sources of big data: Social data, machine data, and transactional data.
- There are three main types of big data: Structured, unstructured, and semi-structured.
- Big data analytics turns big data into something meaningful, uncovering patterns and insights which are used to make smarter business decisions.
Remember: Big data alone is not valuable; the true value lies in how big data is analyzed and, subsequently, how these insights are applied.