What’s the difference between structured and unstructured data? What are some examples of the two? Find out in this guide.
For any data analyst, the ability to distinguish between different data types is vital. When it comes to big data, almost all types fit into one of two broad categories: structured and unstructured data.
In an ideal world, data is ordered into clear tables, columns, and rows. It’s tidily stored in a relational database or warehouse. This allows you to mine it for useful information. In reality, however, data rarely starts out this way. Unstructured, raw data is often a stream of disordered information stored in vast, disorganized repositories known as data lakes. And if you attempt to apply traditional analytics techniques to these kinds of data, you’ll soon run into trouble. Before obtaining insights, you first need to bring your data into a state of order and cohesion. Only once this is done can you carry out targeted activities like data mining (identifying patterns and behaviors in a dataset) and predictive analysis (using historical data to forecast likely outcomes).
In this post, we provide a broad overview of the differences between structured and unstructured data. We also offer some examples of the two. We’ll cover:
- The basics: quantitative vs. qualitative data
- What is structured data?
- What is unstructured data?
- What is semi-structured data?
- Examples of structured and unstructured data
- Structured vs. unstructured data: a summary of differences
- Wrap-up and further reading
Ready to discover the difference between structured and unstructured data? Let’s dive in.
1. What is the difference between quantitative and qualitative data?
Before we fully understand the differences between structured and unstructured data, it helps to learn about qualitative and quantitative data. While we’ve written a whole post on this, here’s the difference in a nutshell:
- Quantitative data refers to quantities. It can be measured or given numerical values. Examples of quantitative data include things like dates, times, weights, heights (and so on).
- Qualitative data, meanwhile, is primarily descriptive. It refers to things that can be observed but not definitively measured. Examples include blood types, brands of car, product reviews, names, and eye colors. While qualitative data may contain numerical values, these are usually descriptors or classifiers, rather than adhering to any known measurement scale (for more on this, learn about ordinal data).
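To make the distinction concrete, here is a minimal Python sketch. The sample values are invented for illustration. It shows that a quantitative column can be averaged, whereas a qualitative column can only sensibly be counted by category.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample values, made up for this illustration.
weights_kg = [60.0, 72.0, 84.0]   # quantitative: measured on a numeric scale
blood_types = ["A", "O", "O"]     # qualitative: labels, not measurements

# Arithmetic is meaningful for quantitative data.
average_weight = mean(weights_kg)

# For qualitative data we count how often each category occurs instead.
blood_type_counts = Counter(blood_types)

print(average_weight)
print(blood_type_counts)
```

Averaging the blood types would be meaningless, which is a quick practical test of which kind of data you are looking at.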
While you don’t need to get too hung up on these definitions, understanding the difference between qualitative and quantitative data is useful when we distinguish between structured and unstructured data. This will become clear as we proceed.
2. What is structured data?
Structured data is that which fits into a predefined data model. It has defined data types and rules for processing and accessing those data. Any clearly labeled database of information (such as an Excel file, SQL database, or data warehouse) can be considered structured data. This type of data is stored in tables, with clear relationships between the different rows and columns. This makes it relatively straightforward to analyze or mine for information. A common tool used to do this is SQL (Structured Query Language).
Being documented and labeled, structured data is easy to parse (or break down into its constituent parts) and extract information from. For instance, let’s say you’re working on a customer email campaign. You may have a lot of information stored about each customer in your customer relationship management (CRM) system, everything from phone numbers to invoices, and a history of interactions. However, because the data is properly ordered, it takes little effort to mine the exact data you need to complete the task at hand (for example, email addresses and first names).
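As a rough sketch of the email campaign scenario, the Python snippet below uses the standard library’s `sqlite3` module with an in-memory database. The `customers` table, its columns, and the sample rows are all assumptions made up for illustration; a real CRM will have its own schema.

```python
import sqlite3

def campaign_recipients(conn):
    """Return (first_name, email) pairs for customers who have an email address."""
    query = (
        "SELECT first_name, email FROM customers "
        "WHERE email IS NOT NULL ORDER BY first_name"
    )
    return conn.execute(query).fetchall()

# A tiny in-memory table standing in for a CRM export (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, first_name TEXT, email TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO customers (first_name, email, phone) VALUES (?, ?, ?)",
    [
        ("Ada", "ada@example.com", "555-0100"),
        ("Grace", "grace@example.com", "555-0101"),
        ("Linus", None, "555-0102"),  # no email on file, so excluded below
    ],
)

recipients = campaign_recipients(conn)
print(recipients)
```

Because every row follows the same predefined model, one short query is enough to pull exactly the two fields the campaign needs and ignore the rest.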
Structured data is generally considered quantitative data. This means it contains measurable numeric values such as numbers, dates, and times. It can also include characters and non-numeric data. However, these are always stored as encoded strings. This simply means that while the data may be text-based, this text is represented using numerical units that a computer can understand.
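The idea of an encoded string can be illustrated in a couple of lines of Python. This sketch assumes UTF-8, one common encoding; the text above does not say which encoding a given system uses.

```python
# A text value as it might appear in a "first name" column.
name = "Ada"

# Encoding turns the characters into bytes (UTF-8 is assumed here).
encoded = name.encode("utf-8")

# Each byte is just a small integer that the computer stores.
code_units = list(encoded)
print(code_units)

# Decoding with the same encoding recovers the original text.
decoded = encoded.decode("utf-8")
print(decoded)
```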
Since structured data has usually been ordered, stored, and cleaned, it tends to consume much less space than raw data. Stored in robust systems, it’s generally easier to keep secure, too. Unfortunately, according to some estimates, only about 20% of enterprise data is in a structured format. While structured data requires regular housekeeping, it’s the other 80%, the unstructured data, that represents the real task for data analysts!
3. What is unstructured data?
Unstructured data (sometimes loosely referred to as ‘big data’ or ‘raw data’) is data that lacks any predefined format or model. It’s usually vast in quantity, text-heavy, and stored in its native format in what are known as data lakes. Unstructured data requires a lot of storage space and is hard to keep secure. Perhaps most importantly, because it doesn’t fit the tables of a relational database, it’s much harder for computers (and humans) to interpret.
Unstructured data comes in various formats. These include images, audio, video, and word-processed documents, to name a few. A real-world example of structured versus unstructured data is the date and time of an email (structured data) versus the content of the email itself (unstructured data). The former is easy to parse, store in databases, and extract meaning from. The latter can also be parsed, but making sense of it requires more sophisticated techniques than simply storing it in an ordered way.
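The email example can be sketched with Python’s standard `email` package. The message below is invented for illustration. The date header parses straight into a typed value, while the body comes back as a plain string whose meaning the code knows nothing about.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message used purely for illustration.
raw_email = (
    "From: customer@example.com\n"
    "Date: Mon, 06 Mar 2023 09:15:00 +0000\n"
    "Subject: Late delivery\n"
    "\n"
    "Hi, my order arrived late and the box was damaged. Please advise.\n"
)

message = message_from_string(raw_email)

# Structured part: the Date header follows a fixed format and becomes a datetime.
sent_at = parsedate_to_datetime(message["Date"])

# Unstructured part: the body is free text with no predefined model.
body = message.get_payload()

print(sent_at.isoformat())
print(body)
```

You can sort or filter on `sent_at` immediately. Working out that `body` is a complaint about a damaged parcel takes further processing.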
Due to a lack of clear parameters and encoding, it’s extremely difficult to analyze unstructured data without organizing it first. Normally, several rounds of data wrangling and parsing (commonly using machine learning algorithms) are required. Depending on the data, it may also require various other skills. For instance, you’ll need to know about data cleaning (to remove errors and ‘noise’), natural language processing (to extract semantics from written text), and computer vision (to make sense of images or videos). Even once these tasks are complete, much more wrangling is usually needed to turn the data into something that offers meaningful insights.
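As a deliberately simple stand-in for the cleaning and text-processing steps described above, the sketch below normalizes two invented product reviews and counts the remaining words. It uses only Python’s standard library and a tiny hand-picked stop-word list (an assumption for this example). Real natural language processing involves far more than this.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "and", "was", "it", "is", "but", "i", "to"}

def tokenize(text):
    """Lowercase the text, keep only word-like tokens, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]

# Invented reviews, including the kind of 'noise' that cleaning removes.
reviews = [
    "Great phone!!! The battery is GREAT, but the camera was blurry...",
    "Battery died after a week. Terrible battery, great screen.",
]

# Turn free text into something countable: a word-frequency table.
counts = Counter(word for review in reviews for word in tokenize(review))
print(counts.most_common(3))
```

Even this crude pass turns free text into a small table of terms and frequencies, which is the kind of ordered output that later analysis can build on.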
Because unstructured data is usually text-heavy and non-numeric, it is considered qualitative. While it may include numbers, these are generally used to categorize information. About 80% of all enterprise data is in an unstructured format. This is where the real work begins!
4. What is semi-structured data?
Just when you thought we were going to cut you some slack with a simple, binary data analytics definition, here we are to throw a spanner in the works!
Something to be aware of, alongside structured and unstructured data, is the increasing existence of what’s known as semi-structured data. Like unstructured data, semi-structured data does not conform to the tabular formats of relational databases. However, it does contain markers that differentiate the various components within the data. For this reason, it has an inherent hierarchy, hence being called semi-structured.
In the digital age, semi-structured data is increasingly common. It’s often produced by online apps, object-oriented databases, and email clients, and stored in file formats like JSON (which is designed to be human-readable).
The main takeaway right now is that semi-structured data is easier to work with than unstructured data. It saves some of the work in parsing and organizing completely disordered big data. The tagged elements vastly simplify the task of creating new data models, while minimizing the risk of errors creeping in when you translate the data from one format to another. For now, that’s all you need to know!
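To show how tagged elements help, here is a minimal Python sketch that reads a small JSON document and flattens each record into a table-like row. The field names (`contact`, `tags`, and so on) and the `to_row` helper are hypothetical, chosen only for this example.

```python
import json

# An invented JSON payload: records share keys, but some fields are nested or empty.
raw = """
[
  {"id": 1, "name": "Ada", "contact": {"email": "ada@example.com"}, "tags": ["vip"]},
  {"id": 2, "name": "Grace", "contact": {}, "tags": []}
]
"""

def to_row(record):
    """Flatten one nested record into a flat dict with a fixed set of columns."""
    contact = record.get("contact", {})
    return {
        "id": record["id"],
        "name": record["name"],
        "email": contact.get("email"),  # None when the key is absent
        "tag_count": len(record.get("tags", [])),
    }

rows = [to_row(record) for record in json.loads(raw)]
for row in rows:
    print(row)
```

The keys act as the markers mentioned above: they tell the code where each component lives, so mapping the data onto fixed columns takes a few lines rather than a full parsing effort.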
5. Examples of structured and unstructured data
Now that we understand the differences between structured and unstructured data, what are some examples of the two? Let’s take a look.
Examples of structured data
Structured data includes things like:
- Dates and times
- Cell phone numbers
- Social security numbers
- Banking/transaction information
- Customer names, postal addresses, and email addresses
- Product prices
- Serial numbers
Sources of structured data include:
- Hotel reservation systems
- Point-of-sale software
- Customer relationship management (CRM) systems
- Enterprise resource planning (ERP) systems
- Financial data warehouses
- Online forms
- Medical devices
Examples of unstructured data
Unstructured data includes things like:
- Text-only files
- Email messages
- Audio and video files
- Images and digital photographs
- Surveillance camera recordings
- Books and PDFs
- Product reviews
Sources of unstructured data include:
- Email clients
- Social media
- Instant messaging systems (like WhatsApp or SMS)
- Websites and file logs
- Word processing or presentation software
- Tools for viewing media
- Location or geo-data (GPS, weather satellites, etc.)
Hopefully, this helps you visualize the data types more easily. Next up, let’s summarize the differences between them.
6. Structured vs. unstructured data: A summary of differences
By now, you should be able to distinguish between structured and unstructured data with relative ease. If you’re still in doubt, here are a couple of checklists to help you tell the two apart.
Structured data is:
- Quantitative (numerical)
- Thoroughly ordered and fact-based
- Stored in data warehouses or relational databases
- Organized using predetermined formats / data models
- Encoded and usually text-based
- Easy to search using tools like SQL
Unstructured data is:
- Qualitative (non-numerical)
- Highly disorganized and subjective
- Stored in its native format in applications, data lakes, or non-relational databases
- Not organized using predefined formats or data models
- In the form of disordered texts, images, videos, sounds
- Extremely difficult to search
And there you have it. Structured vs. unstructured data in a nutshell. Easy as pie!
7. Wrap-up and further reading
In this post, we’ve introduced you to the concepts of structured and unstructured data. You should now have a solid grasp of the differences between the two, as well as being able to cite some clear examples.
Are you learning about data analytics in order to pursue a career in the field? If so, get hands-on with a free introductory data analytics short course.