What is ratio data? What’s it used for? And how can we best collect and analyze it? Find out in this guide.
Looking to break into the field of data analytics? Or simply a data enthusiast? A prerequisite for exploring data more deeply is getting to grips with the different types of data you might encounter. Broadly speaking, there are four main types of data (also known as ‘levels of measurement’). These are nominal, ordinal, interval, and ratio data. In this post, we’re going to explore the last on this list—ratio data.
First up, though, it’s important to understand that the four data types do not stand alone; they are closely related. We’ll start by summarizing the four. We’ll then explore the various aspects of ratio data in closer detail. Want to jump to a particular topic? Use the clickable headings:
- An introduction to the four different types of data
- What is ratio data? A definition
- What are some examples of ratio data?
- How is ratio data collected and what is it used for?
- How to analyze ratio data
- Summary and further reading
Ready to get your head around ratio data? Then let’s dive in!
1. An introduction to the four different types of data
Broadly speaking, whatever data you are using, you can be certain that it falls into one or more of four categories: nominal, ordinal, interval, and ratio. Introduced in 1946 by the psychologist Stanley Smith Stevens, these four categories are also known as the levels of measurement. They are now widely used across the sciences and within data analytics to define the degree of precision to which a variable has been measured. As a hierarchical scale, each level builds on the one that comes before it.
The most basic levels of measurement are nominal and ordinal data. These are types of categorical data that take relatively simplistic measures of a given variable. Building on these are interval and ratio data—more complex measures. These are both types of numerical data. They can be harder to analyze but will, in general, lead to much richer, actionable insights. Let’s briefly look at what each level measures:
Nominal data is the simplest data type. It classifies (or names) data without suggesting any implied relationship between those data. For instance, countries or species of animals are both forms of nominal data.
Ordinal data also classifies data but it introduces the concept of ranking. An example might be labeling animals, but this time by using discrete and imprecise measures of their speed (‘slow’, ‘medium’, ‘fast’).
Interval data both classifies and ranks data (like ordinal data) but introduces continuous measurements with equal intervals between values. Examples might be the time of day or temperature measured on either the Celsius or Fahrenheit scale. Importantly, it always lacks a ‘true zero.’ A measurement of zero can be midway through a scale (i.e. you can have minus temperatures).
Ratio data classifies and ranks data, and uses measured, continuous intervals, just like interval data. However, unlike interval data, ratio data has a true zero. This basically means that zero is an absolute, below which there are no meaningful values. Speed, age, or weight are all excellent examples since none can have a negative value (you cannot be -10 years old or weigh -160 pounds!)
What do the different levels of measurement tell you?
Because each type of data has different features, the level of measurement affects how we analyze it. For instance, we can’t use nominal data as the outcome in a standard linear regression model, because category labels lack the numerical properties (order and equal intervals) that this type of analysis relies on.
All statistical techniques fall into two broad categories: descriptive statistics (which summarize a dataset’s features) and inferential statistics (which help us make predictions based on those data). Determining if you’re working with nominal, ordinal, interval, or ratio data helps narrow down which technique to use. Conversely, determining what kind of analysis you wish to carry out (i.e. what your goal is) will tell you which type of data measurement you need to take.
2. What is ratio data? A definition
Ratio data is a form of quantitative (numeric) data. It measures variables on a continuous scale, with an equal distance between adjacent values. While it shares these features with interval data (another type of quantitative data), a distinguishing property of ratio data is that it has a ‘true zero.’ In other words, a measure of zero on a ratio scale is absolute: ratio data can never have a negative value. This is important because it allows us to apply all the possible mathematical operations (addition, subtraction, multiplication, and division) when carrying out statistical analyses.
It’s worth noting that while ratio data must have a true zero, it does not necessarily require an endpoint. A ratio scale can have potentially infinite values or a finite endpoint. The only important distinguisher over interval data is the existence of a true zero.
Of the four levels of measurement, ratio data is the most complex—one step up in the hierarchy from interval data. This also makes it the most desirable type of data. Why? Well, in data analytics terms, this means it can be used to carry out the widest possible range of analyses, vastly improving our ability to test hypotheses and obtain accurate insights (presuming, of course, that we’ve chosen the right analytical test and executed it properly…but more on that later!)
Key characteristics of ratio data
Ratio data are measured using a continuous, equidistant scale that shows order, direction, and a precise difference in values.
Ratio data have a ‘true zero,’ i.e. zero represents an absence of the variable, and you cannot have negative values.
Because ratio data have a true zero, they can be meaningfully added, subtracted, multiplied, and divided. (Interval data support only addition and subtraction, while nominal and ordinal data support neither.)
Ratio data can be used to calculate measures including frequency distribution; mode, median, and mean; range, standard deviation, variance, and coefficient of variation.
What’s the difference between ratio data and interval data?
Both ratio and interval data are types of numerical data. The key difference is that ratio data has a true zero, while interval data does not. So, if your data are numerical, contain no negative numbers, and a measure of zero is equivalent to an absence of the chosen variable, you are dealing with ratio data. This difference is not trivial. Because zero on an interval scale is an arbitrary point rather than a true absence, interval data prevents us from meaningfully carrying out key mathematical operations, i.e. multiplication and division.
To illustrate, if we are measuring distance (ratio data) then we could say that 40 miles are double the value of 20 miles. However, if we are measuring temperature in Celsius (interval data) we cannot say that 40 degrees is double the value of 20 degrees since a measure of zero (rather than being the absence of temperature) is simply another measurement with an inherent value. This limits interval data’s usefulness. Ratio data is always the preferable option if you can get your hands on it!
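To make the distinction concrete, here’s a minimal Python sketch. The helper name `celsius_to_kelvin` and the sample values are our own, chosen purely for illustration. It shows that the ratio between two distances stays the same when you change units, whereas the apparent "ratio" between two Celsius readings changes once you convert to Kelvin, a temperature scale with a true zero.

```python
MILES_TO_KM = 1.609344  # kilometres in one mile


def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


# Distance is ratio data: the ratio survives a change of units.
ratio_miles = 40 / 20
ratio_km = (40 * MILES_TO_KM) / (20 * MILES_TO_KM)

# Celsius is interval data: dividing the raw readings is misleading,
# because the answer changes when the same temperatures are in Kelvin.
naive_ratio_celsius = 40 / 20
ratio_kelvin = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print("Distance ratio (miles vs km):", ratio_miles, ratio_km)
print("Temperature ratio (Celsius vs Kelvin):",
      naive_ratio_celsius, round(ratio_kelvin, 3))
```

In other words, 40°C is only about 7% "more temperature" than 20°C in absolute terms, not double.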
3. What are some examples of ratio data?
Now we have an idea of what ratio data is, what are some examples? Let’s take a look.
- Temperature in Kelvin (0, +10, +20, +30, +40, etc.)
- Height (5ft. 8in., 5ft. 9in., 5ft. 10in., 5ft. 11in., 6ft. 0in. etc.)
- Price of goods ($0, $5, $10, $15, $20, $30, etc.)
- Age in years (from zero to 100+)
- Distance (from zero miles/km upwards)
- Time intervals (might include race times or the number of hours spent watching Netflix!)
As you can see, ratio data is all about measuring continuous variables on equidistant scales.
It’s important to note that while values in a ratio dataset must be capable of reaching true zero, this is not the same thing as actually having values that go down to zero. To illustrate, if you’re measuring the heights of a group of adults, you probably won’t obtain many measurements below 5 feet. The existence of true zero simply means that the measurement scale you are using has a definitive starting point of zero, i.e. you could reach zero in theory, even if not in practice.
Next, let’s see how ratio data is typically collected and used in everyday life.
4. How is ratio data collected and what is it used for?
There are many ways to collect ratio data. The chosen method depends on the nature of what you are measuring and how you intend to use the data. Common methods for collecting ratio data include surveys, questionnaires, or interviews. A familiar type of question might be:
- Question: How much time do you spend on social media per day? Possible answers: 0-1 hours, 1-2 hours, 2-3 hours, 3-4 hours, 4-5 hours.
Note that, in this example, the distance between intervals is always equal and there is a true zero, i.e. you cannot spend -2 hours a day on social media. Plus, if your scale lacks equal distance between measures, you are not collecting ratio data, but ordinal data.
Like interval data, ratio data are sometimes collected through direct observation, too. For instance, a zoologist might measure the heights of various elephants. To drive the point home, note once again that height measurements have a true zero, i.e. an elephant with a height of zero is an absence of an elephant.
Another common way of collecting ratio data is through automated data collection. For instance, most vehicles have software that tracks their speed and distance over time. Collecting and documenting this information regularly and automatically is beneficial. It allows for direct comparison between past and present data over periods that would be impractical to measure through direct observation.
Finally, it’s helpful to remember that, as a general rule of thumb, most quantitative data is ratio data. This is because most numerical measurements use a true zero scale.
What is ratio data used for?
Because ratio data incorporates the cumulative characteristics of data from all the levels of measurement (i.e. nominal, ordinal, and interval) it can be used for any type of data analysis you can think of. This kind of makes ratio data the holy grail of measurement scales! It can be used for everything from measuring customer behaviors to predicting future sales trends, and improving health outcomes…we could go on, but the list is pretty much endless.
5. How to analyze ratio data
In all cases, ratio data is the best type of data to work with. This is because it allows you to apply the entire arsenal of different statistical techniques. Even in the case of summary statistics—the most fundamental type of measurement—it allows you to scrutinize data at a deeper level than is possible for nominal, ordinal, and interval data.
The two main types of statistical analysis are descriptive and inferential statistics. Descriptive statistics summarize a dataset’s characteristics. Inferential statistics allow you to test hypotheses or make predictions. Let’s look at each more closely, in relation to ratio data.
Descriptive statistics for ratio data
Descriptive statistics you can obtain using ratio data include:
- Frequency distribution
- Central tendency: Mode, median, and mean
- Variability: Range, standard deviation, variance, and coefficient of variation
Almost all of these statistics can also be measured using interval data. The only exception is the coefficient of variation. For more detail on how you might obtain each of these measures, check out section five of our post on interval data, which uses more explicit examples.
As the name suggests, frequency distribution explores how a dataset’s values are distributed. The most common way to measure frequency distribution is to represent your data using a pivot table or some kind of graph. For example, the bar graph here shows the distribution of weight in a sample of marlins.
A bar plot showing the estimated weight of marlins. The x-axis shows weight, the y-axis shows frequency. Source: John C. Holdsworth / ResearchGate
Remember: while you can measure frequency distribution for many types of data, ratio data must have a true zero, as does a measure of weight (on any scale).
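If you’d like to see how a frequency distribution might be built in practice, here’s a short Python sketch. The marlin weights, the 20 kg bin width, and the helper `bin_start` are all invented for illustration (they are not taken from the chart above). The code groups each weight into a class and prints a simple text histogram.

```python
from collections import Counter

# Hypothetical marlin weights in kilograms (invented for illustration).
weights_kg = [82, 95, 101, 88, 120, 97, 110, 93, 105, 99, 134, 91]

BIN_WIDTH = 20  # width of each weight class, in kg


def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the weight class a value falls into."""
    return int(value // width) * width


# Count how many weights fall into each class.
frequency = Counter(bin_start(w) for w in weights_kg)

# Each class includes its lower edge and excludes its upper edge.
for start in sorted(frequency):
    count = frequency[start]
    print(f"{start} to <{start + BIN_WIDTH} kg: {'#' * count} ({count})")
```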
Measures of central tendency: Mode, median, mean
Just like interval data, it’s possible to determine the three measures of central tendency using ratio data. These are:
- The mode (the value that’s repeated most often throughout the data)
- The median (the central value in the dataset)
- The mean (the dataset’s average value)
The measures of central tendency are useful summary statistics for judging the relative positions and importance of different values within a dataset. For example, we can use these measures to determine whether a value falls below or above the mean, how far from the mean it sits, what this implies, and so on. This is all beneficial when you are first dealing with a new set of data since it helps determine the best way to analyze it in more depth.
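As a quick illustration, the three measures can be computed with Python’s built-in `statistics` module. The ages below are made up for this sketch.

```python
import statistics

# Hypothetical ages in years (invented for illustration).
ages = [23, 25, 25, 29, 31, 34, 35, 41, 58]

mode = statistics.mode(ages)      # most frequently occurring value
median = statistics.median(ages)  # middle value once the data are sorted
mean = statistics.mean(ages)      # arithmetic average

# How far does the oldest person sit from the mean?
distance_from_mean = max(ages) - mean

print("Mode:", mode)
print("Median:", median)
print("Mean:", round(mean, 2))
print("Oldest value minus mean:", round(distance_from_mean, 2))
```

Notice that the mean sits above the median here. That’s a hint that a few larger values (such as 58) are pulling the average upwards, which is exactly the kind of early clue these measures can give you.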
Measures of variability: Range, standard deviation, variance, and coefficient of variation
Variability is a term used to describe a collection of different measures. Range, standard deviation, and variance are all measures of variability that you can extract from ratio and interval data. However, using ratio data you can also calculate what’s known as the coefficient of variation. But what do all these measures tell us?
- Range: Describes the difference between the smallest and largest value.
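- Standard deviation: Measures how far, on average, the values in a dataset sit from the mean. It is expressed in the same units as the data.
- Variance: Measures the average of the squared differences between each value and the mean. The standard deviation is the square root of the variance.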
- Coefficient of variation: Measures the ratio between the standard deviation and the mean. It’s usually expressed as a percentage. The higher the number, the greater the degree of dispersion around the mean. It’s a complex concept, but you can learn how to determine the coefficient of variation in this guide.
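The four measures above can be sketched in a few lines of Python using the built-in `statistics` module. The race times are hypothetical, and we’ve assumed they are a sample rather than a whole population, so the sample versions of variance and standard deviation are used.

```python
import statistics

# Hypothetical 10k race times in minutes (invented for illustration).
race_times_min = [42.0, 45.5, 47.0, 50.5, 55.0]

value_range = max(race_times_min) - min(race_times_min)
variance = statistics.variance(race_times_min)  # sample variance (n - 1)
std_dev = statistics.stdev(race_times_min)      # square root of the variance
mean = statistics.mean(race_times_min)

# Coefficient of variation: standard deviation relative to the mean.
# Dividing by the mean only makes sense because the scale has a true zero.
cv_percent = std_dev / mean * 100

print("Range:", value_range)
print("Variance:", round(variance, 3))
print("Standard deviation:", round(std_dev, 3))
print("Coefficient of variation (%):", round(cv_percent, 2))
```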
Inferential statistics for ratio data
When it comes to in-depth statistical analyses, you can analyze ratio data using the same techniques that you would use to analyze interval data. Ideally, you should apply parametric over non-parametric techniques. That’s because parametric techniques are well suited to quantitative data, provided the data meet the test’s assumptions (for example, that values are roughly normally distributed). When those assumptions hold, parametric tests generally have more statistical power than non-parametric tests. While you can still apply non-parametric tests to ratio data, they will not make the most of a ratio dataset’s full range of characteristics.
Here are some statistical tests you can use on ratio data:
- T-test
- Analysis of variance (ANOVA)
- Pearson correlation coefficient
- Simple linear regression
T-test
T-tests help you identify whether or not a statistically significant difference exists between the mean values of two separate data samples. For instance, is there a difference in average heights between adults who weigh less than 180 pounds and those who weigh more than 180 pounds? If you want to test your hypothesis, the t-test is very useful. While there’s a range of different versions, in general, all you need is the difference between the sample means, the standard deviation of each sample, and the number of values in each sample.
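To show how those ingredients fit together, here’s a minimal sketch of one common version of the test, Welch’s t-test, which does not assume the two samples have equal variances. The function name `welch_t_statistic` and the height data are our own, invented for illustration. The sketch only computes the t-statistic; turning it into a p-value requires the t-distribution, which a statistics library such as SciPy (`scipy.stats.ttest_ind`) can handle for you.

```python
import math
import statistics


def welch_t_statistic(sample_a, sample_b):
    """Return Welch's t-statistic for two independent samples."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)

    # Standard error of the difference between the two means.
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / standard_error


# Hypothetical heights in inches (invented for illustration).
heights_under_180lb = [64, 66, 67, 68, 70]
heights_over_180lb = [68, 70, 71, 72, 74]

t_stat = welch_t_statistic(heights_under_180lb, heights_over_180lb)
print("t-statistic:", round(t_stat, 3))
```

The further the t-statistic sits from zero, the stronger the evidence that the two group means differ.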
Analysis of variance
You might use analysis of variance (ANOVA) to evaluate the mean values across three or more data samples. How do they compare? For instance, is there a difference in average heights between adults who weigh between 150 and 180 pounds, 180 and 210 pounds, and 210 and 240 pounds? ANOVA provides a similar outcome to a t-test but is useful when there are more than two groups to compare.
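As a rough sketch of what happens under the hood, a one-way ANOVA compares the variation between group means with the variation within each group, and reports the result as an F-statistic. The function name `one_way_anova_f` and the heights below are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import statistics


def one_way_anova_f(*groups):
    """Return the F-statistic for a one-way ANOVA across the given groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (group_mean - grand_mean) ** 2
        for group, group_mean in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (value - group_mean) ** 2
        for group, group_mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical heights in inches for three weight bands.
heights_150_180lb = [65, 66, 67]
heights_180_210lb = [68, 69, 70]
heights_210_240lb = [71, 72, 73]

f_stat = one_way_anova_f(heights_150_180lb, heights_180_210lb, heights_210_240lb)
print("F-statistic:", f_stat)
```

A large F-statistic suggests that at least one group mean differs from the others. It does not tell you which one, so a follow-up comparison is usually needed.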
Pearson correlation coefficient
You can use the Pearson correlation coefficient (also known as Pearson’s r) to measure the extent of linear correlation between two variables. For instance, can you identify a relationship between someone’s weight and the amount they spend on weekly groceries? By plotting quantitative variables on a graph, you can determine the direction and strength of correlation between the different variables. When calculating Pearson’s r, values always fall between -1 and 1. A value of 1 indicates a perfect positive correlation, while -1 indicates a perfect negative correlation. A value of 0 shows no linear correlation between variables.
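Here’s a minimal sketch of how Pearson’s r could be calculated by hand in Python. The function name `pearson_r` and the weight and grocery figures are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two variables."""
    if len(xs) != len(ys):
        raise ValueError("Both variables need the same number of values.")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # How the two variables move together, and how much each varies alone.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)

    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical data: body weight (lb) and weekly grocery spend ($).
weights_lb = [120, 150, 180, 210, 240]
grocery_spend = [60, 72, 70, 95, 103]

r = pearson_r(weights_lb, grocery_spend)
print("Pearson's r:", round(r, 3))
```

With this made-up data, r comes out close to 1, indicating a strong positive linear relationship. Remember that correlation alone does not show that one variable causes the other.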
Simple linear regression
You can use simple linear regression to identify the relationship between two variables. One of these will be the dependent variable, which is impacted by the independent variable. It is commonly used in predictive analytics. For instance, can a person’s height be used to predict their weight? Simple linear regression uses only two variables but variations, such as multiple linear regression, measure a dependent variable based on two or more independent variables.
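To illustrate the height and weight example, here’s a small sketch that fits a straight line using ordinary least squares. The function name `fit_line` and the data are hypothetical, and real-world data would be far noisier.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: height (inches) is the independent variable,
# weight (pounds) is the dependent variable.
heights_in = [60, 64, 68, 72, 76]
weights_lb = [115, 140, 155, 180, 200]

slope, intercept = fit_line(heights_in, weights_lb)

# Use the fitted line to predict the weight of someone 70 inches tall.
predicted_weight = slope * 70 + intercept

print("Slope (lb per inch):", slope)
print("Intercept:", intercept)
print("Predicted weight at 70 in:", predicted_weight)
```

The negative intercept is a useful reminder that a fitted line is only trustworthy within the range of the data you observed. Nobody is zero inches tall, so the prediction there is meaningless.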
This is just a small sample of the parametric tests you can use on ratio data. The full selection is wide and includes variations on those already described, from alternative regression techniques (like logistic regression) to other comparative tests (such as the paired t-test or multivariate analysis of variance, or MANOVA). While these methods require some getting used to, by now you hopefully have an idea of the kinds of analyses you can carry out.
6. Summary and further reading
In this post, we’ve:
- Introduced the four levels of measurement: Nominal, ordinal, interval, and ratio.
- Defined ratio data as a type of quantitative data that measures variables using continuous, equidistant numerical values on a scale with a true zero.
- Explained the difference between ratio and interval data: Both are types of numerical data. However, only ratio data has a true zero, allowing us to apply all possible mathematical operations (addition, subtraction, multiplication, and division) when carrying out an analysis.
- Shared some examples of ratio data: Temperature in Kelvin, height, distance, age in years.
- Highlighted the descriptive statistics you can obtain using ratio data: Frequency distribution, measures of central tendency (mode, median, and mean), and variability (range, standard deviation, variance, and coefficient of variation).
- Introduced some parametric tests for analyzing ratio data, e.g. the t-test, ANOVA, Pearson correlation coefficient, and simple linear regression.
Statistics is a vital part of data analytics. If you found this post helpful, we encourage you to explore the topic further. Start by diving deeper into data analytics with our free five-day data analytics short course, or read the following posts to learn more: