{"id":3763,"date":"2021-04-13T01:29:00","date_gmt":"2021-04-12T23:29:00","guid":{"rendered":"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/uncategorized\/inferential-vs-descriptive-statistics\/"},"modified":"2023-05-24T18:49:28","modified_gmt":"2023-05-24T16:49:28","slug":"inferential-vs-descriptive-statistics","status":"publish","type":"post","link":"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/inferential-vs-descriptive-statistics\/","title":{"rendered":"What\u2019s the Difference Between Descriptive and Inferential Statistics?"},"content":{"rendered":"<p><strong>All statistical techniques can be divided into two broad categories: descriptive and inferential statistics. In this post, we explore the differences between the two, and how they impact the field of data analytics.<\/strong><\/p>\n<p>Statistics. They are the heart of data analytics. They help us spot trends and patterns. They help us plan. In essence, they breathe life into data and help us derive meaning from it.<\/p>\n<p>While the individual statistical methods we use in data analytics are too numerous to count, they can be broadly divided into two main camps: <a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/descriptive-analytics\/\">descriptive statistics<\/a> and inferential statistics. In this post, we explore the difference between descriptive and inferential statistics, and touch on how they\u2019re used in data analytics. 
We\u2019ll break things down into the following bite-sized chunks:<\/p>\n<ol>\n<li><a href=\"#what-is-statistics\">What is statistics?<\/a><\/li>\n<li><a href=\"#in-summary-whats-the-difference-between-inferential-and-descriptive-statistics\">What\u2019s the difference between inferential and descriptive statistics?<\/a><\/li>\n<li><a href=\"#what-are-population-and-sample-in-statistics\">Must know: What are population and sample?<\/a><\/li>\n<li><a href=\"#what-is-descriptive-statistics\">What is descriptive statistics?<\/a><\/li>\n<li><a href=\"#what-is-inferential-statistics\">What is inferential statistics?<\/a><\/li>\n<li><a href=\"#inferential-vs-descriptive-statistics-FAQ\">Inferential vs descriptive statistics FAQs<\/a><\/li>\n<\/ol>\n<p>Ready? Engage!<\/p>\n<h2 id=\"what-is-statistics\">1. What is statistics?<\/h2>\n<p>Put simply, statistics is the area of applied math that deals with the collection, organization, analysis, interpretation, and presentation of data.<\/p>\n<p>Sound familiar? It should. These are all vital steps in the <a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/the-data-analysis-process-step-by-step\/\">data analytics process<\/a>. In fact, in many ways, data analytics <em>is<\/em> statistics. When we use the term \u2018data analytics\u2019 what we really mean is \u2018the statistical analysis of a given dataset or datasets\u2019. But that\u2019s a bit of a mouthful, so we tend to shorten it!<\/p>\n<p>Since they are so fundamental to data analytics, statistics are also vitally important to any field that data analysts work in. From science and psychology to marketing and medicine, the wide range of statistical techniques out there can be broadly divided into two categories: descriptive statistics and inferential statistics. 
But what\u2019s the difference between them?<\/p>\n<p>In a nutshell, <strong>descriptive statistics focus on describing the visible characteristics of a dataset<\/strong> (a population or sample).<\/p>\n<p>Meanwhile, <strong>inferential statistics focus on making predictions or generalizations about a larger dataset, based on a sample<\/strong> of those data.<\/p>\n<h2 id=\"in-summary-whats-the-difference-between-inferential-and-descriptive-statistics\">2. What\u2019s the difference between inferential and descriptive statistics?<\/h2>\n<p>Let&#8217;s look at an overview of the differences between these two categories:<\/p>\n<p><strong>Descriptive statistics:<\/strong><\/p>\n<ul>\n<li>Describe the features of populations and\/or samples<\/li>\n<li>Organize and present data in a purely factual way<\/li>\n<li>Present final results visually, using tables, charts, or graphs<\/li>\n<li>Draw conclusions based on known data<\/li>\n<li>Use measures like central tendency, distribution, and variance<\/li>\n<\/ul>\n<p><strong>Inferential statistics:<\/strong><\/p>\n<ul>\n<li>Use samples to make generalizations about larger populations<\/li>\n<li>Help us to make estimates and predict future outcomes<\/li>\n<li>Present final results in the form of probabilities<\/li>\n<li>Draw conclusions that go beyond the available data<\/li>\n<li>Use techniques like hypothesis testing, confidence intervals, and regression and correlation analysis<\/li>\n<\/ul>\n<p><strong>Important note:<\/strong> while we\u2019re presenting descriptive and inferential statistics in a binary way, they are most often used in conjunction.<\/p>\n<h2 id=\"what-are-population-and-sample-in-statistics\">3. What are population and sample in statistics?<\/h2>\n<p>Before we explore these two categories of statistics further, it helps to understand two vital concepts: population and sample. 
We can define them as follows:<\/p>\n<h3>Population<\/h3>\n<p>This is the entire group that you wish to draw data from (and subsequently draw conclusions about).<\/p>\n<p>While in day-to-day life the word is often used to describe groups of people (such as the population of a country), in statistics it can apply to any group from which you will collect information. This is often people, but it could also be cities of the world, animals, objects, plants, colors, and so on.<\/p>\n<h3>A sample<\/h3>\n<p>This is a representative subset of a larger population. Random sampling from representative groups allows us to draw broad conclusions about an overall population.<\/p>\n<p>This approach is commonly used in polling. Pollsters ask a small group of people about their views on certain topics. They can then use this information to make informed judgments about what the larger population thinks. This saves the time, hassle, and expense of extracting data from an entire population (which for all practical purposes is usually impossible).<\/p>\n<p><img decoding=\"async\" title=\"A diagram depicting how a random sample of data is selected from a population\" src=\"\/en\/wp-content\/uploads\/old-blog-uploads\/population-vs-sample.png\" alt=\"A diagram depicting how a random sample of data is selected from a population\" \/><\/p>\n<p><em>Attribution:\u00a0<a href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\" rel=\"noopener\">Dan Kernler, CC BY-SA 4.0<\/a>, via Wikimedia Commons<\/em><\/p>\n<p>The image illustrates the concept of population and sample. Using random sample measurements from a representative group, we can estimate, predict, or infer characteristics about the larger population. While there are many technical variations on this technique, they all follow the same underlying principles.<\/p>\n<p>OK! 
Now that we understand the concepts of population and sample, we\u2019re ready to explore descriptive and inferential statistics in a bit more detail.<\/p>\n<h2 id=\"what-is-descriptive-statistics\">4. What is descriptive statistics?<\/h2>\n<p>Descriptive statistics are used to describe the characteristics or features of a dataset. The term &#8220;descriptive statistics&#8221; can be used to describe both individual quantitative observations (also known as &#8220;summary statistics&#8221;) and the overall process of obtaining insights from these data.<\/p>\n<p>We can use descriptive statistics to describe either an entire population or an individual sample. Because they are merely explanatory, descriptive statistics are not heavily concerned with the differences between the two types of data.<\/p>\n<p>So what measures do descriptive statistics look at? While there are many, important ones include:<\/p>\n<ul>\n<li>Distribution<\/li>\n<li>Central tendency<\/li>\n<li>Variability<\/li>\n<\/ul>\n<p>Let\u2019s briefly look at each of these now.<\/p>\n<h3 id=\"what-is-distribution\">What is distribution?<\/h3>\n<p>Distribution shows us the frequency of different outcomes (or data points) in a population or sample. We can show it as numbers in a list or table, or we can represent it graphically. 
As a basic example, the following list shows the number of those with different hair colors in a dataset of 286 people.<\/p>\n<ul>\n<li>Brown hair: 130<\/li>\n<li>Black hair: 39<\/li>\n<li>Blond hair: 91<\/li>\n<li>Auburn hair: 13<\/li>\n<li>Gray hair: 13<\/li>\n<\/ul>\n<p>We can also represent this information visually, for instance in a pie chart.<\/p>\n<p><img decoding=\"async\" title=\"A pie chart depicting what portion of a sample have certain hair colors\" src=\"\/en\/wp-content\/uploads\/old-blog-uploads\/hair-color-pie-chart.png\" alt=\"A pie chart depicting what portion of a sample have certain hair colors\" \/><\/p>\n<p>Generally, using visualizations is common practice in descriptive statistics. It helps us more readily spot patterns or trends in a dataset.<\/p>\n<h3 id=\"what-is-central-tendency\">What is central tendency?<\/h3>\n<p>Central tendency is the name for measurements that look at the typical central values within a dataset. This does not just refer to the central value within an entire dataset, which is called the median. Rather, it is a general term used to describe a variety of central measurements. For instance, it might include central measurements from different quartiles of a larger dataset. 
Common measures of central tendency include:<\/p>\n<ul>\n<li><strong>The mean:<\/strong> The average value of all the data points.<\/li>\n<li><strong>The median:<\/strong> The central or middle value in the dataset.<\/li>\n<li><strong>The mode:<\/strong> The value that appears most often in the dataset.<\/li>\n<\/ul>\n<p>Once again, using our hair color example, we can determine that the mean measurement is 57.2 (the total value of all the measurements, divided by the number of values), the median is 39 (the central value) and the mode is 13 (because it appears twice, which is more than any of the other data points).<\/p>\n<p>Although this is a heavily simplified example, for many areas of data analysis these core measures underpin how we summarize the features of a data sample or population. Summarizing these kinds of statistics is the first step in determining other key characteristics of a dataset, for example, its variability. This leads us to our next point\u2026<\/p>\n<h3 id=\"what-is-variability\">What is variability?<\/h3>\n<p>The variability, or dispersion, of a dataset describes how values are distributed or spread out. Identifying variability relies on understanding the central tendency measurements of a dataset. However, like central tendency, variability is not just one measure. It is a term used to describe a range of measurements. Common measures of variability include:<\/p>\n<ul>\n<li><strong>Standard deviation:<\/strong> This shows us the amount of variation or dispersion. Low standard deviation implies that most values are close to the mean. High standard deviation suggests that the values are more broadly spread out.<\/li>\n<li><strong>Minimum and maximum values:<\/strong> These are the lowest and highest values in a dataset or quartile. Using the example of our hair color dataset again, the minimum and maximum values are 13 and 130 respectively.<\/li>\n<li><strong>Range:<\/strong> This measures the size of the distribution of values. 
This can be easily determined by subtracting the smallest value from the largest. So, in our hair color dataset, the range is 117 (130 minus 13).<\/li>\n<li><strong>Kurtosis:<\/strong> This measures whether or not the tails of a given distribution contain extreme values (also known as outliers). If a tail lacks outliers, we can say that it has low kurtosis. If a dataset has a lot of outliers, we can say it has high kurtosis.<\/li>\n<li><strong>Skewness:<\/strong> This is a measure of a dataset\u2019s symmetry. If you were to plot a bell-curve and the right-hand tail was longer and fatter, we would call this positive skewness. If the left-hand tail is longer and fatter, we call this negative skewness. This is visible in the following image.<\/li>\n<\/ul>\n<p><img decoding=\"async\" title=\"Two simple graphs showing positive skew and negative skew in the data\" src=\"\/en\/wp-content\/uploads\/old-blog-uploads\/negative-and-positive-skew-diagrams.png\" alt=\"Two simple graphs showing positive skew and negative skew in the data\" \/><\/p>\n<p><em>Attribution:\u00a0<a href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/3.0\" rel=\"noopener\">Rodolfo Hermans (Godot) at en.wikipedia., CC BY-SA 3.0<\/a>, via Wikimedia Commons<\/em><\/p>\n<p>Used together, distribution, central tendency, and variability can tell us a surprising amount of detailed information about a dataset. Within data analytics, they are very common measures, especially in the area of <a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/exploratory-data-analysis\/\">exploratory data analysis<\/a>. Once you\u2019ve summarized the main features of a population or sample, you\u2019re in a much better position to know how to proceed with it. 
And this is where inferential statistics come in.<\/p>\n<p><strong>Want to try your hand at calculating descriptive statistics?\u00a0<\/strong>In this <a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/tutorials\/data-analytics-for-beginners\/descriptive-statistics-and-exploratory-data-analysis\/\">free data analytics tutorial<\/a>, we show you, step by step, how to calculate the mean, median, mode, and frequency for certain variables in a real dataset as part of exploratory data analysis. Give it a go!<\/p>\n<p>So, we\u2019ve established that descriptive statistics focus on summarizing the key features of a dataset. But what about inferential ones?<\/p>\n<h2 id=\"what-is-inferential-statistics\">5. What is inferential statistics?<\/h2>\n<p>Inferential statistics focus on making generalizations about a larger population based on a representative sample of that population. Because inferential statistics focus on making predictions (rather than stating facts), their results are usually expressed as probabilities.<\/p>\n<p>Unsurprisingly, the accuracy of inferential statistics relies heavily on the sample data being both accurate and representative of the larger population. Achieving this involves obtaining a random sample. If you\u2019ve ever read news coverage of scientific studies, you\u2019ll have come across the term before. The implication is always that random sampling means better results.<\/p>\n<p>On the flipside, results that are based on biased or non-random samples are usually thrown out. Random sampling is very important for carrying out inferential techniques, but it is not always straightforward!<\/p>\n<p>Let\u2019s quickly summarize how you might obtain a random sample.<\/p>\n<h3 id=\"how-do-we-obtain-a-random-sample\">How do we obtain a random sample?<\/h3>\n<p>Random sampling can be a complex process and often depends on the particular characteristics of a population. 
However, the fundamental principles involve:<\/p>\n<h4 id=\"defining-a-population\">1. Defining a population<\/h4>\n<p>This simply means determining the pool from which you will draw your sample. As we explained earlier, a population can be anything\u2014it isn\u2019t limited to people. So it could be a population of objects, cities, cats, pugs, or anything else from which we can derive measurements!<\/p>\n<h4 id=\"deciding-your-sample-size\">2. Deciding your sample size<\/h4>\n<p>The bigger your sample size, the more representative it will be of the overall population. However, drawing large samples can be time-consuming, difficult, and expensive. Indeed, this is why we draw samples in the first place\u2014it is rarely feasible to draw data from an entire population. Your sample size should therefore be large enough to give you confidence in your results (too small a sample risks being unrepresentative, which is just shorthand for inaccurate) but not so large that collecting it becomes impractical. This is where using descriptive statistics can help, as they allow us to strike a balance between size and accuracy.<\/p>\n<h4 id=\"randomly-select-a-sample\">3. Randomly select a sample<\/h4>\n<p>Once you\u2019ve determined the sample size, you can draw a random selection. You might do this using a random number generator, assigning each value a number and selecting the numbers at random. Or you could do it using a range of similar techniques or algorithms (we won\u2019t go into detail here, as this is a topic in its own right, but you get the idea).<\/p>\n<h4 id=\"analyze-the-data-sample\">4. Analyze the data sample<\/h4>\n<p>Once you have a random sample, you can use it to infer information about the larger population. It\u2019s important to note that while a random sample is <em>representative<\/em> of a population, it will never be 100% accurate. For instance, the mean (or average) of a sample will rarely match the mean of the full population, but it will give you a good idea of it. 
For this reason, it\u2019s important to incorporate your error margin in any analysis (which we cover in a moment). This is why, as explained earlier, any result from inferential techniques is in the form of a probability.<\/p>\n<p>However, presuming we\u2019ve obtained a random sample, there are many inferential techniques for analyzing and obtaining insights from those data. The list is long, but some techniques worthy of note include:<\/p>\n<ul>\n<li>Hypothesis testing<\/li>\n<li>Confidence intervals<\/li>\n<li>Regression and correlation analysis<\/li>\n<\/ul>\n<p>Let\u2019s explore these a bit more closely.<\/p>\n<h3 id=\"what-is-hypothesis-testing\">What is hypothesis testing?<\/h3>\n<p>Hypothesis testing involves checking whether your sample data support your hypothesis (or proposed explanation). The aim is to rule out the possibility that a given result has occurred by chance. A topical example of this is the clinical trials for the Covid-19 vaccine. Since it\u2019s impossible to carry out trials on an entire population, we carry out numerous trials on several random, representative samples instead.<\/p>\n<p>The hypothesis test, in this case, might ask something like: \u2018Does the vaccine reduce severe illness caused by Covid-19?\u2019 By collecting data from different sample groups, we can infer if the vaccine will be effective.<\/p>\n<p>If all samples show similar results and we know that they are representative and random, we can generalize that the vaccine will have the same effect on the population at large. On the flip side, if one sample shows higher or lower efficacy than the others, we must investigate why this might be. 
For instance, maybe there was a mistake in the sampling process, or perhaps the vaccine was delivered differently to that group.<\/p>\n<p>In fact, it was due to a dosing error that one group in a Covid vaccine trial actually <a href=\"https:\/\/euronews.com\/2020\/11\/26\/astrazeneca-reveals-dosing-mistake-in-coronavirus-vaccine-trials\" target=\"_blank\" rel=\"noopener\">showed higher efficacy than the other groups in the trial\u2026<\/a> This shows how important hypothesis testing can be. If the outlier group had simply been written off, that finding would have been missed!<\/p>\n<h3 id=\"what-is-a-confidence-interval\">What is a confidence interval?<\/h3>\n<p>Confidence intervals are used to estimate certain parameters for a measurement of a population (such as the mean) based on sample data. Rather than providing a single mean value, the confidence interval provides a range of values. Each interval is stated at a confidence level, which is often given as a percentage (such as 95%). If you\u2019ve ever read a scientific research paper, you\u2019ll have seen that conclusions drawn from a sample are often accompanied by a confidence interval.<\/p>\n<p>For example, let\u2019s say you\u2019ve measured the tails of 40 randomly selected cats. You get a mean length of 17.5cm. You also know the standard deviation of tail lengths is 2cm. Using the standard formula (the sample mean plus or minus 1.96 times the standard deviation divided by the square root of the sample size), we can say the mean length of tails in the full population of cats is 17.5cm \u00b1 0.62cm, or roughly 16.9cm to 18.1cm, at a 95% confidence level. Essentially, this tells us that we are 95% certain that the population mean (which we cannot know without measuring the full population) falls within the given range. 
This technique is very helpful for measuring the degree of accuracy within a sampling method.<\/p>\n<h3 id=\"what-are-regression-and-correlation-analysis\">What are regression and correlation analysis?<\/h3>\n<p>Regression and correlation analysis are both techniques used for observing how two (or more) sets of variables relate to one another.<\/p>\n<p>Regression analysis aims to determine how one dependent (or output) variable is impacted by one or more independent (or input) variables. It\u2019s often used for hypothesis testing and predictive analytics. For example, to predict future sales of sunscreen (an output variable) you might compare last year\u2019s sales against weather data (which are both input variables) to see how much sales increased on sunny days.<\/p>\n<p>Correlation analysis, meanwhile, measures the degree of association between two or more datasets. Unlike regression analysis, correlation does not infer cause and effect. For instance, ice cream sales and sunburn are both likely to be higher on sunny days\u2014we can say that they are correlated. But it would be incorrect to say that ice cream causes sunburn! You can <a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/covariance-vs-correlation\/\" data-cke-saved-href=\"\/en\/blog\/data-analytics\/covariance-vs-correlation\/\">learn more about correlation (and how it differs from covariance) in this guide<\/a>.<\/p>\n<p>What we\u2019ve described here is just a small selection of a great many inferential techniques that you can use within data analytics. However, they provide a tantalizing taste of the sort of predictive power that inferential statistics can offer.<\/p>\n<h2 id=\"inferential-vs-descriptive-statistics-FAQ\">6. 
Descriptive vs inferential statistics FAQs<\/h2>\n<h3>What is an example of a descriptive statistic?<\/h3>\n<p>A good example would be a pie chart displaying the different hair colors in the population, clearly showing that brown hair is the most common.<\/p>\n<h3>Should I use descriptive or inferential statistics?<\/h3>\n<p>Which to use depends on the situation, as they have different goals. In general, descriptive statistics are easier to carry out and simply summarize the data you already have, while inferential statistics are more useful if you need a prediction or a generalization beyond those data. So, it depends on the scenario and what you yourself are looking for.<\/p>\n<h3>What is an example of an inferential statistic?<\/h3>\n<p>This would be analyzing the hair color of one college class of students and using that result to predict the most common hair color in the entire college.<\/p>\n<h2>7. Wrap-up<\/h2>\n<p>So there you have it, everything you need to know about descriptive vs inferential statistics! Although we examined them separately, they&#8217;re typically used at the same time. Together, these powerful statistical techniques are the bedrock on which data analytics is built.<\/p>\n<p>To learn more about the role that descriptive and inferential statistics play in data analytics, check out our <strong><a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/short-courses\/become-a-data-analyst\/\">free, 5-day short course<\/a>.<\/strong> If that&#8217;s piqued your interest in pursuing data analytics as a career, why not then check out the best online <a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/best-data-analytics-certification-programs\/\">data analytics courses<\/a> on the market? 
For more introductory data analytics topics, see the following:<\/p>\n<ul>\n<li><a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/multivariate-analysis\/\">An introduction to multivariate analysis<\/a><\/li>\n<li><a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/regression-vs-classification\/\">What\u2019s the difference between classification and regression?<\/a><\/li>\n<li><a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/what-is-data-cleaning\/\">What is data cleaning and why does it matter?<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>All statistical techniques can be divided into two categories: descriptive and inferential statistics. In this post, we explore the differences between the two, and how they impact the field of data analytics.<\/p>\n","protected":false},"author":101,"featured_media":296,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_lmt_disableupdate":"yes","_lmt_disable":"","footnotes":""},"categories":[3],"tags":[],"class_list":["post-3763","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-analytics"],"acf":{"homepage_category_featured":false},"modified_by":"Matthew 
Deery","_links":{"self":[{"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/posts\/3763","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/users\/101"}],"replies":[{"embeddable":true,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/comments?post=3763"}],"version-history":[{"count":5,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/posts\/3763\/revisions"}],"predecessor-version":[{"id":26056,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/posts\/3763\/revisions\/26056"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/media\/296"}],"wp:attachment":[{"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/media?parent=3763"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/categories?post=3763"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/tags?post=3763"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}