The impact of COVID-19 accelerated digitization and has led many businesses to invest in the power of data analytics.
By analyzing their data, organizations can streamline their efforts across departments by systematically highlighting and solving process flaws. It also helps them understand their target audience and offer a superior customer experience.
Various simulations and algorithms—powered by machine learning—are used to analyze data depending on the sample size, parameters, variables, required conclusion, etc. The Monte Carlo simulation is one such data analysis method that is especially useful for making predictions about complex systems where one or more variables are unknown. Simply put, it helps find viable solutions for complex, ambiguous problems.
In this article, we’ll explain the Monte Carlo simulation in detail. We will also have a brief look at its specific properties, applications, and methodologies. In all, the article will cover:
- What is the Monte Carlo method?
- How does the Monte Carlo method work?
- How to use Monte Carlo methods
- Monte Carlo methods and machine learning (with examples)
- Running Monte Carlo simulations in Python
- Key takeaways
Now, let’s begin!
1. What is the Monte Carlo method?
The Monte Carlo method is a data analysis technique used in cases that involve random variables. It was developed during the Second World War to improve decision-making under highly uncertain conditions. It is named after the famous casino district of Monaco, because the element of ‘luck’ or ‘chance’ is inherent to the modeling approach. Rather than replacing an uncertain variable with a single average (a ‘soft’ analysis method that doesn’t give very accurate results), Monte Carlo simulations replace it with many possible values.
Businesses often deal with uncertain variables that can impact important outcomes. The Monte Carlo simulation can be used to mitigate risks by predicting the likelihood of these outcomes.
Artificial intelligence, stock price modeling, sales forecasting, project management, and pricing are just a few of the many applications of Monte Carlo simulations. They can also be used to conduct sensitivity analysis and calculate the correlation of inputs.
But how do Monte Carlo simulations work? Let’s find out.
2. How does the Monte Carlo method work?
Monte Carlo simulations come up with a set of varying outcomes corresponding to an estimated range of values, as opposed to using fixed input values. These outcomes are derived from one or more probability distributions based on one or more uncertain variables.
To run a Monte Carlo simulation, values are sampled at random from the input probability distributions. Each round of sampling is known as an iteration, and the result computed from that sample is recorded. The next iteration then draws a fresh set of random values from the same distributions and recalculates the result. This can be repeated thousands of times to build up a large number of likely outcomes.
As the number of input variables increases, the number of possible outcomes increases as well. Because a simulation can explore this wide range of scenarios, Monte Carlo methods are often used to make long-term predictions.
Let’s understand how the Monte Carlo method generates outcomes with a simple example. Assume you have a weighted die, but you don’t know which side is heavier. Manually working out the odds of a specific number landing face up is difficult. Finding an accurate answer? Close to impossible.
The Monte Carlo method simply simulates rolling this die 10,000 times (or more) and uses the results to make accurate predictions about what’s most likely to happen when the die is rolled. The higher the number of simulations, the higher the accuracy.
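As a minimal sketch of this idea, the snippet below rolls a weighted die 10,000 times with NumPy and estimates each face’s probability from the observed frequencies. The weights in `true_weights` are an assumption made purely so the simulation has something to sample from; the estimates themselves are recovered only from the rolls.

```python
import numpy as np

# Hypothetical weights: assumed here only to drive the simulation.
# The estimate below never reads them directly.
faces = np.arange(1, 7)
true_weights = np.array([0.10, 0.10, 0.10, 0.10, 0.10, 0.50])  # biased toward 6

rng = np.random.default_rng(seed=42)  # seeded so the run is repeatable
n_rolls = 10_000
rolls = rng.choice(faces, size=n_rolls, p=true_weights)

# Estimate each face's probability as its relative frequency.
counts = np.bincount(rolls, minlength=7)[1:]  # drop the unused 0 slot
estimated = counts / n_rolls

for face, p in zip(faces, estimated):
    print(f"face {face}: estimated probability {p:.3f}")
```

Increasing `n_rolls` should pull the estimates closer to the underlying weights, which is the sense in which more simulations give higher accuracy.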
3. How to use Monte Carlo methods
Like any other data analysis technique, the Monte Carlo simulation can be performed using any appropriate tool, from machine learning frameworks like TensorFlow and PyTorch to, for smaller sample sets, Excel. However, regardless of the software, all Monte Carlo simulations involve three basic steps. These are:
- Setting up the model. You will need to identify the dependent variable (say, the probability of a number X coming up in our example) and the independent variables, also known as input, risk, or predictor variables. These are the predefined variables, such as the number of sides on the die, the number of weighted sides, the number of times the die is rolled, and so on.
- Probability distribution. In this step, you need to specify the probability distribution for the independent variables. The type of distribution (normal, uniform, or binomial), range of values, and weights assigned to them are all decisions you can take based on historical data, general context, and your subjective judgment.
- Running iterative simulations. Run simulations repeatedly, generating random values for the independent variables each time, until you have enough results. Ideally, this should be continued until your dataset starts to resemble the ideal sample set: one with a large number of values, none of which deviate too much from the expected values.
While modifying the parameters for subsequent iterations, keep in mind that it is generally better to have small variances. If the variance or standard deviation starts getting too large, you might want to tweak the parameters further.
Variances are statistical values that measure the spread between the numbers in a dataset. Small variance values (like 35/12, which is the variance for rolling a fair die) suggest that the numbers in your set are closer together and that your simulation is operating within a small, accurate range.
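To make the 35/12 figure concrete, here is a small sketch that simulates rolling a fair die and compares the sample variance against the theoretical value. The roll counts and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
theoretical_variance = 35 / 12  # variance of a fair six-sided die

for n_rolls in (100, 10_000, 1_000_000):
    rolls = rng.integers(1, 7, size=n_rolls)  # upper bound is exclusive
    sample_variance = rolls.var()
    print(f"{n_rolls:>9} rolls: variance {sample_variance:.4f} "
          f"(theory {theoretical_variance:.4f})")
```

With more rolls, the sample variance should settle close to the theoretical value of roughly 2.92.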
4. Monte Carlo methods and machine learning (with examples)
The Monte Carlo method is a simulation method. While machine learning can be used to run data simulations, Monte Carlo simulations differ from typical machine learning programs. An average machine learning algorithm is data-centric and focuses more on exploration (for example, finding patterns in consumer purchases). On the other hand, Monte Carlo simulations are process-centric and focus entirely on making predictions (for example, finding out how a particular marketing decision might affect a demographic).
As a rule of thumb, it would be a good idea to use simulations instead of machine learning algorithms if:
- Your data is contained in a proper data warehouse.
- You have enough data to build extensive machine learning simulation models.
- Prediction is more important than general exploration for your task.
A few common examples of a Monte Carlo simulation are:
- Sampling from a uniform distribution of the set {1,2,3,4,5,6} to simulate rolling an unweighted die.
- Using historical data to predict the move of an opponent in a game of chess.
- Using historical data or scientific input to predict the probability of rainfall within the next month.
Finally, consider defining a Bernoulli distribution for flipping a coin. Now, if you sample your calculations from this distribution, you are essentially performing a Monte Carlo simulation!
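A minimal sketch of the coin-flip case, assuming a fair coin (`p_heads = 0.5`) and using NumPy’s binomial sampler with a single trial per draw, which is a Bernoulli draw:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
p_heads = 0.5      # assumed: a fair coin
n_flips = 100_000

# A binomial draw with n=1 is a Bernoulli draw: 1 for heads, 0 for tails.
flips = rng.binomial(n=1, p=p_heads, size=n_flips)
estimated_p_heads = flips.mean()

print(f"estimated P(heads) = {estimated_p_heads:.4f}")
```

Changing `p_heads` turns the same sketch into a simulation of a biased coin.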
5. Running Monte Carlo simulations in Python
Using pandas to construct a Python model that simulates a spreadsheet is one of the easiest and most efficient ways of running Monte Carlo simulations in Python. Here’s how you go about it.
Task: Predicting the sales commission budget for next year.
A table has been created that shows the sales targets given to sales reps this year, their actual sales for the year, and the commission rate they earn. This information will be used to predict the sales commission these reps are expected to make next year.
Step 1: Import your libraries and set up plotting styles.
This step involves writing simple commands like ‘import pandas as pd’ that direct your Python program to import the pre-programmed libraries needed for the simulation.
Step 2: Define independent variables
This step involves declaring the predefined variables and assigning them appropriate values.
Step 3: Use NumPy to generate a list of percentages similar to our original normal distribution
This step involves running a line of code that directs Python to generate a random list of percentage values that is similar to our initial ‘% commission rate’ values.
Step 4: Create sample set
This is the sample set created by executing the previous step.
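Steps 1 to 4 might look something like the sketch below. The specific values (a mean of 100% of target, a 10% standard deviation, and 500 reps) are assumptions chosen for illustration, since the text does not state them.

```python
import numpy as np
import pandas as pd

# Assumed inputs (not given in the text): reps hit 100% of target on
# average, with a 10% standard deviation, across 500 sales reps.
avg = 1.0
std_dev = 0.1
num_reps = 500

rng = np.random.default_rng(seed=7)  # seeded so the run is repeatable

# One normally distributed "percent to target" value per rep,
# rounded to two decimals like a spreadsheet percentage.
pct_to_target = rng.normal(loc=avg, scale=std_dev, size=num_reps).round(2)

print(pd.Series(pct_to_target).describe())
```

The summary printed at the end is a quick way to check that the sample’s mean and standard deviation are close to the values you asked for.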
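Step 5: Build a discrete distribution with manually selected probability rates (using NumPy random choice)
This step involves manually supplying a set of possible values for the independent (predefined) variable, along with the probability of each value, and sampling from them. The shape of the resulting distribution is set entirely by the probabilities you choose, so it does not have to be uniform or normal.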
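As a sketch of this step, the snippet below samples a sales target for each rep from a handful of tiers. Both the tier amounts and their probabilities are hypothetical; in practice they would come from your own historical data.

```python
import numpy as np

# Hypothetical target tiers and the assumed share of reps in each tier.
sales_target_values = [75_000, 100_000, 200_000, 300_000, 400_000, 500_000]
sales_target_prob = [0.30, 0.30, 0.20, 0.10, 0.05, 0.05]  # must sum to 1
num_reps = 500

rng = np.random.default_rng(seed=7)
sales_target = rng.choice(sales_target_values, size=num_reps, p=sales_target_prob)

# Check that the sampled shares are close to the requested probabilities.
values, counts = np.unique(sales_target, return_counts=True)
for value, count in zip(values, counts):
    print(f"{int(value):>7,}: {count / num_reps:.1%} of reps")
```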
Step 6: Build up a pandas DataFrame
A pandas DataFrame is the Excel sheet or table that will store the result values.
The following image shows what the new dataframe will look like.
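A minimal sketch of building such a DataFrame is shown here. The column names `Sales_Target` and `Sales`, and all of the input values, are assumptions made for illustration; only `Pct_To_Target` is named in the text.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)
num_reps = 500  # assumed number of sales reps

# Assumed input distributions (illustrative values only).
pct_to_target = rng.normal(loc=1.0, scale=0.1, size=num_reps).round(2)
sales_target = rng.choice(
    [75_000, 100_000, 200_000, 300_000, 400_000, 500_000],
    size=num_reps,
    p=[0.30, 0.30, 0.20, 0.10, 0.05, 0.05],
)

df = pd.DataFrame({"Pct_To_Target": pct_to_target, "Sales_Target": sales_target})
df["Sales"] = df["Pct_To_Target"] * df["Sales_Target"]  # simulated actual sales

print(df.head())
```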
Step 7: Map Pct_To_Target to the commission rate
In this step, we’re manually writing the formula that will define the commission rates based on the sales made and other predefined values.
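The mapping could be written as a small function and applied to the `Pct_To_Target` column, as in the sketch below. The tier thresholds and rates are hypothetical, as is the function name `calc_commission_rate`; substitute the rules of your own commission plan.

```python
import pandas as pd

def calc_commission_rate(pct_to_target):
    """Return the commission rate for a given percent-to-target.

    The tiers below are hypothetical examples, not values from the text.
    """
    if pct_to_target <= 0.90:
        return 0.02
    if pct_to_target <= 0.99:
        return 0.03
    return 0.04

# A few sample reps, one per tier plus one well above target.
df = pd.DataFrame({"Pct_To_Target": [0.85, 0.95, 1.00, 1.10]})
df["Commission_Rate"] = df["Pct_To_Target"].apply(calc_commission_rate)

print(df)
```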
Step 8: Create commission rate values and use them to estimate the amount
In this step, we are using the commission rate values obtained from the previous step to calculate the commission amount. You can repeat this multiple times to get more accurate predictions.
This yields the following result, which looks very much like an Excel model we might build:
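As a rough sketch, the whole procedure can be repeated in a loop so that each pass produces one possible total commission bill. Every number here (rep count, simulation count, target tiers, commission tiers) is an assumption for illustration, and `np.select` is used as a vectorized stand-in for a row-by-row rate lookup.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)
num_reps, num_simulations = 500, 1_000  # assumed sizes
target_values = [75_000, 100_000, 200_000, 300_000, 400_000, 500_000]
target_probs = [0.30, 0.30, 0.20, 0.10, 0.05, 0.05]

totals = []
for _ in range(num_simulations):
    # Fresh random inputs for every iteration.
    pct = rng.normal(1.0, 0.1, num_reps).round(2)
    target = rng.choice(target_values, size=num_reps, p=target_probs)
    sales = pct * target
    # Hypothetical tiers: 2% up to 90% of target, 3% up to 99%, else 4%.
    rate = np.select([pct <= 0.90, pct <= 0.99], [0.02, 0.03], default=0.04)
    totals.append((sales * rate).sum())

results = pd.Series(totals, name="Commission_Amount")
print(results.describe().round(0))
```

The spread of `results` (its standard deviation, minimum, and maximum) gives a sense of how much the commission budget could vary, which a single spreadsheet calculation would not show.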
6. Key takeaways
So, there you have it! In this article, we’ve covered the basics of Monte Carlo simulations, and how to apply them to the world of data analytics. Here are the main talking points:
- The Monte Carlo method is a data analysis method used to solve complex problems where one or more variables are unknown.
- It is an umbrella term, dating back to the Second World War, that refers to simulations that help make predictions under uncertainty.
- Sampling from a Bernoulli, uniform, normal, or binomial distribution to find the likelihood of an event is a common example of a Monte Carlo simulation.
- Machine learning programs allow users to run elaborate Monte Carlo simulations as coded data-processing algorithms.
- The pandas library in Python can be used to make simple, spreadsheet-like models. This is one of the easiest ways to run a Monte Carlo simulation in Python.