There are few fields within data analytics that are more exciting than spatial analysis.
Even if you haven’t yet worked on a project involving geodata, it’s likely that you’ve already interacted with applications that have deeply incorporated spatial analysis into their core functionality.
In recent years, spatial analytics has become more and more important with the rise of big data and the integration of the Internet of Things into everyday life. The growth of open-source and enterprise tools has also made it easier for beginners to incorporate spatial attributes into data science projects. While this all sounds great, it can be a bit of a challenge figuring out where to start learning more about spatial data.
In this handy introductory article, we’ll break down the complexity of spatial analysis and help you understand what it is, why it matters, and how to perform your own spatial analysis. We’ll also look at real-world use cases to understand its potential.
If you’d like to get some hands-on experience with data analytics, check out this free 5-day data short course.
Here’s what we’ll cover:
- What is spatial analysis?
- How does spatial analysis work?
- The spatial analysis process
- Examples of spatial analysis
- Summary and next steps
1. What is spatial analysis?
Spatial analysis, sometimes known as spatial data science, transforms geographic features into usable data points for quantitative analysis. This can mean using characteristics such as location, the distance between places, boundaries, and networks in statistical analysis or machine learning.
Analysts who know how to use spatial analytical tools can enrich their analysis with geospatial data, which is often available through open projects like OpenStreetMap and through commercial APIs such as Mapbox. Geospatial data includes things like latitude and longitude coordinates, satellite images, and zip codes.
As a subset of data science, spatial analysis features all the classic interlocking components of analytics. You can create complex models and simulations and produce visualizations that incorporate spatial data. This can broaden the value of your insights by revealing how patterns emerge across a given space, whether it is as small as a county or as large as a continent.
Understanding how space is used and interacted with in a given setting is a key part of many analyses. For example, spatial data can help triage resources in search and rescue operations, or help predict where a city should extend public transit next based on projected population growth. If you’ve used ride-sharing applications like Uber or Lyft, you’ve benefited from their engineers’ work on algorithms that connect drivers to customers at a price determined by demand across a city and that plan the most efficient route to your destination.
If you’re not yet convinced, think of the enormous and valuable datasets generated by smartphones that track your health indicators as you move through the places where you live and travel. These datasets present an unprecedented opportunity for analysts to bring new data to existing questions in fields like healthcare, science, business, and urban planning.
2. How does spatial analysis work?
Spatial analysis differs most from other kinds of analysis in the types of data it uses. If you’re new to this, you might be unfamiliar with the wide range of data models and file formats used to capture geospatial attributes. These include vector and raster data, stored in formats such as shapefiles, GeoTIFFs, GeoJSON, and more. If you’ve only worked with CSV files and Python dataframes, these might sound a bit intimidating, but they’re really quite straightforward to understand and use.
We’ll start with how spatial data can be divided into two categories: geometric and geographic data. Geometric data is more straightforward to understand, as it simply maps locations onto a flat, two-dimensional surface. You have encountered it if you’ve ever used applications like Apple Maps or Google Maps. Geographic data, by contrast, is defined by its location on a sphere, like the planet we live on! You use latitude and longitude to pinpoint a specific location or draw boundaries. Much of this data is collected through satellite-based systems such as the Global Positioning System (GPS), and it has many use cases.
How data is stored looks different in spatial analysis too. The two most common data formats you might encounter are vector and raster. Vector data uses points (coordinate pairs), lines (sequences of connected points), and polygons (closed shapes made of connected lines). This lets you turn individual locations (points) into bounded areas (polygons) that can represent political entities (countries) or administrative jurisdictions (school board districts). Vector data is commonly stored in formats such as the shapefile, proprietary formats like the Esri file geodatabase, or open formats like GeoJSON.
Raster data is common in satellite imagery and is widely used in geographic information systems (GIS). It is a grid of pixels (cells), each covering a fixed area on the ground and holding a value for that location. This value can be a measurement such as temperature, elevation, or population density, so analysts can manipulate these variables to see how they vary across time and space.
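To see why the flat-versus-sphere distinction matters, here is a minimal Python sketch that compares a naive planar distance (treating latitude and longitude degrees as flat x/y units) with the haversine great-circle distance, which treats Earth as a sphere. The mean Earth radius of 6,371 km and the two sample coordinates are assumptions for illustration; dedicated GIS tools typically use a more accurate ellipsoidal model.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_planar_km(lat1, lon1, lat2, lon2):
    """Treats degrees as flat x/y units, ignoring that meridians converge toward the poles."""
    km_per_degree = math.radians(1) * EARTH_RADIUS_KM  # ~111.2 km, only exact along a meridian
    return math.dist((lat1, lon1), (lat2, lon2)) * km_per_degree


# Illustrative coordinates (approximate city centres of London and Paris)
london = (51.5074, -0.1278)
paris = (48.8566, 2.3522)

great_circle = haversine_km(*london, *paris)
planar = naive_planar_km(*london, *paris)
print(f"haversine: {great_circle:.0f} km, naive planar: {planar:.0f} km")
```

The flat calculation overestimates here because, at this latitude, a degree of longitude spans noticeably less ground than a degree of latitude.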
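As a rough illustration of the vector model, the sketch below builds a point, a line, and a polygon as GeoJSON geometries using only Python’s standard library. The feature names (“store”, “bus route”, “school district”), the `feature` helper, and all coordinates are made up for illustration. Note that GeoJSON (RFC 7946) lists coordinates as [longitude, latitude] and requires each polygon ring to be closed.

```python
import json

# GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
# All coordinates below are made-up values for illustration.
point = {"type": "Point", "coordinates": [-0.1278, 51.5074]}
line = {"type": "LineString", "coordinates": [[-0.13, 51.50], [-0.12, 51.51], [-0.11, 51.51]]}
# A polygon is a list of linear rings; each ring must be closed (first point == last point).
polygon = {
    "type": "Polygon",
    "coordinates": [[[-0.14, 51.49], [-0.10, 51.49], [-0.10, 51.52], [-0.14, 51.52], [-0.14, 51.49]]],
}


def feature(geometry, **properties):
    """Hypothetical helper: wrap a geometry with attribute data as a GeoJSON Feature."""
    return {"type": "Feature", "geometry": geometry, "properties": properties}


def ring_is_closed(ring):
    """A valid polygon ring starts and ends at the same position."""
    return ring[0] == ring[-1]


collection = {
    "type": "FeatureCollection",
    "features": [
        feature(point, name="store"),
        feature(line, name="bus route"),
        feature(polygon, name="school district", district_id=7),
    ],
}

assert all(ring_is_closed(r) for r in polygon["coordinates"])
geojson_text = json.dumps(collection)  # ready to write to a .geojson file
```

Keeping attributes in `properties` next to each geometry is what lets a polygon stand for something like a school district rather than just a shape.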
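A raster is essentially a grid of values plus a rule for mapping each cell to real-world coordinates. The sketch below assumes a tiny made-up elevation grid and a GDAL-style affine geotransform with 30 m square pixels in a projected coordinate system measured in metres; the origin values are hypothetical. It computes a summary statistic and locates the highest cell on the map.

```python
# A tiny made-up elevation raster (metres): 3 rows x 4 columns.
elevation = [
    [120, 125, 130, 128],
    [118, 122, 135, 140],
    [110, 115, 128, 150],
]

# Assumed GDAL-style affine geotransform for a north-up raster without rotation:
# (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)
# The origin is the top-left corner; pixel_height is negative because rows run south.
geotransform = (500000.0, 30.0, 0.0, 4200000.0, 0.0, -30.0)


def pixel_center(row, col, gt=geotransform):
    """Return the map (x, y) coordinates of the centre of pixel (row, col)."""
    x = gt[0] + (col + 0.5) * gt[1] + (row + 0.5) * gt[2]
    y = gt[3] + (col + 0.5) * gt[4] + (row + 0.5) * gt[5]
    return x, y


cells = [v for row in elevation for v in row]
mean_elevation = sum(cells) / len(cells)

# Find the (row, col) index of the highest cell.
highest = max(
    ((r, c) for r in range(len(elevation)) for c in range(len(elevation[0]))),
    key=lambda rc: elevation[rc[0]][rc[1]],
)
print(mean_elevation, pixel_center(*highest))
```

Libraries that read GeoTIFFs handle this cell-to-coordinate bookkeeping for you, but the underlying idea is the same.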
3. The spatial analysis process
As spatial analysis involves industry-specific data formats and spatial tools, much of it is likely unfamiliar to the average data analyst unaccustomed to working with maps and mapping data. We’ll take you through each step in a typical spatial analytical workflow to demystify the process!
Data collection & transformation
As always, we begin with data collection. You likely have a question in mind, whether it comes from your own research or from a client. If you don’t have access to spatial data, there are many open spatial datasets you can download to enrich your existing data. For example, the Natural Earth website offers many public-domain vector and raster map datasets, and NASA’s Earthdata repository hosts data compiled by NASA. You can also check out our article on ten great places to find free and open datasets.
If you already have access to the data you need, you can use SQL to query it and load the results into dataframes for analysis. As a final step at this stage, remember to preprocess your data: clean it, standardize units of analysis, merge multiple datasets, and impute any missing values.
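Here is a minimal sketch of that merge-and-clean step using Python’s built-in `sqlite3` module and an in-memory database. The tables, region codes, and values are all hypothetical, and filling gaps with the median is just one simple imputation choice. With pandas installed, `pandas.read_sql_query` could load the same joined result straight into a DataFrame.

```python
import sqlite3
from statistics import median

# Hypothetical data: store locations plus region-level income figures.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, region TEXT, lat REAL, lon REAL);
    CREATE TABLE region_stats (region TEXT PRIMARY KEY, median_income_usd REAL);
    INSERT INTO stores VALUES (1, 'R1', 40.75, -73.99), (2, 'R2', 41.88, -87.62), (3, 'R3', 37.77, -122.41);
    INSERT INTO region_stats VALUES ('R1', 95000), ('R2', 120000), ('R3', NULL);
""")

# Merge the spatial table with the non-spatial attributes.
rows = conn.execute("""
    SELECT s.store_id, s.lat, s.lon, r.median_income_usd
    FROM stores AS s
    LEFT JOIN region_stats AS r ON s.region = r.region
    ORDER BY s.store_id
""").fetchall()

# Impute missing incomes with the median of the observed values.
observed = [income for *_, income in rows if income is not None]
fill_value = median(observed)
cleaned = [
    {
        "store_id": sid,
        "lat": lat,
        "lon": lon,
        "median_income_usd": income if income is not None else fill_value,
    }
    for sid, lat, lon, income in rows
]
```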
Data exploration & visualization
Next, it’s time to get a sense of what your data looks like. This is especially important with mapping data, where trends (or their absence) are much easier to see when you overlay a map with the attributes you’re interested in. How you proceed depends on the tools you have access to. If you have a license for ArcGIS Pro, an industry-leading GIS application, its user-friendly interface makes it easy for beginners to start creating maps for visualization and exploration without advanced coding skills.
For the rest of us who would like to learn more about geospatial analysis, it’s far more likely that we’ll need to learn how to use open-source Python packages like geo maps and datashader to manipulate and make sense of our spatial data.
These are just a few examples of the many fantastic tools in the Python ecosystem. They differ in whether they support vector or raster formats, how well they handle big and small datasets, and whether you can deploy the visualization as an interactive application that your end users can explore for their own analysis.
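Tools built for large point datasets, such as datashader, rely on aggregating points onto a grid before rendering them. The pure-Python sketch below shows only that underlying idea, not datashader’s actual API: it counts made-up pickup locations (assumed to be in projected metres) per 100 m grid cell, so dense areas stand out before any map is drawn.

```python
from collections import Counter


def bin_points(points, x_min, y_min, cell_size):
    """Count points per square grid cell; keys are (col, row) cell indices."""
    counts = Counter()
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        counts[(col, row)] += 1
    return counts


# Made-up projected coordinates (metres), e.g. ride-share pickups.
pickups = [(12, 7), (15, 3), (18, 9), (105, 40), (110, 42), (260, 250)]
grid = bin_points(pickups, x_min=0, y_min=0, cell_size=100)
busiest_cell, busiest_count = grid.most_common(1)[0]
```

Aggregating first keeps the cost of drawing proportional to the number of grid cells rather than the number of raw points.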
Data modeling & prediction
We now turn to every analyst’s favorite step in the typical data science workflow: building a model for prediction.
In spatial analysis, the focus is on modeling relationships between geographic locations to predict your target variable of choice from the features in your model. These features can be both geographic and non-geographic. In fact, you’re encouraged to use both, as combining them is a powerful way to carry out multi-dimensional analysis for better results.
There are a few things to keep in mind when modeling spatial data. It helps to have a theoretical grounding in spatial autocorrelation (the tendency of nearby locations to have similar values), in accounting for how features vary across a geographic region (spatial heterogeneity), in performing regression analysis with spatial features, and in creating new variables by interacting geographic and non-geographic variables.
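One of the simplest ways to predict a value at a location from its neighbours is inverse distance weighting (IDW), shown below as a deliberately simple stand-in technique, not necessarily the modeling approach this article has in mind. It uses only geographic relationships; a fuller model would add non-geographic features as well. The sample points, their values, and the default `power=2.0` are assumptions for illustration, with coordinates taken to be in a projected system in kilometres.

```python
import math


def idw_predict(known, target, power=2.0):
    """Inverse distance weighting: predict a value at `target` from (x, y, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known:
        d = math.dist((x, y), target)
        if d == 0:
            return value  # target coincides with a sample point
        w = 1.0 / d ** power  # closer samples get larger weights
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Made-up samples: (x_km, y_km, observed sales) on a projected grid.
samples = [(0.0, 0.0, 100.0), (4.0, 0.0, 200.0), (0.0, 3.0, 160.0)]
estimate = idw_predict(samples, (1.0, 1.0))
```

Raising `power` makes predictions lean more heavily on the very nearest samples.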
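Spatial autocorrelation can be measured with statistics such as global Moran’s I. The sketch below is a minimal, unoptimized implementation of the standard formula, assuming four hypothetical regions in a row with simple binary “shares an edge” weights; in practice the choice of weights matters, and libraries such as PySAL provide tested implementations.

```python
def morans_i(values, weights):
    """Global Moran's I for n values and an n x n spatial weights matrix (lists of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Weighted sum of products of neighbouring deviations from the mean.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_term = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_term)


# Four hypothetical regions in a row; neighbours share an edge (binary contiguity weights).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], adjacency)       # similar values sit next to each other
alternating = morans_i([1, 4, 1, 4], adjacency)  # neighbours differ sharply
```

Values well above the no-autocorrelation expectation of -1/(n-1) suggest clustering of similar values, while strongly negative values suggest neighbours tend to differ.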
Learn more: What is data modeling?
4. Examples of spatial analysis
Now that we have a solid understanding of the nuts and bolts of where to access spatial data and how to analyze different data formats, let’s turn to some real-world examples to see how powerful spatial tools can be when put to use in the industries we work in.
Competitive retail analysis
Imagine a new fast food chain looking to expand into more locations nationwide. Expansion involves considerable resources and financial risk.
How does the chain decide where the first franchise should be located? What methods should it use to choose between different cities? This is a hard problem, especially because location quality matters so much for a fast food chain that earns most of its revenue from being near dense population centers.
Here, spatial analysis can offer insight and a structured way to think through the problem. We can construct a dataset with information about each city’s retail competitors, gross domestic product, age distribution, median household income, and more. If the chain primarily sells hot food, we might expect it to do better in colder regions, so we can add data on weather variation to capture this.
Once the model has been built and visualizations created, we can gain a holistic picture of each city’s competitive profile, instead of focusing narrowly on non-spatial attributes and missing key factors that would drive a successful first launch.
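One common way to turn “retail competitors” into a spatial feature is to count competitors within a fixed radius of each candidate site. The sketch below assumes a spherical Earth, a 2 km radius, and entirely hypothetical site and competitor coordinates; at scale, a GIS buffer operation or a spatial index would replace this brute-force loop.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def competitors_within(site, competitors, radius_km):
    """Count competitor locations within `radius_km` of a candidate site (lat, lon)."""
    return sum(1 for c in competitors if haversine_km(*site, *c) <= radius_km)


# Hypothetical candidate sites and competitor locations (lat, lon).
candidates = {"site_a": (41.88, -87.63), "site_b": (41.95, -87.70)}
competitors = [(41.881, -87.629), (41.885, -87.625), (41.94, -87.69), (42.10, -87.90)]

features = {
    name: {"competitors_2km": competitors_within(loc, competitors, 2.0)}
    for name, loc in candidates.items()
}
```

These counts can then sit alongside non-spatial columns such as median household income in the city-level dataset.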
Urban planning & design
Urban planners have long been leaders in adopting geospatial frameworks to better understand and serve the cities they make long-term plans for. In fact, many advanced academic programs combine training in leading geospatial tools such as GIS with a traditional urban planning curriculum.
Urban planners are often faced with competing options for how to develop or revitalize an economically depressed region or area of a city, and they need ways to determine which project offers the best potential return on the city’s investment. This is where geospatial analysis shines: planners can start with a dataset of historical information about the geography of the region’s economic sectors over time, and enrich it with statistics such as population density, housing developments, weather, and household income.
They can then build a model that predicts what is likely to work on a given plot of land in the future, based on what has worked in the past. Presenting their visualizations and conclusions with a methodologically sound approach can help persuade key political decision makers to make evidence-based decisions for a better outcome.
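Enriching a regional dataset often starts with a spatial join: working out which district each point (such as a housing development) falls in, then aggregating per district. The sketch below uses the classic ray-casting point-in-polygon test; the district names, square boundaries, and development locations are hypothetical, coordinates are assumed to be projected in kilometres, and districts are assumed not to overlap. Real projects would typically lean on GIS software or libraries such as Shapely or GeoPandas instead.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as a list of (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical districts (projected coordinates, km) and housing development sites.
districts = {
    "riverside": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "old_town": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
developments = [(1, 1), (2, 4), (6, 2), (9.5, 4.5), (12, 1)]

counts = {name: 0 for name in districts}
for x, y in developments:
    for name, poly in districts.items():
        if point_in_polygon(x, y, poly):
            counts[name] += 1
            break  # assumes districts do not overlap
```

Points that fall outside every district, like the last development here, are simply left uncounted.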
5. Summary and next steps
Spatial analysis offers so much potential in taking your analytics project to the next level.
If you’re interested in career pathways in data analytics, consider gaining an understanding of spatial analysis. As more and more companies embrace cutting-edge techniques and tools, having spatial analytics in your data toolbelt will set you apart from the competition. To summarize, we recommend keeping these core steps in mind when you embark on your first spatial analysis project:
- Data collection & transformation: A good analyst remembers the mantra: garbage in, garbage out. You’ll want to make sure your data was collected appropriately for the question you’ve been tasked to answer. Make use of the public domain datasets on offer as they can often enrich existing datasets you have, allowing you to include a spatial component for analysis.
- Data exploration & visualization: With mapping data, you’ll want to know what your collected data looks like. Depending on your skill level or access to enterprise tools, you have many options for plotting your data for exploration. For more advanced programmers, there are many Python and R libraries that can read different data formats, handle large datasets, and may allow you to deploy the application for end-user interaction.
- Data modeling & prediction: Now that you have a sense of some trends in your data, you can model relationships between geographic locations to predict your target variable of choice. This is where spatial analysis shines: using both geographic and non-geographic features enables multi-dimensional analysis for better results.
Has this piqued your interest in learning more about analytics roles and the field of data analytics in general? Why not try out this free, self-paced data analytics course?