Create, read, update, and delete—or CRUD—stand for the four fundamental operations in computer programming. They help structure storage processes and management in basic computer applications.
As CRUD is a foundational concept in programming, it’s essential to gain a thorough understanding of what they are, and how to perform each operation appropriately in data analytics, where they serve as the main way in which you interact with databases—if that’s the field you’re looking to get into.
While this sounds important, it might not be clear how CRUD operations are used in everyday analytics workflows. What do CRUD operations look like? How can we apply the four operations to the analytics process?
In this beginners guide, we’ll take you through the basics of CRUD, including:
- What are CRUD operations?
- CRUD operations: Create
- CRUD operations: Read
- CRUD operations: Update
- CRUD operations: Delete
- Applications of CRUD operations
- Next steps
Ready to learn more about CRUD operations? Let’s dive in!
1. What are CRUD operations?
Before we can understand the importance of CRUD operations, we need to first take a step back at why it’s necessary in the first place and how this relates to data storage.
Most companies need to store their data in a way that can be accessible at a later time and when access to power has been turned off. This is known as persistent storage, or simply storing generated data or documents to a saved file. Without persistent storage, users cannot retrieve their data for later analysis. Hence, when data is created, it needs to be stored somewhere. This typically looks like a hard drive.
Once we have a place to store data, we will need to keep it organized for easier retrieval and capacity management. Enter the data analyst’s favorite helpers: the relational database and its tools for reading and transforming the data. At its simplest, a database consists of tables with rows and columns. Depending on the tools you feel more comfortable with, you can either use a GUI or programming language to execute CRUD operations on the database.
After the data has been stored, you might want to change it to update records with new data, or delete records that have been deemed no longer necessary. As analysts, our day-to-day work consists of cleaning data, transforming variables to create new columns, performing aggregations across tables, and inputting missing records. If we have new data being generated every hour, we might want to create a data pipeline to make sure our tables continuously update to reflect these new additions. None of these actions would be possible without CRUD operations.
This brings us back to what CRUD is all about. It helps us make use of the power of persistent storage and relational databases. At a high level, we can see how CRUD operations inform and carry out database management and design. They also make it easy for database engineers and analysts to work with databases, ensure appropriate security controls, and can have higher performance efficiency than one of our favorite database languages, SQL. If you haven’t yet encountered SQL, do check out our complete beginner’s guide that shows you how to write SQL queries, and you can reference our handy cheat sheet to the eight most important SQL commands you’ll need.
Next, we’ll look at the individual elements of CRUD in detail: create, read, update, delete.
2. CRUD operations: Create
The first step of CRUD operations is Create, which does exactly what it implies: it creates an entry. Adding new rows to a table can be done with the Create command.
As with any programming method, you have multiple options of doing so. In SQL, you can do the same thing with the Insert command.
The Insert Into command lets you add values for specific columns:
INSERT INTO housing_table (price, city, num_bedrooms, type)
VALUES ($500000, “Toronto”, 4, “detached”)
You can add values directly without referencing the specific column. This is useful when you are adding values for all available columns:
INSERT INTO housing_table
VALUES ($400000, “New York”, 1, “studio”)
You can also add data from another table to your table through a more complex SQL statement. This is a recommended way to import a large dataset from one table to another.
INSERT INTO housing_table [(price, city, num_bedrooms, type)]
SELECT price, city, num_bedrooms, type
The important thing to keep in mind here is that users cannot create new columns, only new rows. To add new columns, you might need to request special permission from the database administrator.
3. CRUD operations: Read
Retrieving the data you need can be done with the read function, which refers to the Select command.
This is likely the first way that most of us encounter SQL queries. Select is the way by which we retrieve the records we need in a table’s rows and columns.
A simple Select statement to retrieve the full table looks like this (the asterisk is a short form way of referring to all the rows and columns in a table):
We’ll more likely want to look at a filtered version of the table by a given criterion or set of criteria. In this next statement, we only want to look at a table of two columns, ordered by city name, and limiting the number of retrieved rows to 10.
SELECT price, city
ORDER BY city
There are many further clauses available for use under the Select statement, which allow you to add window expressions, filtering rows based on a value or condition, and performing group by aggregations.
4. CRUD operations: Update
We can use the Update command to edit existing data quickly. In this next example, we’ll update a record at id number 249 with the following new data points:
SET city = “Shanghai”, type = “condo”, price = “350000”
WHERE id = 249
Hence, you need to specify the columns to be updated together with the new values. It’s suggested that you limit the number of rows retrieved as otherwise this might create concurrency issues, which refers to creating conflicting versions of your dataset in the process of trying to update and access data.
5. CRUD operations: Delete
Finally, we turn to the delete function and command. This allows you to remove records given specified conditions. A simple statement removing records that belong to a condition, entries belong to the city of Tokyo, looks like this:
DELETE FROM housing_table
WHERE city = “Tokyo”
You can also remove entire columns:
ALTER TABLE housing_table
DROP COLUMN type
If you no longer need the table itself, you can delete it entirely, too:
DELETE FROM housing_table
Unsurprisingly, the delete function can lead to catastrophic outcomes if tables or important columns were accidentally deleted due to a coding or communication error. This is where extra caution must be taken to ensure that the action being taken has been confirmed with multiple parties to be correct before proceeding.
It’s also helpful to verify if users have access to a hard or soft delete: hard deletes are a permanent removal of records, while a soft delete only involves updating the row without changing the underlying data.
6. Applications of CRUD operations
It’s easy to imagine how CRUD operations work in practice when virtually most companies in today’s business environment deal with data in some shape or form. We’ll take a look at one example of how you might interact or implement CRUD operations in a business environment.
With a better understanding of practical applications, you could even start to assess new ways of better integrating them into existing data analytics workflows or propose projects that can yield new insights.
Airlines not only deal with massive amounts of data, which can include flight prices, schedules, ticket sales, staffing levels, and travel locations, they also need to ensure that the available data is up to date given how flight prices and schedules can fluctuate each second due to market demand and scheduling changes.
We can imagine that the internal IT team manages hundreds or even thousands of relational databases connecting these data to each other, for access by many internal teams including sales, marketing, customer services, and human resources:
- A flight prices table that contains departure and arrival location, dynamic pricing determined by an algorithm, number of available seats, number of seats sold, historical sales figures for the flight route, plane ID, and more.
- A schedule table that contains information about the departure and arrival airports, time of departure, time of arrival, estimated travel time, historical cancellations, flight ID, and staffing levels, number of seats sold.
- A staffing table that includes employee names, internal staffing identification number, home country, citizenship, contract status, length of employment, seniority status, and upcoming scheduled shifts.
If the airline decides to expand to offer flights to Southeast Asia, they can create new records to update the flight prices and schedule tables to reflect the new locations. When existing air stewards or stewardesses resign or retire, the human resources department will delete their records from the database.
If the business analytics team wants to do a study of unprofitable routes, they can read the flight prices table to retrieve historical sales figures for flights that have continuously undersold their capacity. Finally, the data team can create scripts that update scheduling changes in the schedules table once new information arrives about delays or early arrivals.
7. Next steps
We hope that this article has demystified CRUD operations. Together with databases, they allow us to perform powerful analytical workflows. The four operations form the backbone of modern analytical workflows.
Without CRUD operations, we wouldn’t be able to organize our data, retrieve the subsets that we need, transform existing records, or remove outdated and unimportant records.
Let’s quickly review the features that make up CRUD operations as they just might come in handy in your next analytics project:
- Create: The Insert Into command lets you add new rows to a table, and is a best practice for transferring large datasets from one table to another.
- Read: The Select function enables you to retrieve data, whether it’s the entire table or a table filtered by certain conditions.
- Update: Update allows you to change existing data with newer values.
- Delete: When columns, records or entire tables are no longer needed, perform the Delete command to do a soft or hard deletion.
You may also be interested in learning about CASE statements, which are often used when running CRUD operations.
Has this piqued your interest in learning more about analytics roles and the field of data analytics in general? Why not try out this free, self-paced data analytics course? You may also be interested in the following articles: