
{"id":3767,"date":"2020-11-10T09:00:00","date_gmt":"2020-11-10T08:00:00","guid":{"rendered":"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/uncategorized\/sql-cheat-sheet\/"},"modified":"2022-12-07T15:58:58","modified_gmt":"2022-12-07T14:58:58","slug":"sql-cheat-sheet","status":"publish","type":"post","link":"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/sql-cheat-sheet\/","title":{"rendered":"SQL Cheatsheet: Learn Your First 8 Commands"},"content":{"rendered":"<p><strong>Imagine being able to answer any data question within a matter of minutes. Imagine not having to download and import data into spreadsheets before you even begin to analyze your dataset. Imagine typing SQL queries off the top of your head.<\/strong><\/p>\n<p>I\u2019m speaking from experience when I tell you that this is completely possible. Sometimes, though, it\u2019s difficult to know where to start. I\u2019ve written this post to solve that problem for you.<\/p>\n<p>I want this post to be the first step in your journey to becoming a data-savvy professional, equipping you with skills which are guaranteed to elevate your career and your business. In the next 15-30 minutes you\u2019ll write some really useful queries and learn the foundations of data analysis with SQL.<\/p>\n<p>But before we dive into it, let me briefly introduce myself. My name\u2019s Anatoli, and, by night, I\u2019m the founder and author of <a href=\"https:\/\/www.sqlhabit.com\/\" rel=\"noopener\">SQL Habit<\/a>\u2014an online course which teaches SQL via business simulation, complete with real life product and marketing data. By day, you can find me working as the Lead Engineer\/Data Analyst in the Growth Department at <a href=\"http:\/\/blinkist.com\/\" rel=\"noopener\">Blinkist<\/a>, the number one book summary app in the world, with 16 million users.<\/p>\n<p>Ready to start learning SQL? Here\u2019s what we\u2019ll cover:<\/p>\n<ol>\n<li><a href=\"#why-use-sql\">Setting the scene: Why use SQL?<\/a><\/li>\n<li><a href=\"#glossary-of-sql-terms\">Glossary of SQL terms<\/a><\/li>\n<li><a href=\"#start-working-with-real-data\">Getting started: Setting up your SQL editor<\/a><\/li>\n<li><a href=\"#how-to-write-queries-in-sql\">How to write queries in SQL<\/a><\/li>\n<li><a href=\"#foundation-select-and-from-commands\">Foundation: SELECT and FROM commands<\/a><\/li>\n<li><a href=\"#sorting-records-with-order-by-command\">Sorting records with ORDER BY command<\/a><\/li>\n<li><a href=\"#show-specific-number-of-records-with-limit-command\">Show specific number of records with LIMIT command<\/a><\/li>\n<li><a href=\"#your-first-sql-function-count\">Your first SQL function: COUNT()<\/a><\/li>\n<li><a href=\"#avg-function\">AVG() function<\/a><\/li>\n<li><a href=\"#using-aliases-with-as-command\">Using aliases with AS command<\/a><\/li>\n<li><a href=\"#building-your-first-report-with-group-by-command\">Building your first report with GROUP BY command<\/a><\/li>\n<li><a href=\"#next-steps\">Next steps<\/a><\/li>\n<\/ol>\n<h2 id=\"why-use-sql\">1. Why use SQL?<\/h2>\n<p>My advice to anyone who\u2019s starting out with SQL is simple: Use it! Whether at work or on one of your own personal projects, it\u2019s important to try to answer some real questions. It\u2019s just like learning any language\u2014the earlier you start using it, the sooner you\u2019ll be able to use it fluently.<\/p>\n<p>This is how I learned SQL. I work in a tech startup with a mobile app which is used by more than 15 million people worldwide. From our data, I can easily and quickly answer questions like:<\/p>\n<ul>\n<li>How many new users signed up lately?<\/li>\n<li>How effective are our marketing campaigns?<\/li>\n<li>What are the most useful features of our app?<\/li>\n<li>Where do we lose users?<\/li>\n<li>And I can slice the above data by month, per country, by marketing campaign, pretty much anything!<\/li>\n<\/ul>\n<p>So, in summary, you can answer any question (!!) by simply typing an SQL query. Sounds like a super power, right?<\/p>\n<p>Before we get to our first query, let\u2019s learn the basic vocabulary of SQL.<\/p>\n<h2 id=\"glossary-of-sql-terms\">2. Glossary of SQL terms<\/h2>\n<h3 id=\"database\">Database<\/h3>\n<p>An SQL database is a physical server where you data is stored. It could be hosted on your computer or on a remote server.<\/p>\n<h3 id=\"tables-and-columns\">Tables and columns<\/h3>\n<p>An SQL database consists of tables. Tables have names like <em>users<\/em> or <em>purchases<\/em>. Similar to spreadsheets, they have multiple rows and columns. Each column has a predefined type (you define it when creating a table) like number, date, or text.<\/p>\n<h3 id=\"query\">Query<\/h3>\n<p>Any data analysis is a dialogue between you and the database. You ask questions in the form of SQL queries that are sent to the SQL database server. For example, here\u2019s a query that counts all users:<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT COUNT(*)<br \/>\nFROM users<br \/>\n{% endhighlight %}<\/p>\n<h3 id=\"sql-client\">SQL client<\/h3>\n<p>To connect to an SQL database you need an SQL client. It usually looks like a text editor where you simply type queries.<\/p>\n<h3 id=\"result-set\">Result set<\/h3>\n<p>Result set is the data returned by your query.<\/p>\n<p>Result set is a table that was generated by the SQL server using your query. Even when a result set is a single number, it\u2019s still a table with one column and one row.<\/p>\n<h2 id=\"start-working-with-real-data\">3. Start working with real data<\/h2>\n<p>In a minute, you\u2019ll write your first query. Let\u2019s make sure you have an SQL editor at hand. Feel free to skim further, but I highly recommend you actually type the following queries to get the maximum value from this tutorial.<\/p>\n<p><strong>Option 1 (the simplest).<\/strong> Sign up for SQL Habit (it\u2019s free) and go to the <a href=\"https:\/\/www.sqlhabit.com\/playground\" rel=\"noopener\">Playground<\/a>\u2014it\u2019s a fully functional SQL client working in your browser, no setup needed.<\/p>\n<p><strong>Option 2.<\/strong> Ask a System Administrator at work to give you a read-only access to your company\u2019s database (or say \u201cdata warehouse\u201d to get extra points). I\u2019m pretty sure you\u2019ll get a recommendation for choosing an SQL client as well (it\u2019s likely that your company already has a license for a great client).<\/p>\n<h2 id=\"how-to-write-queries-in-sql\">4. How to write queries in SQL<\/h2>\n<p>You\u2019re all set up. Let\u2019s start writing SQL queries!<\/p>\n<h3 id=\"dataset\">Dataset<\/h3>\n<p>In this tutorial you\u2019ll be working with a single table that contains user data for a mobile app. If you\u2019re using SQL Habit\u2019s client, the table you need is called <em>users<\/em>.<\/p>\n<p>To do any type of data analysis, it\u2019s really important to understand how the data made it to this table. In our case, think about this scenario: every time a user signs up within a mobile app, a new record is created in the <em>users<\/em> table.<\/p>\n<p>Here\u2019s a data sample of the <em>users<\/em> table:<\/p>\n<p><img decoding=\"async\" title=\"A data sample showing users in a database\" src=\"\/en\/wp-content\/uploads\/old-blog-uploads\/data-sample.png\" alt=\"A data sample showing users in a database\" \/><\/p>\n<p>As you can see, an SQL table is very similar to a spreadsheet: it has rows and columns (note that each column has a unique name) and, just by looking at a table, you can infer the column types: <em>id<\/em> columns is clearly a number and email is a <em>text<\/em> one.<\/p>\n<h2 id=\"foundation-select-and-from-commands\">5. Foundation: SELECT and FROM commands<\/h2>\n<p>The whole challenge of working with SQL is to translate our business question into an SQL query.<\/p>\n<p>The most basic question we can ask our dataset is \u201cWhat data do we have?\u201d That may sound simple, but don\u2019t be misled\u2014there\u2019s a ton of things you can learn from this question.<\/p>\n<p>This question translates into the following query:<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT *<br \/>\nFROM users<br \/>\n{% endhighlight %}<\/p>\n<p>You used 4 commands or keywords here: <em>SELECT<\/em>, <em>*<\/em>, <em>FROM<\/em> and <em>users<\/em>.<\/p>\n<p><em>FROM<\/em> tells our SQL server which table you\u2019re going to use. <em>FROM users<\/em> can be interpreted as \u201cWe\u2019ll be working with the <em>users<\/em> table\u201d. <em>SELECT<\/em> command tells an SQL server which columns you want to have in your result set. The * is a wildcard for \u201call columns\u201d. Alternatively, we can select specific columns:<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT email, country<br \/>\nFROM users<br \/>\n{% endhighlight %}<\/p>\n<p>Let\u2019s do a quick recap. Our initial request of <em>Show me all the data<\/em> (or, more accurately; <em>Show me all the data in the table<\/em>) in SQL is essentially; <em>Select all columns from a specific table<\/em>:<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT *<br \/>\nFROM users<br \/>\n{% endhighlight %}<\/p>\n<p>These are the basic components of every single query you\u2019ll write in the future: SELECT and FROM.<\/p>\n<p>It might seem like this query is so simple and basic that we can\u2019t really get any value out of it, but that\u2019s not true at all. Of course, more complex questions require more SQL knowledge, but you\u2019ll soon see that there\u2019s a finite number of patterns required to perform any type of data analysis.<\/p>\n<p>SELECT * FROM is a pattern I always use in my analysis. I start any analysis by running this query just to get a glimpse of the data I\u2019ll be working with. You can see what columns are in the table, what data types there are, and whether some columns always have data or not. You can see patterns in data\u2014is there a limited amount of values or are all values unique? You can get all these answers by running this simple query and observing the output. Now you\u2019ve added your first query to your data toolbox, let\u2019s move onto the next one.<\/p>\n<h2 id=\"sorting-records-with-order-by-command\">6. Sorting records with ORDER BY command<\/h2>\n<p>You may have noticed that the result set of our very first query is ordered in a strange way. Let\u2019s level up our previous request to the following: <em>Show me the most recent users data<\/em>.<\/p>\n<p>When analyzing your data set, you\u2019ll often want to order it by the time of creation or modification. Why? Well, the way data is stored and processed could change over time (you add new data like more signup platforms, new columns, etc) and it\u2019s important to see the latest state.<\/p>\n<p>To order data in the result set, we need to use the ORDER BY statement:<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT *<br \/>\nFROM users<br \/>\nORDER BY signup_date DESC<br \/>\nSELECT *<br \/>\nFROM users<br \/>\nORDER BY signup_date ASC<br \/>\n{% endhighlight %}<\/p>\n<p>You need to specify a column that\u2019ll be used for sorting, and the direction: DESC (<em>descending,<\/em> Z to A) or ASC (<em>ascending<\/em>, A to Z).<\/p>\n<h2 id=\"show-specific-number-of-records-with-limit-command\">7. Show specific number of records with LIMIT command<\/h2>\n<p>When you run <em>SELECT * FROM<\/em> querythe SQL server will send you all the data in a table. This can sometimes be problematic. Imagine your app has 1+ million users\u2014it might take a while and could cause unnecessary load to the database server to output all data for every single user.To avoid this, you simply need to limit the size of a result set. In plain English, our request to the database will be; <em>Show me the data of 100 most recent users<\/em>.<\/p>\n<p>In the end, we\u2019re running our <em>SELECT * FROM<\/em> queries to inspect the data visually\u2014we don\u2019t need millions of rows, right?<\/p>\n<p>The LIMIT keyword in a query always comes last. You just need to specify how many rows will be in the result set:<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT *<br \/>\nFROM users<br \/>\nORDER BY signup_date DESC<br \/>\nLIMIT 100<br \/>\n{% endhighlight %}<\/p>\n<h2 id=\"your-first-sql-function-count\">8. Your first SQL function: COUNT<\/h2>\n<p>Look at you! You already know the fundamentals of how to inspect any table in your company\u2019s database! As I mentioned, I always start any data analysis by simply inspecting the latest data in tables I\u2019ll be working with, using exactly those commands we\u2019ve just learned.<\/p>\n<p>Now it\u2019s time to move onto some slightly harder, slightly more practical questions. For example, let\u2019s ask; \u201cHow many users have signed up for our product so far?\u201d<\/p>\n<p>Such a question, when translated into SQL-speak, will read; <em>How many rows are there in the users table?<\/em><\/p>\n<p>So far we\u2019ve been printing table data <em>as it is<\/em>. Now we want something very different: we want to do some calculations based on the table\u2019s data. Specifically, we want to count its records.<\/p>\n<p>Such calculations exist in SQL as <strong>functions<\/strong>. You may think about them as functions from mathematics\u2014they take arguments and return values. But don\u2019t be put off by this terminology. The concepts behind it are straightforward.<\/p>\n<p>The function we\u2019ll need for our problem is COUNT(*). It\u2019s an example of an <strong>aggregate function in SQL<\/strong> because it \u201caggregates\u201d data. Instead of printing all records, it counts (aggregates) them and tells you how many rows match your query\u2019s criteria. In our users table, we have one row per user. Therefore, if we count how many rows we have in the table, we\u2019ll know exactly how many users our product has. Simple. Magic!<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT COUNT(*)<br \/>\nFROM users<br \/>\n{% endhighlight %}<\/p>\n<h2 id=\"avg-function\">9. AVG function<\/h2>\n<p>Let\u2019s take a look at another aggregate function that calculates an average value in a column: AVG().<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT AVG(age)<br \/>\nFROM users<br \/>\n{% endhighlight %}<\/p>\n<p>This query calculates an average of all our users\u2014in this case, the average age.<\/p>\n<p>You\u2019ll rarely need just one number from your data research. Most of the time you\u2019ll need to report multiple numbers, such as the average age <em>and<\/em> the user count. Since both these numbers rely on the same table, we can combine them in one query:<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT<br \/>\nCOUNT(*) AS users_count,<br \/>\nAVG(age) AS avg_age<br \/>\nFROM users<br \/>\n{% endhighlight %}<\/p>\n<h2 id=\"using-aliases-with-as-command\">10. Using aliases with AS command<\/h2>\n<p>You may have noticed a new SQL command in the previous query\u2014AS. AS allows you to name the columns of the result set. The thing is when you use functions in a query, SQL will name new columns in the result set as these functions (like <em>count<\/em> or <em>avg<\/em>).<\/p>\n<p>To keep things organized, I highly recommend that you always use aliases (AS) for new columns. Column names serve as living documentation for your query and will be useful for you (or your colleagues) in the future.<\/p>\n<h2 id=\"building-your-first-report-with-group-by-command\">11. Building your first report with GROUP BY command<\/h2>\n<p>I want to leave you with one more SQL command that will bring you one step closer to performing real data analyses. I\u2019m talking about GROUP BY, an SQL command that allows you to calculate aggregate values (using aggregate functions like COUNT or AVG) on segments of data.<\/p>\n<p>For example; \u201cWhat is the average age of our users in the USA?\u201d or \u201cWhich is the country with the highest number of users?\u201d. These questions sound like the real deal, right?<\/p>\n<p>Well, that\u2019s because they are. <em>GROUP BY is the main instrument of Data Analysis.<\/em> To get the signal from the noise, you\u2019ll constantly need to use different dimensions (country, age group, week\/month\/year of signup, AB-test variation).<\/p>\n<p>Here\u2019s an example of data analysis based on GROUP BY: One morning you find out that the overall signup rate for your product has dropped by 20%. It\u2019s really hard to take any action based on this fact, because there are no specifics. You\u2019ll proceed with calculating signup rates per country, per marketing campaign, per page, etc. to locate the problem. Perhaps, for example, the signup rate dropped to zero on one specific page due to a technical error, but it looked like a 20% drop overall.<\/p>\n<p>Moving on to the SQL syntax, GROUP BY requires you to select a column you\u2019ll use for aggregation, such as <em>country<\/em>:<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT<br \/>\ncountry,<br \/>\nCOUNT(*)<br \/>\nFROM users<br \/>\nGROUP BY country<br \/>\n{% endhighlight %}<\/p>\n<p>Note that the <em>country<\/em> column is specified after GROUP BY and in the SELECT statement. That way, a result set will have all unique values of the <em>country<\/em> column and their aggregates (number of users in each country or, in SQL, the number of rows in the users table with the same value of <em>country<\/em>).<\/p>\n<p>Let\u2019s use everything we\u2019ve learned so far to produce a final report:<\/p>\n<p>{% highlight sql linenos %}<br \/>\nSELECT<br \/>\ncountry,<br \/>\nCOUNT(*) AS users_count<br \/>\nFROM users<br \/>\nGROUP BY country<br \/>\nORDER BY users_count DESC<br \/>\nLIMIT 20<br \/>\n{% endhighlight %}<\/p>\n<p>The result set of this query contains the TOP 20 countries by the number of signups. And yes, that\u2019s your very first analytics report! Congratulations!<\/p>\n<h2 id=\"next-steps\">12. Next steps<\/h2>\n<p>Of course, data analysis with SQL is much, much bigger (and even more fascinating!) than described in this article. I hope it gave you a jump start into the world of learning SQL, and that you see how much there is to gain for your career or business.<\/p>\n<p>Here are some of the major next steps in learning SQL:<\/p>\n<ul>\n<li>Filtering (show example for each query)<\/li>\n<li>SQL joins<\/li>\n<li>Window functions<\/li>\n<li>Scalar functions<\/li>\n<li>Learning marketing\/product metrics<\/li>\n<li>Report types (histograms, timelines, etc)<\/li>\n<\/ul>\n<p>And of course, I highly recommend that you sign up for CareerFoundry\u2019s <a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/short-courses\/become-a-data-analyst\/\">free introductory data analytics short course<\/a>. Or, if you want to focus on SQL, check out these <a href=\"https:\/\/careerfoundry.inbearbeitung.de\/en\/blog\/data-analytics\/sql-certifications\/\">SQL certifications<\/a>, or <a href=\"http:\/\/sqlhabit.com\/\" rel=\"noopener\">join me at SQL Habit<\/a>.<\/p>\n<p>Want to learn more handy data tools and techniques? Check out the following:<\/p>\n<ul>\n<li><a href=\"\/en\/blog\/data-analytics\/10-excel-formulas-every-data-analyst-should-know\/\">10 Excel formulas all data analysts should know<\/a><\/li>\n<li><a href=\"\/en\/blog\/data-analytics\/what-is-poisson-distribution\/\">What is Poisson distribution? An expert explains<\/a><\/li>\n<li><a href=\"\/en\/blog\/data-analytics\/what-is-random-forest\/\">What is random forest?<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>SQL is an essential skill for anyone working (or aspiring to work in) the data industry. Learn how to write SQL queries in this hands-on tutorial.<\/p>\n","protected":false},"author":50,"featured_media":330,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_lmt_disableupdate":"yes","_lmt_disable":"","footnotes":""},"categories":[3],"tags":[],"class_list":["post-3767","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-analytics"],"acf":{"homepage_category_featured":false},"modified_by":"Matthew Deery","_links":{"self":[{"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/posts\/3767","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/users\/50"}],"replies":[{"embeddable":true,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/comments?post=3767"}],"version-history":[{"count":1,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/posts\/3767\/revisions"}],"predecessor-version":[{"id":31953,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/posts\/3767\/revisions\/31953"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/media\/330"}],"wp:attachment":[{"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/media?parent=3767"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/categories?post=3767"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/careerfoundry.inbearbeitung.de\/en\/wp-json\/wp\/v2\/tags?post=3767"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}