In this post I’ll describe how you can use SQL queries in BigQuery to extract relevant data and transfer it to Google Sheets.
This could have value if you wanted to share the data with others, or wanted to connect this sheets data to Google Data Studio for whatever reason.
I’ll run through some examples of SQL queries that might be useful.
Using BigQuery Public Datasets
So the Dataset that I’ll use as an example is a table from the usa_names public dataset.
BigQuery has some very interesting public datasets which you can take a look at.
There are many different varieties that you can choose to access. The one we will be using today is the same one I used in a previous blog post. It is the USA Names dataset from the U.S. Social Security Administration.
Let’s take a look at this very interesting dataset in BigQuery.
Looking at the Schema of the dataset.
Before we start using queries, let’s take a moment to examine the Schema of the dataset. We see that we have five fields. Three are strings (state, gender, name) and two are integers (year and number).
We also get a description of each field which helps us to know what each one means.
Looking at the entire dataset
If we want to see the full number of entries in the dataset, we can use the following SQL query;
SELECT * FROM `bigquery-public-data.usa_names.usa_1910_current`;
This selects ALL the data in the dataset, and we see that it has returned a whopping 6 million rows.
So I’ll give you some examples of custom SQL queries you can use, and how you can use SQL to choose the data that is relevant to you.
Examples of custom SQL Queries you can run
I’ll run through some SQL Queries in BigQuery that you can adapt for your own datasets.
List of US States
Here are some custom SQL queries you can run to generate unique and smaller datasets.
SELECT DISTINCT state FROM `bigquery-public-data.usa_names.usa_1910_current` LIMIT 1000
Let’s look at adding some more information to this query.
List of US States with number of Names, and in descending order
Let’s say we want a list of states, the number of names in each state, and for the list to be in the order of highest to lowest number of names.
To do this we use this query;
SELECT DISTINCT state AS State, sum (number ) AS number_of_names FROM `bigquery-public-data.usa_names.usa_1910_current` GROUP BY state ORDER BY number_of_names desc;
We use a few new SQL functions for this query.
- sum(number) AS number_of_names – gives us the total number of names.
- GROUP BY state – gives us the total number of names by state.
- ORDER BY number_of_names desc – sorts the list from most names to least names.
List of Names with number of Names, and in descending order
We can use a formula similar to the one before. We just need to replace DISTINCT state with DISTINCT name, and to GROUP BY name, instead of state.
SELECT DISTINCT name AS Name, sum (number ) AS number_of_names FROM `bigquery-public-data.usa_names.usa_1910_current` GROUP BY name ORDER BY number_of_names desc;
We then have our list of names and the number of times they appear.
Top 10 Girls Names in 1957
Let’s say we want to see the ten most popular names for girls in a given year. Let’s say 1957.
So we know we have the Gender field which is described as “Sex (M=male or F=female)”. We also have the Year field which is a “4-digit year of birth”. We can add these parameters to our query by using the SQL WHERE Clause.
SELECT DISTINCT name AS Name, sum (number ) AS number_of_names FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE year=1957 AND gender="F" GROUP BY name ORDER BY number_of_names desc;
We can see that Mary, Susan and Linda were the top three names for girls in 1957.
Sending custom BigQuery Data to Google Sheets
So let’s say we want to do some data analysis on some of our BigQuery data in Google Sheets.
Let’s also say that we are only interested in the top 100 names from Kentucky from 1987 until 1994.
SELECT DISTINCT name AS Name, sum (number ) AS number_of_names FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE year>1987 AND year<1994 AND state="KY" GROUP BY name ORDER BY number_of_names desc LIMIT 100;
And we have our results.
Click on SAVE RESULTS and then choose the option to save as a Google Sheet.
You’ll then see a notification in the bottom left corner of your screen confirming that the data has been saved to a Google Sheet.
You can click on ‘Open’ to view the Google Sheet.
We have our BigQuery data in our Sheet, and can analysis and manipulate it in any way we want. For example, we could create a vaguely psychedelic pie chart as shown below.
I hope you enjoyed reading this post about SQL Queries in BigQuery and learned something new!
My Other Blog Posts
I hope you enjoyed this article. If you did, you might enjoy some of my previous blog posts!