How to analyse sales data using Google BigQuery and Data Studio
In this post I’ll show you how to analyse sales data using Google BigQuery and Google Data Studio.
According to Google Cloud; “BigQuery is a serverless, highly-scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in.”
For the purposes of today’s blog post, I’ll be showing how it can help you handle large amounts of data sitting in Google Sheets that Google Data Studio would not be able to handle while connected directly to the sheet.
Below is the example data-set for this blog post. It shows the sales from a hypothetical business in South Africa selling various food products.
Below is example of what the Data Studio report will look like, and how fast it can be filtered despite the large number of rows (over 20,000) in the data source.
In this Google Sheets data source the columns are;
– Purchase ID
– Purchase Date
– Item purchased
– Quantity of items purchased
– City where order occurred
– Country where order occurred
It has over twenty thousand rows, and would be painfully slow in Data Studio if we connected it straight from Google Sheets.
In order to have a fast Data Studio report that we can filter and manipulate quickly, we need to get this data into BigQuery.
Download Example data
Below is the data as an Excel spreadsheet, if you would like to replicate this report.
Once you’ve downloaded this file, you can add it to a Google Sheet and replicate the steps I’ve taken in this blog post.
1. Create a Google BigQuery Account
The first step is to create a BigQuery account. Take a look at the links below to help you get started.
Once you’ve created an account we can star
2. Create a BigQuery project
The first thing to do is to create a Project within BigQuery.
For this example the hierarchy is Project > Dataset > Table.
Once you’ve selected which project you want to use we can move on to create the dataset.
3. Create a BigQuery dataset
After you’ve created a BigQuery account, the next step is to create a Dataset.
In your new BigQuery project, you will see that it says “No data sets available”.
You then need to click “CREATE DATASET”.
You need to give your dataset an ID. Here I have called my “Sales_data_example”. You also have the option to choose where your data will be located. I’ve kept this as ‘Default’.
Once you’ve named your dataset and made any adjustments to suit your own situation, click to create the dataset.
We can now see under our test project we have our “Sales_data_example” dataset. The next step is to create a table within this dataset.
You’ll see the option to create a table on the top right in the image below. Click ‘CREATE TABLE’ to get started.
4. Create a Table within your BigQuery data-set
The first step is to click ‘CREATE TABLE’.
Below is what you see after you’ve clicked to create a table. We now need to fill in the details.
The first step is to choose where we will be getting our data from. As mentioned earlier, we have a large Google Sheet that has over twenty thousand rows of data. We want to connect to this sheet.
So therefore we choose to create a table from ‘Drive’, as shown below.
In order to do this we need the link to the specific Google Sheet. To get this, open up Google Drive, find the spreadsheet and click ‘Get shareable link’. This gives you a link to the specific spreadsheet you need to use.
Once we’ve clicked ‘Get shareable link’ we will see the display below that says ‘Link sharing on’.
We can now create our table, as shown below.
Create table from: Drive
Select Drive URI: our previously copied link
File format: Google Sheets
You also need to name your table. I’ve named it ‘Sales_data_sheets’.
We also need to name our Schema, AKA our columns. Instead of naming them here, you can auto-detect schema and input parameters and name your columns in Data Studio. But this method is more straightforward for the purposes of this example.
Based on the columns in my Google Sheet, I’ve labelled the Schema as;
- ID (Integer)
- Date (Date)
- Item (String)
- Quantity (Integer)
- City (String)
- Country (String)
Once we’ve labelled our Schema we can click to create the table within our dataset.
So below we see our newly create table within our project and our dataset.
Project = test
Dataset = Sales_data_example
Table = Sales_data_sheets
We have table setup and we can see the Schema matches what we added during the table creation.
The next step is to query the table.
5. Query your BigQuery Table
Click Query Table to start the process.
We then get a SQL query that appears. You will likely need to add in an asterisk * so your query reads;
SELECT * FROM ‘project_name.dataset_name.table_name’ LIMIT (whatever includes all of your rows).
SELECT * simply means to select all / everything from the table.
The next step is to click “Run” below the query. This will query the database. It will likely take around a minute to complete the query.
Below we see the end result of a completed query under ‘Query results’. We can see that our Schema labels accurately reflect the content of the columns. We see that we have 21825 rows in our query results.
The next step is to click “SAVE RESULTS” from our Query results.
We have various options when saving the file. What we want is to save the results as a “BigQuery table“.
I’ve chosen to save it as a BigQuery table and I’ve named it “BigQuery_Sales_data”.
So within our Test Project we have our Dataset Sales_data_example, and within that two tables; Sales_data_sheets (which is connected to our Google Sheet) and BigQuery_Sales_data (which is based on our Query results).
6. Add the saved BigQuery results to your Data Studio report
We want to connect our Google Data Studio report to our Query results table, and not the table based on our Google Sheet.
So in this example we want to connect Data Studio to BigQuery_Sales_data and NOT Sales_data_sheets. We need to connect to the Query results to get the speed we need to analyze this amount of data.
Create a new Data Studio report and choose BigQuery as your connector.
Find the correct table. In this case it’s “BigQuery_Sales_data”. We can see the hierarchy of Project > Dataset > Table on display when adding data to the report.
We can then see the fields within our Table. Data Studio has correctly guessed what Type of data they will be. It has also added a Record Count metric.
We can now add our BigQuery data to the report and start playing around with the data! Click Add to Report to get started.
7. Create your Data Studio report and analyse data
We now have our data in our report. Now we can filter and manipulate the data and the speed is much faster than what would have been possible by using Google Sheets. Even though we have over twenty thousand rows, this is a small dataset compared what is possible with Big Query!
Other Blog Posts
I hope you found this post useful. If you enjoyed it you may enjoy my previous blog posts!
You can also follow me on Twitter!