Use Google Sheets to pull data from Wikipedia into Google Data Studio
So in this post I’ll show you how to use Google Sheets to pull data from Wikipedia to use in Google Data Studio.
The dashboard showing the Top 30 Most-Viewed YouTube videos.
Before we start… go follow Ben Collins on Twitter!
I wanted to say thank you to Ben Collins, perhaps the most prominent expert on Google Sheets.
Follow him on Twitter at @benlcollins or visit his website.
The Wikipedia article with the data
So the article I’ll be using is about the most viewed videos on YouTube. Below is a screenshot of the table from the article.
The first step is to get this data into a Google Sheet.
The Formulas to use to pull data from a website table to a Google Sheet
So for this example we’re going to use the following formula. The formula is from this Ben Collins article.
=ImportHtml(“https://website_url.com”, “table”, 1)
Other formulas that may work for you.
Further reading on scarping using Google Sheets
- Using Google Sheets as a basic web scraper
- ImportXML function
- Simple Web Scraping using Google Sheets (2020 updated)
Using the Import HTML function to pull data into a Google Sheet
We are going to use the following formula to pull data into our sheet.
=ImportHtml(“https://en.wikipedia.org/wiki/List_of_most-viewed_YouTube_videos”, “table”, 1)
I put this formula into cell A1 in your spreadsheet. It should automatically pull the data.
One thing you will notice is that there are some irritating footnote links in the song title cells. For example we have “Despacito”[23].
To get rid of these, I created a new column in the spreadsheet using this formula.
=SPLIT(B2,”[“)
This separates the name of the song from the footnote number.
Building the dashboard in Google Data Studio
So the table setup is pretty simple. Below is what it looks like.
Here is how it’s setup in Google Data Studio.
How to make sure the Data in Data Studio is up to date
So how do we make sure that the data that is being displayed in Data Studio is up to date?
If we look at the extracted table there is a cell that shows at what date the data was accurate.
So what we do is create a new column with just one cell in it.
We can call it ‘Latest Date’ and it will just have the latest date that the data was updated in Wikipedia.
We can then have a table with a single value in Google Data Studio as shown below.
I hope you found this article about how to use Google Sheets to pull data from Wikipedia interesting and useful.
Download Google Data Studio Templates
If you’d like to download one of my Google Data Studio reports as a template to use for your own data, you can visit Data Studio Templates and purchase one.
My Other Blog Posts
If you liked this post you might enjoy my other blog posts.
Hi Michael, I’m digging your blog posts. My friend and I do a lot of Google Sheets/Data Studio reporting as well and I like to see other people’s approaches and solutions. I’m gonna use some of the tips you mentioned here. Keep up the great work. PS I like your artistic blog cover photos!
Thanks so much! Appreciate the compliments! let me kow if theres’s any type of content you”d be interested in seeing?