Use Google Sheets to pull data from Wikipedia into Google Data Studio

Use Google Sheets to pull data from Wikipedia to Google Data Studio

So in this post I’ll show you how to use Google Sheets to pull data from Wikipedia to use in Google Data Studio.

The dashboard showing the Top 30 Most-Viewed YouTube videos.

Before we start… go follow Ben Collins on Twitter!

I wanted to say thank you to Ben Collins, perhaps the most prominent expert on Google Sheets.

Follow him on Twitter at @benlcollins or visit his website.

The Wikipedia article with the data

So the article I’ll be using is about the most viewed videos on YouTube. Below is a screenshot of the table from the article.

The first step is to get this data into a Google Sheet.

The Formulas to use to pull data from a website table to a Google Sheet

So for this example we’re going to use the following formula. The formula is from this Ben Collins article.

=ImportHtml(“https://website_url.com”, “table”, 1)

Other formulas that may work for you.

Further reading on scarping using Google Sheets

Using the Import HTML function to pull data into a Google Sheet

We are going to use the following formula to pull data into our sheet.

=ImportHtml(“https://en.wikipedia.org/wiki/List_of_most-viewed_YouTube_videos”, “table”, 1)

I put this formula into cell A1 in your spreadsheet. It should automatically pull the data.


One thing you will notice is that there are some irritating footnote links in the song title cells. For example we have “Despacito”[23].

To get rid of these, I created a new column in the spreadsheet using this formula.
=SPLIT(B2,”[“)
This separates the name of the song from the footnote number.

created a new column in the spreadsheet using this formula.
=SPLIT(B2,"[")

Building the dashboard in Google Data Studio

So the table setup is pretty simple. Below is what it looks like.

Building the dashboard in Google Data Studio using Google Sheets to pull data from Wikipedia

Here is how it’s setup in Google Data Studio.

Here is how it's setup in Google Data Studio.

How to make sure the Data in Data Studio is up to date

So how do we make sure that the data that is being displayed in Data Studio is up to date?

If we look at the extracted table there is a cell that shows at what date the data was accurate.

If we look at the extracted table there is a cell that shows at what date the data was accurate.

So what we do is create a new column with just one cell in it.

create a new column with just one cell in it.

We can call it ‘Latest Date’ and it will just have the latest date that the data was updated in Wikipedia.

 data was updated in Wikipedia.

We can then have a table with a single value in Google Data Studio as shown below.

 a table with a single value in Google Data Studio

I hope you found this article about how to use Google Sheets to pull data from Wikipedia interesting and useful.

Download Google Data Studio Templates

If you’d like to download one of my Google Data Studio reports as a template to use for your own data, you can visit Data Studio Templates and purchase one.

My Other Blog Posts

If you liked this post you might enjoy my other blog posts.

Michael

Author Michael

More posts by Michael

Join the discussion 2 Comments

  • MM says:

    Hi Michael, I’m digging your blog posts. My friend and I do a lot of Google Sheets/Data Studio reporting as well and I like to see other people’s approaches and solutions. I’m gonna use some of the tips you mentioned here. Keep up the great work. PS I like your artistic blog cover photos!

    • Michael says:

      Thanks so much! Appreciate the compliments! let me kow if theres’s any type of content you”d be interested in seeing?

Leave a Reply