Downloading datasets from Our World in Data in R

Posted on January 30, 2025 by kjytay in R bloggers | 0 Comments

[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers.]

I recently learned from Allen Downey’s blog that Our World in Data now provides API access to their data. Our World in Data hosts datasets across several important topics, from population and demographic change, to poverty and economic development, to human rights and democracy. Since November 2024, Our World in Data “[offers] direct URLs to access data in CSV format and comprehensive metadata in JSON format” (this is what they call the Public Chart API).

See this link for full documentation on the chart data API. Allen Downey’s blog post shows how to use this API in Python; in this post I’ll show the corresponding code in R.

Downloading metadata

The chart data API page tells us what APIs are available to us. The starting point is a base URL which we denote by <base_url>. This is what’s available to us:

<base_url> – The page on the website where you can see the chart.
<base_url>.csv – The data for this chart, usually what we want to download.
<base_url>.metadata.json – The metadata for this chart, e.g. chart title, the units, how to cite the data sources.
<base_url>.zip – The dataset, metadata, and a readme as a zip file.
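
The four URLs above differ only in their suffix, so they can be derived from the base URL. The following is a minimal sketch in base R; owid_endpoints is a hypothetical helper name introduced here for illustration, not something provided by Our World in Data or any package.

```r
# Build the four chart API URLs from a chart's base URL.
# `owid_endpoints` is a hypothetical helper, not part of any package.
owid_endpoints <- function(base_url) {
  # drop any trailing slash so the suffixes attach cleanly
  base_url <- sub("/+$", "", base_url)
  c(
    page     = base_url,
    csv      = paste0(base_url, ".csv"),
    metadata = paste0(base_url, ".metadata.json"),
    zip      = paste0(base_url, ".zip")
  )
}

endpoints <- owid_endpoints(
  "https://ourworldindata.org/grapher/average-monthly-surface-temperature"
)
endpoints[["metadata"]]
```

The named vector makes it easy to pick the endpoint you need, e.g. endpoints[["csv"]] for the data download.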

Let’s start by looking at the metadata for average monthly surface temperature:

library(tidyverse)
library(httr)
library(jsonlite)

# define URL and query parameters
url <- "https://ourworldindata.org/grapher/average-monthly-surface-temperature.metadata.json"
query_params <- list(
  v = "1",
  csvType = "full",
  useColumnShortNames = "true"
)

# get metadata
headers <- add_headers(`User-Agent` = "Our World In Data data fetch/1.0")
response <- GET(url, query = query_params, headers)
metadata <- fromJSON(content(response, as = "text", encoding = "UTF-8"))

The returned metadata object is a list with a few keys:

# view the metadata keys
names(metadata)

#> [1] "chart"          "columns"        "dateDownloaded" "activeFilters"

The chart element gives us high-level information about the chart:

glimpse(metadata$chart)

#> List of 5
#>  $ title           : chr "Average monthly surface temperature"
#>  $ subtitle        : chr "The temperature of the air measured 2 meters above the ground, encompassing land, sea, and in-land water surfaces."
#>  $ citation        : chr "Contains modified Copernicus Climate Change Service information (2025)"
#>  $ originalChartUrl: chr "https://ourworldindata.org/grapher/average-monthly-surface-temperature?v=1&csvType=full&useColumnShortNames=true"
#>  $ selection       : chr "World"

There’s more information on the dataset in metadata$columns$temperature_2m:

names(metadata$columns$temperature_2m)
#>  [1] "titleShort"            "titleLong"             "descriptionShort"      "descriptionProcessing" "shortUnit"             "unit"                 
#>  [7] "timespan"              "type"                  "owidVariableId"        "shortName"             "lastUpdated"           "citationShort"        
#> [13] "citationLong"          "fullMetadata"

(Why temperature_2m? Looking at metadata$columns$temperature_2m$shortName, it seems like temperature_2m is the short name for the dataset.)

Downloading the dataset

Here is code to download the average monthly surface temperature for just the USA:

# define the URL and query parameters
url <- "https://ourworldindata.org/grapher/average-monthly-surface-temperature.csv"
query_params <- list(
  v = "1",
  csvType = "filtered",
  useColumnShortNames = "true",
  tab = "chart",
  country = "USA"
)

# get data
headers <- add_headers(`User-Agent` = "Our World In Data data fetch/1.0")
response <- GET(url, query = query_params, headers)
df <- read_csv(content(response, as = "text", encoding = "UTF-8"))

head(df)
#> # A tibble: 6 × 6
#>   Entity        Code   year Day        temperature_2m...5 temperature_2m...6
#>   <chr>         <chr> <dbl> <date>                  <dbl>              <dbl>
#> 1 United States USA    1941 1941-12-15            -1.88                 8.02
#> 2 United States USA    1942 1942-01-15            -4.78                 7.85
#> 3 United States USA    1942 1942-02-15            -3.87                 7.85
#> 4 United States USA    1942 1942-03-15             0.0978               7.85
#> 5 United States USA    1942 1942-04-15             7.54                 7.85
#> 6 United States USA    1942 1942-05-15            12.1                  7.85

Allen Downey notes that the last column is undocumented, and based on context is probably the average annual temperature.
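
The ...5 and ...6 suffixes come from read_csv: both value columns arrive with the same header, temperature_2m, so readr makes the names unique by appending the column position. Renaming the columns makes later code easier to read. The sketch below rebuilds the six rows shown above as a plain data frame so that it runs without a download. The new names are hypothetical, and temp_annual_guess only reflects the guess that the last column is an annual average.

```r
# First rows of the USA download, copied from the output above
df_head <- data.frame(
  Entity = "United States",
  Code = "USA",
  year = c(1941, 1942, 1942, 1942, 1942, 1942),
  Day = as.Date(c("1941-12-15", "1942-01-15", "1942-02-15",
                  "1942-03-15", "1942-04-15", "1942-05-15")),
  `temperature_2m...5` = c(-1.88, -4.78, -3.87, 0.0978, 7.54, 12.1),
  `temperature_2m...6` = c(8.02, 7.85, 7.85, 7.85, 7.85, 7.85),
  check.names = FALSE  # keep the readr-style names exactly as shown
)

# Hypothetical names; "annual" is a guess, not documented behaviour
names(df_head)[5:6] <- c("temp_monthly", "temp_annual_guess")
names(df_head)
```

The same names()<- assignment should work on the full df returned by read_csv.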

Here’s a simple plot of the data:

plot(df$Day, df$temperature_2m...5, type = "l",
     main = "USA ave monthly temperature",
     xlab = "Date", ylab = "Temperature in C")

If you want the entire dataset, use csvType = "full" in the request parameters and drop the country filter, like so:

# define the URL and query parameters
url <- "https://ourworldindata.org/grapher/average-monthly-surface-temperature.csv"
query_params <- list(
  v = "1",
  csvType = "full",
  useColumnShortNames = "true",
  tab = "chart"
)

# same code for the http request works, see above
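
Since the request code is identical each time and only the parameters change, it can be wrapped in a small function. This is a sketch under assumptions: owid_query and fetch_owid_csv are hypothetical names introduced here, and the function adds an httr::stop_for_status() check that the code above does not have. It needs the httr and readr packages when called.

```r
# Hypothetical helpers; the names are not from the OWID docs or any package.

# Assemble query parameters, adding filters only when they are supplied.
owid_query <- function(csv_type = c("full", "filtered"), ...) {
  csv_type <- match.arg(csv_type)
  c(
    list(v = "1", csvType = csv_type, useColumnShortNames = "true"),
    list(...)
  )
}

# Download a chart's CSV as a tibble; requires httr and readr.
fetch_owid_csv <- function(slug, query) {
  url <- paste0("https://ourworldindata.org/grapher/", slug, ".csv")
  response <- httr::GET(
    url,
    query = query,
    httr::add_headers(`User-Agent` = "Our World In Data data fetch/1.0")
  )
  httr::stop_for_status(response)  # fail early on a 4xx/5xx response
  text <- httr::content(response, as = "text", encoding = "UTF-8")
  # I() marks the string as literal CSV data rather than a file path
  readr::read_csv(I(text), show_col_types = FALSE)
}

full_query <- owid_query("full", tab = "chart")
usa_query  <- owid_query("filtered", tab = "chart", country = "USA")

# Uncomment to download (needs network access):
# df <- fetch_owid_csv("average-monthly-surface-temperature", usa_query)
```

Keeping the parameter list separate from the download step means the query can be inspected before any request is sent.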

How do you know which filters you can apply? There is no extensive documentation of the available filters, but as you interact with the chart on the base URL page (e.g. selecting countries or changing the time range), the URL in the browser changes. You can then use the new URL to infer the query parameters to add.

For example, I clicked around on the chart to get this URL: https://ourworldindata.org/grapher/average-monthly-surface-temperature?tab=chart&time=2001-05-15..2024-05-15&country=OWID_WRL~THA. The code below shows how you can download the data and reproduce the chart:

# define the URL and query parameters
url <- "https://ourworldindata.org/grapher/average-monthly-surface-temperature.csv"
query_params <- list(
  v = "1",
  csvType = "filtered",
  useColumnShortNames = "true",
  tab = "chart",
  time = "2001-05-15..2024-05-15",
  country = "OWID_WRL~THA"
)

# get data
headers <- add_headers(`User-Agent` = "Our World In Data data fetch/1.0")
response <- GET(url, query = query_params, headers)
df <- read_csv(content(response, as = "text", encoding = "UTF-8"))

# plot one line per entity
ggplot(df) +
  geom_line(aes(x = Day, y = temperature_2m...5, col = Entity)) +
  labs(title = "Ave monthly temperature for World & Thailand",
       x = "Date", y = "Temperature in C") +
  theme_bw() +
  theme(legend.position = "bottom")
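
Rather than copying each filter by hand, the query string of a chart URL can be split into a parameter list. The sketch below uses base R only; chart_url_params is a hypothetical helper name, and it assumes simple key=value pairs with no "=" inside a value. (httr::parse_url() should give a similar list in its query element.)

```r
# Split a chart URL's query string into a named list (base R only).
# `chart_url_params` is a hypothetical helper, not part of any package.
chart_url_params <- function(chart_url) {
  if (!grepl("?", chart_url, fixed = TRUE)) return(list())
  query_string <- sub("^[^?]*\\?", "", chart_url)
  pairs <- strsplit(
    strsplit(query_string, "&", fixed = TRUE)[[1]],
    "=", fixed = TRUE
  )
  values <- lapply(pairs, function(p) utils::URLdecode(p[2]))
  names(values) <- vapply(pairs, function(p) p[1], character(1))
  values
}

chart_url <- paste0(
  "https://ourworldindata.org/grapher/average-monthly-surface-temperature",
  "?tab=chart&time=2001-05-15..2024-05-15&country=OWID_WRL~THA"
)

# Combine the API-specific parameters with the filters taken from the URL
query_params <- c(
  list(v = "1", csvType = "filtered", useColumnShortNames = "true"),
  chart_url_params(chart_url)
)
str(query_params)
```

The resulting query_params has the same contents as the hand-written list in the example above.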
