Historical Weather Data

[This article was first published on R - datawookie, and kindly contributed to R-bloggers].

I’m building a model which requires historical weather data from a selection of locations in South Africa. In this post I demonstrate the process of acquiring the data and doing some simple processing.

I need data for three locations: Brookes and Goje (in KwaZulu-Natal) and Hlangalane (in the Eastern Cape).

# A tibble: 3 × 4
  name       region          lat   lon
  <chr>      <chr>         <dbl> <dbl>
1 Brookes    KwaZulu-Natal -29.6  29.8
2 Goje       KwaZulu-Natal -28.3  31.2
3 Hlangalane Eastern Cape  -31.0  28.6

Here are those locations on a map. They are sufficiently far apart that we would expect them to have different weather histories.
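
To put a rough number on "sufficiently far apart", here is a minimal sketch that computes great-circle distances between the three sites with the haversine formula. It uses the rounded coordinates from the table above and assumes a mean Earth radius of 6371 km, so the results are approximate. The names `LOCATIONS` and `haversine_km` are only for illustration.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

# Coordinates copied from the table above (rounded to one decimal place).
LOCATIONS = {
    "Brookes": (-29.6, 29.8),
    "Goje": (-28.3, 31.2),
    "Hlangalane": (-31.0, 28.6),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# Distance for every unordered pair of locations.
distances = {
    (p, q): haversine_km(LOCATIONS[p], LOCATIONS[q])
    for p, q in combinations(LOCATIONS, 2)
}

for (p, q), km in distances.items():
    print(f"{p} - {q}: {km:.0f} km")
```

With these rounded coordinates every pair comes out well over 150 km apart, and Goje and Hlangalane are the most widely separated.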

Data Acquisition

I’m getting the data using Weather API. The business plan gives me access to data going back to the beginning of 2010. I like to mix things up, so I’ll hit the API from Python and then use R to do the processing.

The API key is stored in an environment variable.

import os

API_KEY = os.getenv("WEATHER_API_KEY")
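
Note that os.getenv() quietly returns None when the variable is missing, in which case every request would go out with an invalid key. A small guard like the one sketched below fails early instead. The helper name `require_env` is hypothetical and not part of the original code.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Possible use in place of the plain os.getenv() call:
# API_KEY = require_env("WEATHER_API_KEY")
```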

Define the date range.

import pandas as pd

DATE_MIN = "2021-08-01"
DATE_MAX = "2022-08-01"

DATES = pd.date_range(start=DATE_MIN, end=DATE_MAX)
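
It's worth checking how many requests this implies. pd.date_range() includes both endpoints, so the range covers 366 dates. The sketch below works out the request count for three locations and the minimum run time, given that the download function in this post sleeps for 5 seconds after each request. The variable names are only for illustration.

```python
import pandas as pd

DATES = pd.date_range(start="2021-08-01", end="2022-08-01")

n_dates = len(DATES)      # both endpoints are included
n_locations = 3           # Brookes, Goje and Hlangalane
n_requests = n_dates * n_locations

# Lower bound on run time: 5 seconds of sleep per request, ignoring network time.
min_minutes = n_requests * 5 / 60

print(n_dates, n_requests, round(min_minutes, 1))
# 366 1098 91.5
```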

Create a function for retrieving the data and writing it to a file. There will be one JSON file per location and date.

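import re
import time

import requests

def weather_history(name, region):
    location = name + ", " + region
    # Build a file-friendly slug, for example "goje-kwazulu-natal".
    slug = re.sub("[, ]+", "-", location.lower())

    for date in DATES:
        date = date.date()

        # Let requests URL-encode the query (the location contains a comma and a space).
        response = requests.get(
            "http://api.weatherapi.com/v1/history.json",
            params={"key": API_KEY, "q": location, "dt": str(date)},
        )
        # Fail loudly rather than writing an error payload to disk.
        response.raise_for_status()

        with open(f"{date}-{slug}.json", "wt") as fid:
            fid.write(response.text)

        # Pause between requests to avoid hammering the API.
        time.sleep(5)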
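
To make the file naming concrete, here is the same slug logic pulled out into a small standalone function, so it can be run without touching the API. The function name `output_path` is hypothetical.

```python
import re
from datetime import date


def output_path(name, region, day):
    """Build the JSON file name used for one location and one date."""
    location = name + ", " + region
    # Collapse runs of commas and spaces into a single hyphen.
    slug = re.sub("[, ]+", "-", location.lower())
    return f"{day}-{slug}.json"


print(output_path("Goje", "KwaZulu-Natal", date(2021, 8, 1)))
# 2021-08-01-goje-kwazulu-natal.json
```

The hyphen already present in "KwaZulu-Natal" is left alone, and the space in "Eastern Cape" becomes a hyphen.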

Now retrieve the data.

weather_history("Goje", "KwaZulu-Natal")

Repeat for the other locations.
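
One way to do that is to loop over the locations. This is a sketch rather than the code I actually ran: `fetch_all` is a hypothetical helper that takes the download function as an argument, so the loop can be tried out with a stand-in that makes no network requests.

```python
# Name and region for each location, as listed at the top of the post.
LOCATIONS = [
    ("Brookes", "KwaZulu-Natal"),
    ("Goje", "KwaZulu-Natal"),
    ("Hlangalane", "Eastern Cape"),
]


def fetch_all(locations, fetch):
    """Call fetch(name, region) once per location and return the names processed."""
    done = []
    for name, region in locations:
        fetch(name, region)
        done.append(name)
    return done


# Dry run with a stand-in that only prints what would be downloaded.
fetch_all(LOCATIONS, lambda name, region: print(f"would fetch {name}, {region}"))

# Real use (makes one request per location and date):
# fetch_all(LOCATIONS, weather_history)
```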

Data Processing

We’ll need a function for loading the JSON data into R. The data are nested, so we’ll include some code to unwrap and rectangle the data.

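library(jsonlite)
# The pipe, select(), mutate(), map_dfr() and unnest() come from these tidyverse packages.
library(dplyr)
library(purrr)
library(tidyr)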

prepare_weather <- function(path) {
  weather <- read_json(path)
  
  weather$location %>%
    as_tibble() %>%
    # Drop time fields that relate to data acquisition (download) time.
    select(-starts_with("localtime")) %>%
    mutate(
      hours = weather$forecast$forecastday %>%
        map_dfr(function(day) {
          map_dfr(day$hour, function(hour) {
            hour$condition <- NULL
            hour
          })
        }) %>%
        select(-ends_with("epoch")) %>%
        select(-matches("_(mph|f|in|miles)$")) %>%
        select(-matches("^(will_it|chance_of)_")) %>%
        rename_with(~ sub("_c$", "", .), matches("_c$")) %>%
        rename(precip = precip_mm, pressure=pressure_mb) %>%
        list()
    )
}
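
For anyone who would rather stay in Python, the hourly part of that reshaping could be sketched with pandas as below. The payload is made up: it only mimics the nesting that the R function relies on (forecast, forecastday, hour, condition), and all the values in it are invented. The function name `flatten_hours` is hypothetical.

```python
import re

import pandas as pd

# Made-up payload that mimics the nesting used above; all values are invented.
payload = {
    "location": {"name": "Goje", "tz_id": "Africa/Johannesburg"},
    "forecast": {"forecastday": [{"hour": [
        {"time": "2021-08-01 00:00", "time_epoch": 0, "temp_c": 16.6, "temp_f": 61.9,
         "precip_mm": 0.0, "precip_in": 0.0, "pressure_mb": 1026.0,
         "condition": {"text": "Clear"}},
        {"time": "2021-08-01 01:00", "time_epoch": 3600, "temp_c": 16.2, "temp_f": 61.2,
         "precip_mm": 0.0, "precip_in": 0.0, "pressure_mb": 1025.0,
         "condition": {"text": "Clear"}},
    ]}]},
}


def flatten_hours(payload):
    """Return one row per hour, keeping metric columns only."""
    # Collect every hourly record, dropping the nested "condition" entry.
    rows = [
        {k: v for k, v in hour.items() if k != "condition"}
        for day in payload["forecast"]["forecastday"]
        for hour in day["hour"]
    ]
    hours = pd.DataFrame(rows)
    # Drop epoch timestamps and the imperial-unit duplicates.
    keep = ~hours.columns.str.contains(r"epoch$|_(?:mph|f|in|miles)$")
    hours = hours.loc[:, keep]
    # Strip the "_c" suffix, then shorten the two unit-suffixed names.
    hours = hours.rename(columns=lambda c: re.sub(r"_c$", "", c))
    return hours.rename(columns={"precip_mm": "precip", "pressure_mb": "pressure"})


hours = flatten_hours(payload)
print(list(hours.columns))
# ['time', 'temp', 'precip', 'pressure']
```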

Let’s read the data for Goje on 1 August 2021.

(goje <- prepare_weather("2021-08-01-goje-kwazulu-natal.json"))

# A tibble: 1 × 7
  name  region        country        lat   lon tz_id               hours   
  <chr> <chr>         <chr>        <dbl> <dbl> <chr>               <list>  
1 Goje  KwaZulu-Natal South Africa -28.3  31.2 Africa/Johannesburg <tibble>

The hours list column contains the hourly weather data. Let’s take a quick look. We’ll only pull out a few columns that are relevant to the model (there are many more!).

goje %>%
  unnest(cols = hours) %>%
  # Use appropriate time zone when converting to date/time type.
  mutate(time = as.POSIXct(time, format = "%Y-%m-%d %H:%M", tz = unique(tz_id))) %>%
  select(time, temp, wind_kph, wind_dir, pressure, precip, humidity, cloud)

# A tibble: 24 × 8
   time                 temp wind_kph wind_dir pressure precip humidity cloud
   <dttm>              <dbl>    <dbl> <chr>       <dbl>  <dbl>    <int> <int>
 1 2021-08-01 00:00:00  16.6     17.3 NNE          1026      0       78     0
 2 2021-08-01 01:00:00  16.2     16.7 NNE          1025      0       76     0
 3 2021-08-01 02:00:00  15.9     16.1 NNE          1025      0       74     0
 4 2021-08-01 03:00:00  15.5     15.5 N            1024      0       71     0
 5 2021-08-01 04:00:00  15.6     15   N            1024      0       68     1
 6 2021-08-01 05:00:00  15.6     14.5 N            1024      0       64     2
 7 2021-08-01 06:00:00  15.7     14   N            1023      0       61     2
 8 2021-08-01 07:00:00  16.9     13.3 N            1023      0       55     5
 9 2021-08-01 08:00:00  18.2     12.6 N            1023      0       50     7
10 2021-08-01 09:00:00  19.4     11.9 N            1023      0       45     9
11 2021-08-01 10:00:00  21.4     11.5 NNE          1023      0       42     9
12 2021-08-01 11:00:00  23.5     11.2 NNE          1022      0       38     9
13 2021-08-01 12:00:00  25.5     10.8 NE           1021      0       35     8
14 2021-08-01 13:00:00  25.6     11.8 NE           1020      0       38     6
15 2021-08-01 14:00:00  25.6     12.7 NE           1019      0       40     3
16 2021-08-01 15:00:00  25.7     13.7 ENE          1018      0       43     0
17 2021-08-01 16:00:00  24.5     13.8 ENE          1018      0       48     0
18 2021-08-01 17:00:00  23.3     13.9 NE           1018      0       53     0
19 2021-08-01 18:00:00  22.1     14   NE           1018      0       58     0
20 2021-08-01 19:00:00  21.1     12   ENE          1019      0       60     0
21 2021-08-01 20:00:00  20.1     10   E            1019      0       61     0
22 2021-08-01 21:00:00  19.1      7.9 ESE          1020      0       63     0
23 2021-08-01 22:00:00  19.2      8.8 SSE          1021      0       64     0
24 2021-08-01 23:00:00  19.2      9.6 SSW          1021      0       65     0
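
The time zone matters because the timestamps in the files are local times without an offset. As a small illustration of the same idea in Python (a sketch, assuming pandas with time zone data available), the strings can be parsed and then localised to the zone given in tz_id:

```python
import pandas as pd

# Two local timestamps in the format stored in the JSON files.
times = pd.Series(["2021-08-01 00:00", "2021-08-01 01:00"])

# Parse as naive timestamps, then attach the location's time zone.
local = pd.to_datetime(times, format="%Y-%m-%d %H:%M").dt.tz_localize("Africa/Johannesburg")

print(local.iloc[0])
# 2021-08-01 00:00:00+02:00
```

Midnight in Africa/Johannesburg is 22:00 UTC on the previous day, so skipping this step would shift every record by two hours.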

We’ll wrap up with a few plots of daily aggregated data. First the total daily precipitation.

And finally the daily temperature (the solid line is the average and the ribbon gives the range).
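
The aggregation code behind those plots isn't shown here, but the idea can be sketched in pandas. This is an assumption about how the daily summaries could be computed, not the code that produced the plots. It uses the hourly temperatures for Goje on 1 August 2021 from the table above, a day on which no precipitation was recorded.

```python
import pandas as pd

# Hourly temperatures (°C) for Goje on 1 August 2021, copied from the table above.
temps = [
    16.6, 16.2, 15.9, 15.5, 15.6, 15.6, 15.7, 16.9, 18.2, 19.4, 21.4, 23.5,
    25.5, 25.6, 25.6, 25.7, 24.5, 23.3, 22.1, 21.1, 20.1, 19.1, 19.2, 19.2,
]

hourly = pd.DataFrame({
    "time": pd.to_datetime([f"2021-08-01 {h:02d}:00" for h in range(24)]),
    "temp": temps,
    "precip": [0.0] * 24,  # no precipitation was recorded that day
})

# One row per calendar day: total precipitation plus temperature mean and range.
daily = (
    hourly
    .assign(date=hourly["time"].dt.date)
    .groupby("date")
    .agg(
        precip_total=("precip", "sum"),
        temp_mean=("temp", "mean"),
        temp_min=("temp", "min"),
        temp_max=("temp", "max"),
    )
)

print(daily)
```

The minimum and maximum give the ribbon and the mean gives the solid line.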

These data are going to be particularly useful for our models.
