Historical Weather Data
![](https://i0.wp.com/datawookie.dev/blog/2022/08/historical-weather-data/historical-weather-data.jpg?w=450&ssl=1)
I’m building a model which requires historical weather data from a selection of locations in South Africa. In this post I demonstrate the process of acquiring the data and doing some simple processing.
I need data for three locations: Brookes and Goje (in KwaZulu-Natal) and Hlangalane (in the Eastern Cape).
    # A tibble: 3 × 4
      name       region          lat   lon
      <chr>      <chr>         <dbl> <dbl>
    1 Brookes    KwaZulu-Natal -29.6  29.8
    2 Goje       KwaZulu-Natal -28.3  31.2
    3 Hlangalane Eastern Cape  -31.0  28.6
Here are those locations on a map. They are sufficiently far apart that we would expect them to have different weather histories.
![](https://i0.wp.com/datawookie.dev/blog/2022/08/historical-weather-data/index_files/figure-html/unnamed-chunk-2-1.png?w=450&ssl=1)
Data Acquisition
I’m getting the data using Weather API. The business plan gives me access to data going back to the beginning of 2010. I like to mix things up, so I’ll hit the API from Python and then use R to do the processing.
The API key is stored in an environment variable.
    import os

    API_KEY = os.getenv("WEATHER_API_KEY")
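Note that `os.getenv` returns `None` when the variable is not set, and that `None` would then end up in the request URL as `key=None`. A minimal sketch of a lookup that fails early instead; the helper name `get_api_key` is hypothetical and not part of the original code.

```python
import os


def get_api_key(name="WEATHER_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return key
```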
Define the date range.
    import pandas as pd

    DATE_MIN = "2021-08-01"
    DATE_MAX = "2022-08-01"

    DATES = pd.date_range(start=DATE_MIN, end=DATE_MAX)
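Both endpoints of the range are included, so that's one request per day for a year plus one. A quick back-of-the-envelope check using only the standard library; `PAUSE_SECONDS` is a name I've made up for the five-second pause used in the download loop below.

```python
from datetime import date

DATE_MIN = date(2021, 8, 1)
DATE_MAX = date(2022, 8, 1)
PAUSE_SECONDS = 5  # Pause between requests, as in the download loop.

# Both endpoints are included, hence the + 1.
n_days = (DATE_MAX - DATE_MIN).days + 1
minutes_per_location = n_days * PAUSE_SECONDS / 60

print(n_days)                # 366
print(minutes_per_location)  # 30.5
```

So each location means 366 JSON files and at least half an hour of waiting, not counting the time taken by the requests themselves.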
Create a function for retrieving the data and writing it to a file. There will be one JSON file per location and date.
    import re
    import time

    import requests

    def weather_history(name, region):
        location = name + ", " + region
        # Slug for the file name, e.g. "goje-kwazulu-natal".
        slug = re.sub("[, ]+", "-", location.lower())

        for date in DATES:
            date = date.date()

            URL = f"http://api.weatherapi.com/v1/history.json?key={API_KEY}&q={location}&dt={date}"
            response = requests.get(URL)

            with open(f"{date}-{slug}.json", "wt") as fid:
                fid.write(response.text)

            # Pause between requests to go easy on the API.
            time.sleep(5)
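The location string (for example "Goje, KwaZulu-Natal") contains a comma and a space, which are interpolated straight into the URL. `requests` will normally escape what it must, but encoding the query string explicitly removes any doubt. A sketch using only the standard library: `build_history_url` is a hypothetical helper, while the endpoint and the parameter names (`key`, `q`, `dt`) are taken from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.weatherapi.com/v1/history.json"


def build_history_url(api_key, location, date):
    """Build the history URL with a properly encoded query string."""
    query = urlencode({"key": api_key, "q": location, "dt": str(date)})
    return f"{BASE_URL}?{query}"


print(build_history_url("SECRET", "Goje, KwaZulu-Natal", "2021-08-01"))
# http://api.weatherapi.com/v1/history.json?key=SECRET&q=Goje%2C+KwaZulu-Natal&dt=2021-08-01
```

Passing the same dictionary as `params=` to `requests.get()` should have the same effect.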
Now retrieve the data.
weather_history("Goje", "KwaZulu-Natal")
Repeat for the other locations.
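Rather than calling the function by hand three times, the locations can go in a list. A sketch, assuming `weather_history()` as defined above; `LOCATIONS`, `location_slug` and `download_all` are names I've made up, and the dry run prints the file slugs instead of hitting the API.

```python
import re

# The three locations from the table at the top of the post.
LOCATIONS = [
    ("Brookes", "KwaZulu-Natal"),
    ("Goje", "KwaZulu-Natal"),
    ("Hlangalane", "Eastern Cape"),
]


def location_slug(name, region):
    """Mirror the slug logic used in weather_history()."""
    location = name + ", " + region
    return re.sub("[, ]+", "-", location.lower())


def download_all(download, locations=LOCATIONS):
    """Call download(name, region) for each location in turn."""
    for name, region in locations:
        download(name, region)


# Dry run: print the slugs rather than downloading anything.
download_all(lambda name, region: print(location_slug(name, region)))
# brookes-kwazulu-natal
# goje-kwazulu-natal
# hlangalane-eastern-cape
```

With the real function this becomes `download_all(weather_history)`.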
Data Processing
We’ll need a function for loading the JSON data into R. The data are nested, so we’ll include some code to unwrap and rectangle the data.
    library(jsonlite)
    library(dplyr)
    library(purrr)

    prepare_weather <- function(path) {
      weather <- read_json(path)

      weather$location %>%
        as_tibble() %>%
        # Drop time fields that relate to data acquisition (download) time.
        select(-starts_with("localtime")) %>%
        mutate(
          hours = weather$forecast$forecastday %>%
            map_dfr(function(day) {
              map_dfr(day$hour, function(hour) {
                # Drop the nested condition record so that each hour is flat.
                hour$condition <- NULL
                hour
              })
            }) %>%
            # Drop epoch times, imperial units and forecast-style fields.
            select(-ends_with("epoch")) %>%
            select(-matches("_(mph|f|in|miles)$")) %>%
            select(-matches("^(will_it|chance_of)_")) %>%
            # Strip unit suffixes from the remaining metric fields.
            rename_with(~ sub("_c$", "", .), matches("_c$")) %>%
            rename(precip = precip_mm, pressure = pressure_mb) %>%
            list()
        )
    }
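If you'd rather stay in Python, the same unwrapping can be sketched with the standard library. The field names follow what the R function above drops and renames; the tiny payload is made up for illustration and is not real API output, and `flatten_hours` is a hypothetical name.

```python
import re

# Made-up payload with the same nesting as the JSON files (not real data).
payload = {
    "location": {"name": "Goje", "tz_id": "Africa/Johannesburg"},
    "forecast": {"forecastday": [{"hour": [{
        "time_epoch": 0, "time": "2021-08-01 00:00",
        "temp_c": 16.6, "temp_f": 61.9,
        "wind_kph": 17.3, "wind_mph": 10.7,
        "precip_mm": 0.0, "precip_in": 0.0,
        "pressure_mb": 1026.0, "pressure_in": 30.3,
        "will_it_rain": 0, "chance_of_rain": 0,
        "condition": {"text": "Clear"},
    }]}]},
}

# Fields to drop: epoch times, imperial units, forecast-style flags.
DROP = re.compile(r"epoch$|_(mph|f|in|miles)$|^(will_it|chance_of)_")
RENAME = {"precip_mm": "precip", "pressure_mb": "pressure"}


def flatten_hours(weather):
    """Return one flat dict per hourly record, with tidied field names."""
    rows = []
    for day in weather["forecast"]["forecastday"]:
        for hour in day["hour"]:
            row = {}
            for key, value in hour.items():
                if key == "condition" or DROP.search(key):
                    continue
                key = RENAME.get(key, key)
                # Strip the "_c" suffix from Celsius fields.
                row[re.sub(r"_c$", "", key)] = value
            rows.append(row)
    return rows


print(flatten_hours(payload))
```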
Let’s read the data for Goje on 1 August 2021.
    (goje <- prepare_weather("2021-08-01-goje-kwazulu-natal.json"))

    # A tibble: 1 × 7
      name  region        country        lat   lon tz_id               hours
      <chr> <chr>         <chr>        <dbl> <dbl> <chr>               <list>
    1 Goje  KwaZulu-Natal South Africa -28.3  31.2 Africa/Johannesburg <tibble>
The hours list column contains the hourly weather data. Let’s take a quick look. We’ll only pull out a few columns that are relevant to the model (there are many more!).
    library(tidyr)

    goje %>%
      unnest(cols = hours) %>%
      # Use appropriate time zone when converting to date/time type.
      mutate(time = as.POSIXct(time, format = "%Y-%m-%d %H:%M", tz = unique(tz_id))) %>%
      select(time, temp, wind_kph, wind_dir, pressure, precip, humidity, cloud)

    # A tibble: 24 × 8
       time                 temp wind_kph wind_dir pressure precip humidity cloud
       <dttm>              <dbl>    <dbl> <chr>       <dbl>  <dbl>    <int> <int>
     1 2021-08-01 00:00:00  16.6     17.3 NNE          1026      0       78     0
     2 2021-08-01 01:00:00  16.2     16.7 NNE          1025      0       76     0
     3 2021-08-01 02:00:00  15.9     16.1 NNE          1025      0       74     0
     4 2021-08-01 03:00:00  15.5     15.5 N            1024      0       71     0
     5 2021-08-01 04:00:00  15.6       15 N            1024      0       68     1
     6 2021-08-01 05:00:00  15.6     14.5 N            1024      0       64     2
     7 2021-08-01 06:00:00  15.7       14 N            1023      0       61     2
     8 2021-08-01 07:00:00  16.9     13.3 N            1023      0       55     5
     9 2021-08-01 08:00:00  18.2     12.6 N            1023      0       50     7
    10 2021-08-01 09:00:00  19.4     11.9 N            1023      0       45     9
    11 2021-08-01 10:00:00  21.4     11.5 NNE          1023      0       42     9
    12 2021-08-01 11:00:00  23.5     11.2 NNE          1022      0       38     9
    13 2021-08-01 12:00:00  25.5     10.8 NE           1021      0       35     8
    14 2021-08-01 13:00:00  25.6     11.8 NE           1020      0       38     6
    15 2021-08-01 14:00:00  25.6     12.7 NE           1019      0       40     3
    16 2021-08-01 15:00:00  25.7     13.7 ENE          1018      0       43     0
    17 2021-08-01 16:00:00  24.5     13.8 ENE          1018      0       48     0
    18 2021-08-01 17:00:00  23.3     13.9 NE           1018      0       53     0
    19 2021-08-01 18:00:00  22.1       14 NE           1018      0       58     0
    20 2021-08-01 19:00:00  21.1       12 ENE          1019      0       60     0
    21 2021-08-01 20:00:00  20.1       10 E            1019      0       61     0
    22 2021-08-01 21:00:00  19.1      7.9 ESE          1020      0       63     0
    23 2021-08-01 22:00:00  19.2      8.8 SSE          1021      0       64     0
    24 2021-08-01 23:00:00  19.2      9.6 SSW          1021      0       65     0
We’ll wrap up with two plots of daily aggregated data. First, the total daily precipitation.
![](https://i2.wp.com/datawookie.dev/blog/2022/08/historical-weather-data/index_files/figure-html/unnamed-chunk-8-1.png?w=450&ssl=1)
And finally the daily temperature (the solid line is the average and the ribbon gives the range).
![](https://i0.wp.com/datawookie.dev/blog/2022/08/historical-weather-data/index_files/figure-html/unnamed-chunk-9-1.png?w=450&ssl=1)
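The aggregation behind these two plots isn't shown. One way it might look, sketched in plain Python on made-up hourly records (the values are illustrative, not taken from the API): total precipitation per day, plus the mean, minimum and maximum temperature.

```python
from collections import defaultdict
from statistics import mean

# Made-up hourly records: (timestamp, temperature in °C, precipitation in mm).
hours = [
    ("2021-08-01 00:00", 16.0, 0.0),
    ("2021-08-01 12:00", 25.0, 0.5),
    ("2021-08-02 00:00", 14.0, 1.0),
    ("2021-08-02 12:00", 20.0, 2.0),
]

# Group by calendar date (the first 10 characters of the timestamp).
by_day = defaultdict(list)
for time, temp, precip in hours:
    by_day[time[:10]].append((temp, precip))

# Summarise each day: total precipitation and temperature statistics.
daily = {}
for day, records in by_day.items():
    temps = [temp for temp, _ in records]
    daily[day] = {
        "precip_total": sum(precip for _, precip in records),
        "temp_mean": mean(temps),
        "temp_min": min(temps),
        "temp_max": max(temps),
    }

for day, summary in daily.items():
    print(day, summary)
```

In R the equivalent would most likely be a `group_by()` on the date followed by `summarise()`.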
These data are going to be particularly useful for our models.