Electricity demand data in tsibble format
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The tsibbledata package contains the vic_elec data set, giving half-hourly electricity demand for the state of Victoria, along with corresponding temperatures from the capital city, Melbourne. These data cover the period 2012-2014.
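As a quick illustration of working with vic_elec, the sketch below inspects the tsibble's index and interval, then uses `index_by()` to aggregate the half-hourly series to daily peak demand and maximum temperature. It assumes tsibble, tsibbledata, dplyr and lubridate are installed. The names `vic_daily`, `Day`, `Peak_Demand` and `Max_Temp` are illustrative choices, not part of the data set.

```r
library(dplyr)
library(lubridate)
library(tsibble)
library(tsibbledata)

# vic_elec is a half-hourly tsibble indexed by Time, with no key variables
tsibble::interval(vic_elec)  # namespaced to avoid a clash with lubridate::interval()
index_var(vic_elec)
key_vars(vic_elec)

# Aggregate the half-hourly index to a daily one with index_by()
vic_daily <- vic_elec %>%
  index_by(Day = as_date(Time)) %>%
  summarise(
    Peak_Demand = max(Demand),
    Max_Temp = max(Temperature)
  )

vic_daily
```

Naming the new index `Day` avoids overwriting the existing `Date` column in vic_elec.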
Other similar data sets are also available, and these may be of interest to researchers in the area.
For people new to tsibbles, please read my introductory post.
Australian state-level demand
The raw data for the other states are also stored in the tsibbledata GitHub repository (under the data-raw folder), but they are not included in the package because of CRAN space constraints. However, anyone can still load and use the data with the following code.
```r
library(tidyverse)
library(lubridate)
library(tsibble)

repo <- "https://raw.githubusercontent.com/tidyverts/tsibbledata/master/data-raw/vic_elec/"
states <- c("NSW", "QLD", "SA", "TAS", "VIC")
dirs <- paste0(repo, states, "2015")

# Read holidays data
holidays <- paste0(dirs, "/holidays.txt") %>%
  as.list() %>%
  map_dfr(read_csv, col_names = FALSE, .id = "State") %>%
  transmute(
    State = states[as.numeric(State)],
    Date = dmy(X1),
    Holiday = TRUE
  )

# Read temperature data
temperatures <- paste0(dirs, "/temperature.csv") %>%
  as.list() %>%
  map_dfr(read_csv, .id = "State") %>%
  mutate(
    State = states[as.numeric(State)],
    Date = as_date(Date, origin = ymd("1899-12-30"))
  )

# Read demand data
demands <- paste0(dirs, "/demand.csv") %>%
  as.list() %>%
  map_dfr(read_csv, .id = "State") %>%
  mutate(
    State = states[as.numeric(State)],
    Date = as_date(Date, origin = ymd("1899-12-30"))
  )

# Join demand, temperatures and holidays
aus_elec <- demands %>%
  left_join(temperatures, by = c("State", "Date", "Period")) %>%
  transmute(
    State,
    Time = as.POSIXct(Date + minutes((Period - 1) * 30)),
    Period,
    Date = as_date(Time),
    DOW = wday(Date, label = TRUE),
    Demand = OperationalLessIndustrial,
    Temperature = Temp
  ) %>%
  left_join(holidays, by = c("State", "Date")) %>%
  replace_na(list(Holiday = FALSE))

# Remove duplicates and create a tsibble
aus_elec <- aus_elec %>%
  filter(!are_duplicated(aus_elec, index = Time, key = State)) %>%
  as_tsibble(index = Time, key = State)
```
This block of code reads in the raw data files containing holiday information, temperatures and electricity demand for each state, and then joins them into a single tsibble. For some reason, there are duplicated rows for South Australia, so the last few lines remove the duplicates before forming a tsibble keyed by State.
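To make two of the fiddlier steps above concrete, here is a minimal sketch on a few hypothetical rows: turning a serial day number plus a half-hour Period into a date-time, and then dropping a duplicated State/Time pair before calling `as_tsibble()`. The origin of 1899-12-30 is the usual spreadsheet (Excel-style) serial-date convention, which appears to be how the raw files store dates. Unlike the code above, this sketch swaps in tsibble's `duplicates()` to inspect the offending rows and `dplyr::distinct()` to drop them, keeping the first copy of each pair.

```r
library(dplyr)
library(lubridate)
library(tsibble)

# Hypothetical rows mimicking the raw demand files: a serial day number,
# a half-hour Period (1 to 48) and a demand value.
# The South Australian row for Period 2 is deliberately duplicated.
raw <- tibble(
  State  = c("SA", "SA", "SA", "VIC"),
  Date   = c(37257, 37257, 37257, 37257),
  Period = c(1, 2, 2, 1),
  Demand = c(1200, 1150, 1150, 4800)
)

toy_elec <- raw %>%
  mutate(
    # Serial days counted from 1899-12-30; 37257 is 1 January 2002
    Date = as_date(Date, origin = ymd("1899-12-30")),
    # Period 1 starts at midnight and each period lasts 30 minutes
    Time = as.POSIXct(Date + minutes((Period - 1) * 30))
  )

# Inspect the (State, Time) pairs that occur more than once
dups <- duplicates(toy_elec, key = State, index = Time)
dups

# Keep the first row of each (State, Time) pair, then build the tsibble
toy_ts <- toy_elec %>%
  distinct(State, Time, .keep_all = TRUE) %>%
  as_tsibble(key = State, index = Time)

toy_ts
```

Without the `distinct()` step, `as_tsibble()` refuses to build a tsibble from rows that share the same key and index.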
```r
aus_elec
## # A tsibble: 1,155,408 x 8 [30m] <UTC>
## # Key:       State [5]
##    State Time                Period Date       DOW   Demand Temperature
##    <chr> <dttm>               <dbl> <date>     <ord>  <dbl>       <dbl>
##  1 NSW   2002-01-01 00:00:00      1 2002-01-01 Tue    5714.        26.3
##  2 NSW   2002-01-01 00:30:00      2 2002-01-01 Tue    5360.        26.3
##  3 NSW   2002-01-01 01:00:00      3 2002-01-01 Tue    5015.        26.3
##  4 NSW   2002-01-01 01:30:00      4 2002-01-01 Tue    4603.        26.3
##  5 NSW   2002-01-01 02:00:00      5 2002-01-01 Tue    4285.        26.3
##  6 NSW   2002-01-01 02:30:00      6 2002-01-01 Tue    4075.        26.3
##  7 NSW   2002-01-01 03:00:00      7 2002-01-01 Tue    3943.        26.3
##  8 NSW   2002-01-01 03:30:00      8 2002-01-01 Tue    3884.        26.3
##  9 NSW   2002-01-01 04:00:00      9 2002-01-01 Tue    3878.        26.3
## 10 NSW   2002-01-01 04:30:00     10 2002-01-01 Tue    3838.        26.3
## # … with 1,155,398 more rows, and 1 more variable: Holiday <lgl>
```
This data set contains half-hourly data for all five states from 1 January 2002 to 1 March 2015 (to 1 April 2015 in the case of Queensland). The temperature variable is from a weather station in the capital city of each state.
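One way to check coverage like this is to compute the first and last time stamp for each key and to ask tsibble whether any series has implicit gaps. The helper `span_by_key()` below is hypothetical (it is not a tsibble function). It is demonstrated on tsibble's built-in `pedestrian` data so that it runs without downloading anything.

```r
library(dplyr)
library(tsibble)

# Hypothetical helper (not part of tsibble): first and last time stamp,
# plus the number of rows, for each key of a tsibble
span_by_key <- function(x) {
  idx <- index_var(x)  # name of the index column as a string
  x %>%
    as_tibble() %>%
    group_by(across(all_of(key_vars(x)))) %>%
    summarise(
      Start = min(.data[[idx]]),
      End = max(.data[[idx]]),
      N = n(),
      .groups = "drop"
    )
}

# Demonstrated on the built-in pedestrian data (hourly, keyed by Sensor)
ped_span <- span_by_key(pedestrian)
# has_gaps() returns one row per key with a logical .gaps column
ped_gaps <- has_gaps(pedestrian)

ped_span
ped_gaps
```

After running the code above, `span_by_key(aus_elec)` should show where each state's series starts and ends, and `has_gaps(aus_elec)` reports whether any half-hours are missing within each state's own span.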
GEFCOM 2017
The Global Energy Forecasting Competition in 2017 involved hourly zonal loads for ISO New England from March 2003 to April 2017. Cameron Roach has already packaged these data in tibble format in the gefcom2017data GitHub repository, so they are relatively easy to convert to a tsibble.
```r
# Install the package from GitHub (only needed once)
devtools::install_github("camroach87/gefcom2017data")
library(gefcom2017data)

# Drop any grouping, then convert to a tsibble keyed by zone
gefcom2017 <- gefcom %>%
  ungroup() %>%
  as_tsibble(key = zone, index = ts)

gefcom2017
## # A tsibble: 1,241,710 x 15 [1h] <UTC>
## # Key:       zone [10]
##    ts                  zone  demand drybulb dewpnt date        year month
##    <dttm>              <chr>  <dbl>   <dbl>  <dbl> <date>     <dbl> <fct>
##  1 2003-03-01 00:00:00 CT      3386      25     19 2003-03-01  2003 Mar
##  2 2003-03-01 01:00:00 CT      3258      23     18 2003-03-01  2003 Mar
##  3 2003-03-01 02:00:00 CT      3189      22     18 2003-03-01  2003 Mar
##  4 2003-03-01 03:00:00 CT      3157      22     19 2003-03-01  2003 Mar
##  5 2003-03-01 04:00:00 CT      3166      23     19 2003-03-01  2003 Mar
##  6 2003-03-01 05:00:00 CT      3255      23     20 2003-03-01  2003 Mar
##  7 2003-03-01 06:00:00 CT      3430      24     20 2003-03-01  2003 Mar
##  8 2003-03-01 07:00:00 CT      3684      24     20 2003-03-01  2003 Mar
##  9 2003-03-01 08:00:00 CT      3977      25     21 2003-03-01  2003 Mar
## 10 2003-03-01 09:00:00 CT      4129      27     22 2003-03-01  2003 Mar
## # … with 1,241,700 more rows, and 7 more variables: hour <dbl>,
## #   day_of_week <fct>, day_of_year <dbl>, weekend <lgl>,
## #   holiday_name <chr>, holiday <lgl>, trend <dbl>
```
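Because gefcom2017 is an hourly tsibble keyed by zone, the usual tsibble verbs apply directly. As a sketch, the code below aggregates hourly load to daily totals per zone with `index_by()` followed by `group_by()`. To stay self-contained it uses a small hypothetical stand-in, `toy_hourly`, with the same `ts`, `zone` and `demand` column names rather than the real data.

```r
library(dplyr)
library(lubridate)
library(tsibble)

# Hypothetical stand-in for gefcom2017: two zones with 48 hourly loads each
toy_hourly <- tibble(
  ts = rep(ymd_hms("2003-03-01 00:00:00") + hours(0:47), times = 2),
  zone = rep(c("CT", "ME"), each = 48),
  demand = rep(c(3000, 1000), each = 48)
) %>%
  as_tsibble(key = zone, index = ts)

# Move to a daily index, keep zones separate, and sum the hourly loads
gef_daily <- toy_hourly %>%
  index_by(Day = as_date(ts)) %>%
  group_by(zone) %>%
  summarise(Demand = sum(demand))

gef_daily
```

With the package installed, replacing `toy_hourly` with `gefcom2017` should give daily zonal totals. The new index is called `Day` so that it does not clash with the existing `date` column.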
Details of the data (and the competition) are available on Tao Hong’s website.