Introduction
Degree days are useful as a measure of building heating and cooling demands. A degree day is calculated as the difference between the average temperature (the average of the high and low temperature for the day) and a reference temperature (65°F in the US). For example, if the average temperature today is 40°F, that would be 25 heating degree days (HDD). A summer day with an average temperature of 85°F would have 20 cooling degree days (CDD). Degree days are usually well correlated with the amount of energy used to heat or cool a home.
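To make the calculation concrete, here is a minimal sketch in R (the degree_days() helper and the example temperatures are just for illustration; 65°F is the standard US base):

# Degree days for a single day, using the standard 65°F US base temperature
degree_days <- function(high, low, base = 65) {
  avg <- (high + low) / 2            # daily average temperature
  c(HDD = max(base - avg, 0),        # heating degree days
    CDD = max(avg - base, 0))        # cooling degree days
}

degree_days(high = 50, low = 30)  # average 40°F -> 25 HDD, 0 CDD
degree_days(high = 95, low = 75)  # average 85°F -> 0 HDD, 20 CDD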
I was interested in obtaining and analyzing degree day data; in particular I wanted to see if there were any noticeable trends over time. Given an overall increase in earth’s average temperature due to climate change, I would hypothesize that there might be an overall increase in CDD and a decrease in HDD.
Changes in heating or cooling degree days would have implications for the amount of energy needed in the future to heat and cool residential or commercial buildings, with resulting changes in demand on the electric grid and in related carbon emissions (whether from the power grid or from burning fossil fuels to heat buildings).
Data
I obtained US heating and cooling degree day data from the U.S. Energy Information Administration (EIA). Note that these data are weighted by population; see details of how the EIA data are calculated here.
suppressPackageStartupMessages(library(tidyverse))
ggplot2::theme_set(theme_grey(base_size = 15))
suppressPackageStartupMessages(library(janitor))
library(broom)
library(DT)
suppressPackageStartupMessages(library(plotly))
I’ll only use years we have complete data for (1997-2022).
region <- "u_s"

dd <- read_csv(paste0('data/EIA_DegreeDays_Monthly_', region, '.csv'),
               skip = 4, show_col_types = FALSE) |>
  janitor::clean_names() |>
  rename(CDD = paste0('cooling_degree_days_', region,
                      '_cooling_degree_days_cooling_degree_days')) |>
  rename(HDD = paste0('heating_degree_days_', region,
                      '_heating_degree_days_heating_degree_days')) |>
  mutate(date = lubridate::my(month)) |>
  select(-month) |>
  mutate(month = lubridate::month(date)) |>
  mutate(year = lubridate::year(date)) |>
  filter(year > 1996, year < 2023) # keep only complete years

dd |> DT::datatable(options = list(pageLength = 5), rownames = FALSE)
I’ll also make a dataframe of the yearly totals:
dd_yearly <- dd |>
  filter(year > 1996) |>
  group_by(year) |>
  summarise(HDD = sum(HDD, na.rm = TRUE),
            CDD = sum(CDD, na.rm = TRUE))

dd_yearly |> DT::datatable(options = list(pageLength = 5), rownames = FALSE)
Analysis
Heating Degree Days
Figure 1 shows the distribution of US heating degree days for each month. Not surprisingly, HDD tends to be higher in the winter months, although there is a fair amount of variability between years.
HDD Per Month
dd |>
  mutate(month_name = lubridate::month(date, label = TRUE)) |>
  ggplot(aes(month_name, HDD, group = month_name)) +
  geom_boxplot() +
  labs(title = 'Monthly Heating Degree Days for US (1997-2022)',
       x = 'Month',
       y = "Heating Degree Days")
Trends in HDD
Is there a trend in HDD over time? I would expect that HDD might decrease over time due to climate change and the increase in earth’s average temperature.
Annual
Figure 2 shows a time series of the annual total heating degree days in the US, along with a linear regression line that shows a negative trend.
g <- dd_yearly |>
  ggplot(aes(year, HDD)) +
  geom_point(size = 4, alpha = 0.5) +
  geom_smooth(method = 'lm', formula = 'y~x') +
  labs(title = "US Annual Heating Degree Days")

g
#plotly::ggplotly(g)
There is a fair bit of variability, but looking at the fit metrics shows that the negative trend in HDD is statistically significant (p-value < 0.05):
hdd_yearly_fit <- lm(data = dd_yearly, formula = 'HDD ~ year')

broom::tidy(hdd_yearly_fit) |>
  mutate_if(is.numeric, ~round(.x, digits = 3))
# A tibble: 2 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   30788.    11849.      2.60   0.016
2 year            -13.2      5.90    -2.24   0.035
broom::glance(hdd_yearly_fit) |>
  mutate_if(is.numeric, ~round(.x, digits = 3))
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.173         0.138  225.      5.01   0.035     1  -177.  359.  363.
# ℹ 3 more variables: deviance <dbl>, df.residual <dbl>, nobs <dbl>
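As a rough check on the size of this effect, the fitted slope can be scaled by the length of the record; a back-of-the-envelope sketch using the coefficients above:

# Approximate total change in annual HDD implied by the fitted slope (1997-2022)
slope <- coef(hdd_yearly_fit)[["year"]]
slope * (2022 - 1997)   # roughly -13.2 * 25, i.e. a decline of about 330 HDD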
Monthly
We have seen that there is a negative trend in annual HDD; what are the trends for individual months? Figure 3 shows time series of monthly HDD versus year for winter months, with linear regression lines plotted over them. Visually there appears to be a negative trend for some of the months.
dd |>
  filter(month %in% c(11, 12, 1, 2, 3, 4)) |>
  mutate(month_name = lubridate::month(date, label = TRUE)) |>
  ggplot(aes(year, HDD, group = month_name)) +
  geom_point(size = 3, alpha = 0.5) +
  geom_smooth(method = 'lm', formula = 'y ~ x') +
  facet_wrap('month_name', scales = 'free') +
  guides(x = guide_axis(angle = 45))
To better quantify these trends I want to fit a linear regression to the data for each month and examine the results. This could be done with a for loop, but I will take advantage of a nice nested workflow using the tidyr (Wickham, Vaughan, and Girlich 2023), broom (Robinson, Hayes, and Couch 2023), and purrr (Wickham and Henry 2023) packages.
dd_fit_hdd <- dd |>
  group_by(month) |>
  nest() |>
  mutate(fit = map(data, ~ lm(HDD ~ year, data = .x)),
         tidied = map(fit, broom::tidy),
         glanced = map(fit, broom::glance)) |>
  unnest(tidied) |>
  ungroup()

dd_fit_hdd |>
  mutate_if(is.numeric, ~round(.x, digits = 3)) |>
  DT::datatable(rownames = FALSE, options = list(pageLength = 5))
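The glanced column is carried along but not shown in the table above; as a possible follow-up, one could unnest it to compare the overall fit quality across months, along these lines:

# Per-month model summaries from the stored glance results
dd_fit_hdd |>
  filter(term == "year") |>            # keep one row per month
  select(month, glanced) |>
  unnest(glanced) |>
  select(month, r.squared, p.value) |>
  arrange(p.value)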