Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
One of the historic difficulties of doing research on urban energy systems has been the limited availability of data at sufficiently detailed spatial resolutions. Without this data, you might end up relying on aggregate information about the built environment, building occupants, and local geography that doesn't apply to the specifics of a particular neighbourhood or street.
Fortunately things are gradually improving and the UK government has a major initiative in this area: NEED, the National Energy Efficiency Data Framework. The project website has recently been updated with a series of data sets, mainly consisting of cross-tabulated statistics comparing insulation, dwelling size, income, dwelling age, and other factors.
I haven't had a chance to do anything meaningful with this data, but I thought a good starting point would be to write some functions to extract the necessary information from the provided Excel spreadsheets. So the analysis is purely exploratory, examining electricity and gas assumption measured at the level of individual English local authorities and grouped by dwelling floor area. The full code for parsing the data and making the plots is available at the bottom of the post, but here’s the main result.
As you would expect, both the mean and the variance of gas consumption increase with larger dwellings. In electricity consumption, the effect is less noticeable with less distinction between consumption levels in small dwellings.
Let's hope that DECC makes more NEED data available quickly.
Full code on Github:
## Analysis of DECC NEED data ## James Keirstead ## 22 November 2012 ## ## Underlying data available from ## http://www.decc.gov.uk/en/content/cms/statistics/energy_stats/en_effic_stats/need/need.aspx ## For more info, see ## http://www.jameskeirstead.ca/blog/we-need-more-data/ ## @knitr define-functions ## Define useful global variables ## ------------------------------ file_name <- "data/6951-local-authority-consumption-tables-2010-1.xls" floor_area_labels <- c("50 or less", "51 to 100", "101 to 150", "151 to 200", "201 to 250", "Over 250") ### Define functions ## ------------------------------ ## Remove regional totals ## ## The spreadsheets contain subtotals by region and country. ## However since I want to analyse the information by local authority ## I first need to strip out these extra rows. ## df - the input data frame. It should contain the local authority ## names in the second column ("Col1") remove_regional_totals <- function(df) { ## The regional names are kept in Col0 and therefore Col1 is blank tmp <- subset(df, !is.na(Col1)) ## Remove whitespace from local authority names require(stringr) tmp <- mutate(tmp, Col1=str_trim(Col1)) ## Return the result, dropping the now empty first column return(tmp[,-1]) } ## Extract columns ## ## The data are arranged in columns with discrete categories. ## This function will extract a generic set of columns and apply a new set ## of labels ## ## df - the input data frame with local authority name in first column ## startCol - the number of the first column to be selected ## labels - the labels of the extracted columns, e.g. c("Apples", "Oranges", "Pears") ## category - the generic category of the labels, e.g. "Fruit" ## melt - return the columns as a melted data frame extract_columns <- function(df, startCol, labels, category, do_melt=TRUE) { ## Extract the columns and rename tmp <- df[, c(1, startCol:(startCol + length(labels) - 1))] names(tmp) <- c("LAU", labels) if (do_melt) { require(reshape2) tmp.m <- melt(tmp, id="LAU", variable.name=category) # Remove DECC's placeholder for confidential info require(stringr) tmp.m <- mutate(tmp.m, value=as.numeric(str_replace(value, "x", NA))) return(tmp.m) } else { return(tmp) } } ## Extract the number of dwellings in each local authority by floor area get_number_dwellings <- function(df) { tmp <- extract_columns(df, 3, floor_area_labels, "floor") names(tmp)[3] <- "number" return(tmp) } ## Extract the mean energy consumption in each local authority by floor area ## Both the gas and electricity sheets position this information in the same column get_mean_consumption <- function(df) { tmp <- extract_columns(df, 12, floor_area_labels, "floor") names(tmp)[3] <- "energy" return(tmp) } ## A convenience function to assemble the overall data set clean_data <- function(df) { tmp <- remove_regional_totals(df) mean_energy <- get_mean_consumption(tmp) dwellings <- get_number_dwellings(tmp) result <- merge(mean_energy, dwellings) return(result) } ## @knitr run-analysis ## Run the analysis ##------------------------ ## Read in the spreadsheet ## Start and end coordinates are by manual inspection require(XLConnect) ## Process the gas data first raw_gas_data <- readWorksheetFromFile(file_name, "LAG1", startRow=10, startCol=which(LETTERS=="A"), endRow=432, endCol=which(LETTERS=="S")) gas_data <- clean_data(raw_gas_data) ## Then the electricity data raw_elec_data <- readWorksheetFromFile(file_name, "LAE1", startRow=10, startCol=which(LETTERS=="A"), endRow=432, endCol=which(LETTERS=="S")) elec_data <- clean_data(raw_elec_data) ## Merge the two data sets, renaming columns first df.gas <- summarize(gas_data, LAU=LAU, floor=floor, gas=energy) df.elec <- summarize(elec_data, LAU=LAU, floor=floor, elec=energy) data <- merge(df.gas, df.elec) data <- melt(data, id=c('LAU', 'floor'), variable.name='fuel') data <- mutate(data, fuel=factor(fuel, lev=c('elec','gas'), labels=c('Electricity', 'Gas'))) ## @knitr make-plots require(ggplot2) gg <- ggplot(data, aes(x=floor, y=value)) + geom_boxplot() + facet_wrap(~ fuel, ncol=1, scale='free') + labs(x="Floor area (m2)", y="Mean annual energy consumption (kWh)") + theme_bw() print(gg)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.