Introduction
For this blog post, I decided to try to find a dataset covering an issue I feel quite strongly about – homelessness. I managed to find a fairly large dataset from the Cambridgeshire Insight website.
For a while I’ve wanted to try out R’s mapping potential and hopefully generate a heatmap, so I deliberately looked for a dataset where I could try this out. It’s worth saying that this has been by far the most difficult and frustrating project I’ve taken on. It took me 6 or 7 sessions to produce this post, the first of which I spent trying to install gganimate (which I ended up not using) and figuring out where to start with mapping.
Data wrangling
Let’s load the required packages and read the data in:
library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(gifski)
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
data <- read_csv("http://opendata.cambridgeshireinsight.org.uk/files/ci_opendata/P1E-%20national-%20homelessness-CLG-tab784-to2016_1.csv")
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   `ONS code` = col_character(),
##   `Local authority area` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are White` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are Black or Black British` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are Asian or Asian British` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are Mixed` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are Other ethnic origin` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated` = col_character(),
##   `2009/10 Number per 1000 households` = col_double(),
##   `2009/10 Total decisions where eligible homeless & in priority need but intentionally` = col_character(),
##   `2009/10 Total decisions where eligible & homeless but not in priority need` = col_character(),
##   `2009/10 Total decisions where eligible but not homeless` = col_character(),
##   `2009/10 Total homelessness decisions` = col_character(),
##   `31 March 2010 Total households in B&B (including shared annex)` = col_character(),
##   `31 March 2010 Total households in hostels` = col_character(),
##   `31 March 2010 Total households in LA/HA stock` = col_character(),
##   `31 March 2010 Total households in private sector leased (by LA or HA)` = col_character(),
##   `31 March 2010 Total households in other temp (including private landlord)` = col_character(),
##   `31 March 2010 Number per 1000 households` = col_double(),
##   `2010/11 Number per 1000 households` = col_double()
##   # ... with 30 more columns
## )
## See spec(...) for full column specifications.
names(data)
##   [1] "ONS code"
##   [2] "Local authority area"
##   [3] "2009/10 Thousands of households 2006 mid-year estimate"
##   [4] "2009/10 Numbers accepted as homeless and in priority need who are White"
##   [5] "2009/10 Numbers accepted as homeless and in priority need who are Black or Black British"
##   [6] "2009/10 Numbers accepted as homeless and in priority need who are Asian or Asian British"
##   [7] "2009/10 Numbers accepted as homeless and in priority need who are Mixed"
##   [8] "2009/10 Numbers accepted as homeless and in priority need who are Other ethnic origin"
##   [9] "2009/10 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
##  [10] "2009/10 Numbers accepted as homeless and in priority need total"
##  [11] "2009/10 Number per 1000 households"
##  [12] "2009/10 Total decisions where eligible homeless & in priority need but intentionally"
##  [13] "2009/10 Total decisions where eligible & homeless but not in priority need"
##  [14] "2009/10 Total decisions where eligible but not homeless"
##  [15] "2009/10 Total homelessness decisions"
##  [16] "31 March 2010 Total households in B&B (including shared annex)"
##  [17] "31 March 2010 Total households in hostels"
##  [18] "31 March 2010 Total households in LA/HA stock"
##  [19] "31 March 2010 Total households in private sector leased (by LA or HA)"
##  [20] "31 March 2010 Total households in other temp (including private landlord)"
##  [21] "31 March 2010 Total households in temporary accommodation"
##  [22] "31 March 2010 Number per 1000 households"
##  [23] "2009/10 Duty owed but no accommodation has been secured at end of March 2010"
##  [24] "2010/11 Thousands of households 2008 mid-year estimate"
##  [25] "2010/11 Numbers accepted as homeless and in priority need who are White"
##  [26] "2010/11 Numbers accepted as homeless and in priority need who are Black or Black British"
##  [27] "2010/11 Numbers accepted as homeless and in priority need who are Asian or Asian British"
##  [28] "2010/11 Numbers accepted as homeless and in priority need who are Mixed"
##  [29] "2010/11 Numbers accepted as homeless and in priority need who are Other ethnic origin"
##  [30] "2010/11 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
##  [31] "2010/11 Numbers accepted as homeless and in priority need total"
##  [32] "2010/11 Number per 1000 households"
##  [33] "2010/11 Total decisions where eligible homeless & in priority need but intentionally"
##  [34] "2010/11 Total decisions where eligible & homeless but not in priority need"
##  [35] "2010/11 Total decisions where eligible but not homeless"
##  [36] "2010/11 Total homelessness decisions"
##  [37] "31 March 2011 Total households in B&B (including shared annex)"
##  [38] "31 March 2011 Total households in hostels"
##  [39] "31 March 2011 Total households in LA/HA stock"
##  [40] "31 March 2011 Total households in private sector leased (by LA or HA)"
##  [41] "31 March 2011 Total households in other temp (including private landlord)"
##  [42] "31 March 2011 Total households in temporary accommodation"
##  [43] "31 March 2011 Number per 1000 households"
##  [44] "2010/11 Duty owed but no accommodation has been secured at end of March 2011"
##  [45] "2011/12 Thousands of households 2008 mid-year estimate"
##  [46] "2011/12 Numbers accepted as homeless and in priority need who are White"
##  [47] "2011/12 Numbers accepted as homeless and in priority need who are Black or Black British"
##  [48] "2011/12 Numbers accepted as homeless and in priority need who are Asian or Asian British"
##  [49] "2011/12 Numbers accepted as homeless and in priority need who are Mixed"
##  [50] "2011/12 Numbers accepted as homeless and in priority need who are Other ethnic origin"
##  [51] "2011/12 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
##  [52] "2011/12 Numbers accepted as homeless and in priority need total"
##  [53] "2011/12 Number per 1000 households"
##  [54] "2011/12 Total decisions where eligible homeless & in priority need but intentionally"
##  [55] "2011/12 Total decisions where eligible & homeless but not in priority need"
##  [56] "2011/12 Total decisions where eligible but not homeless"
##  [57] "2011/12 Total homelessness decisions"
##  [58] "31 March 2012 Total households in B&B (including shared annex)"
##  [59] "31 March 2012 Total households in hostels"
##  [60] "31 March 2012 Total households in LA/HA stock"
##  [61] "31 March 2012 Total households in private sector leased (by LA or HA)"
##  [62] "31 March 2012 Total households in other temp (including private landlord)"
##  [63] "31 March 2012 Total households in temporary accommodation"
##  [64] "31 March 2012 Number per 1000 households"
##  [65] "2011/12 Duty owed but no accommodation has been secured at end of March 2012"
##  [66] "2012/13 Thousands of households 2008-based interim projections for 2012"
##  [67] "2012/13 Numbers accepted as homeless and in priority need who are White"
##  [68] "2012/13 Numbers accepted as homeless and in priority need who are Black or Black British"
##  [69] "2012/13 Numbers accepted as homeless and in priority need who are Asian or Asian British"
##  [70] "2012/13 Numbers accepted as homeless and in priority need who are Mixed"
##  [71] "2012/13 Numbers accepted as homeless and in priority need who are Other ethnic origin"
##  [72] "2012/13 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
##  [73] "2012/13 Numbers accepted as homeless and in priority need total"
##  [74] "2012/13 Number per 1000 households"
##  [75] "2012/13 Total decisions where eligible homeless & in priority need but intentionally"
##  [76] "2012/13 Total decisions where eligible & homeless but not in priority need"
##  [77] "2012/13 Total decisions where eligible but not homeless"
##  [78] "2012/13 Total homelessness decisions"
##  [79] "31 March 2013 Total households in B&B (including shared annex)"
##  [80] "31 March 2013 Total households in hostels"
##  [81] "31 March 2013 Total households in LA/HA stock"
##  [82] "31 March 2013 Total households in private sector leased (by LA or HA)"
##  [83] "31 March 2013 Total households in other temp (including private landlord)"
##  [84] "31 March 2013 Total households in temporary accommodation"
##  [85] "31 March 2013 Number per 1000 households"
##  [86] "2012/13 Duty owed but no accommodation has been secured at end of March 2013"
##  [87] "2013/14 Thousands of households 2012-based interim projections for 2013"
##  [88] "2013/14 Numbers accepted as homeless and in priority need who are White"
##  [89] "2013/14 Numbers accepted as homeless and in priority need who are Black or Black British"
##  [90] "2013/14 Numbers accepted as homeless and in priority need who are Asian or Asian British"
##  [91] "2013/14 Numbers accepted as homeless and in priority need who are Mixed"
##  [92] "2013/14 Numbers accepted as homeless and in priority need who are Other ethnic origin"
##  [93] "2013/14 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
##  [94] "2013/14 Numbers accepted as homeless and in priority need total"
##  [95] "2013/14 Number per 1000 households"
##  [96] "2013/14 Total decisions where eligible homeless & in priority need but intentionally"
##  [97] "2013/14 Total decisions where eligible & homeless but not in priority need"
##  [98] "2013/14 Total decisions where eligible but not homeless"
##  [99] "2013/14 Total homelessness decisions"
## [100] "31 March 2014 Total households in B&B (including shared annex)"
## [101] "31 March 2014 Total households in hostels"
## [102] "31 March 2014 Total households in LA/HA stock"
## [103] "31 March 2014 Total households in private sector leased (by LA or HA)"
## [104] "31 March 2014 Total households in other temp (including private landlord)"
## [105] "31 March 2014 Total households in temporary accommodation"
## [106] "31 March 2014 Number per 1000 households"
## [107] "2013/14 Duty owed but no accommodation has been secured at end of March 2014"
## [108] "2014/15 Thousands of households 2012-based interim projections for 2014"
## [109] "2014/15 Numbers accepted as homeless and in priority need who are White"
## [110] "2014/15 Numbers accepted as homeless and in priority need who are Black or Black British"
## [111] "2014/15 Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [112] "2014/15 Numbers accepted as homeless and in priority need who are Mixed"
## [113] "2014/15 Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [114] "2014/15 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [115] "2014/15 Numbers accepted as homeless and in priority need total"
## [116] "2014/15 Number per 1000 households"
## [117] "2014/15 Total decisions where eligible homeless & in priority need but intentionally"
## [118] "2014/15 Total decisions where eligible & homeless but not in priority need"
## [119] "2014/15 Total decisions where eligible but not homeless"
## [120] "2014/15 Total homelessness decisions"
## [121] "31 March 2015 Total households in B&B (including shared annex)"
## [122] "31 March 2015 Total households in hostels"
## [123] "31 March 2015 Total households in LA/HA stock"
## [124] "31 March 2015 Total households in private sector leased (by LA or HA)"
## [125] "31 March 2015 Total households in other temp (including private landlord)"
## [126] "31 March 2015 Total households in temporary accommodation"
## [127] "31 March 2015 Number per 1000 households"
## [128] "2014/15 Duty owed but no accommodation has been secured at end of March 2015"
## [129] "2015/16 Thousands of households 2012-based interim projections for 2015"
## [130] "2015/16 Numbers accepted as homeless and in priority need who are White"
## [131] "2015/16 Numbers accepted as homeless and in priority need who are Black or Black British"
## [132] "2015/16 Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [133] "2015/16 Numbers accepted as homeless and in priority need who are Mixed"
## [134] "2015/16 Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [135] "2015/16 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [136] "2015/16 Numbers accepted as homeless and in priority need total"
## [137] "2015/16 Number per 1000 households"
## [138] "2015/16 Total decisions where eligible homeless & in priority need but intentionally"
## [139] "2015/16 Total decisions where eligible & homeless but not in priority need"
## [140] "2015/16 Total decisions where eligible but not homeless"
## [141] "2015/16 Total homelessness decisions"
## [142] "31 March 2016 Total households in B&B (including shared annex)"
## [143] "31 March 2016 Total households in hostels"
## [144] "31 March 2016 Total households in LA/HA stock"
## [145] "31 March 2016 Total households in private sector leased (by LA or HA)"
## [146] "31 March 2016 Total households in other temp (including private landlord)"
## [147] "31 March 2016 Total households in temporary accommodation"
## [148] "31 March 2016 Number per 1000 households"
## [149] "2015/16 Duty owed but no accommodation has been secured at end of March 2015"
The first thing to do is to home in on the data I’d like to use. From a quick scan of the columns, “Local authority area” looks critical, and I’d like to see if I have yearly data for “Numbers accepted as homeless and in priority need total”:
# Flag the columns whose names contain the yearly total
ind <- str_detect(names(data), "priority need total")
names(data)[ind]
## [1] "2009/10 Numbers accepted as homeless and in priority need total"
## [2] "2010/11 Numbers accepted as homeless and in priority need total"
## [3] "2011/12 Numbers accepted as homeless and in priority need total"
## [4] "2012/13 Numbers accepted as homeless and in priority need total"
## [5] "2013/14 Numbers accepted as homeless and in priority need total"
## [6] "2014/15 Numbers accepted as homeless and in priority need total"
## [7] "2015/16 Numbers accepted as homeless and in priority need total"
This looks to fit the bill. Now that I’ve homed in on the columns I need, let’s have a look at the structure and distribution of the data:
# Keep the area name (column 2) plus the seven yearly totals
data_trim <- data %>%
  select(2, names(data)[ind])
str(data_trim, give.attr = FALSE)
## Classes 'tbl_df', 'tbl' and 'data.frame': 327 obs. of 8 variables:
##  $ Local authority area : chr "ENGLAND" "Adur" "Allerdale" "Amber Valley" ...
##  $ 2009/10 Numbers accepted as homeless and in priority need total: int 40020 71 102 30 52 42 178 93 37 232 ...
##  $ 2010/11 Numbers accepted as homeless and in priority need total: int 44160 90 104 46 79 25 194 112 46 221 ...
##  $ 2011/12 Numbers accepted as homeless and in priority need total: int 50290 58 63 53 100 16 161 126 78 199 ...
##  $ 2012/13 Numbers accepted as homeless and in priority need total: int 53770 37 41 61 129 26 199 133 100 664 ...
##  $ 2013/14 Numbers accepted as homeless and in priority need total: int 52290 10 26 64 109 85 166 116 86 853 ...
##  $ 2014/15 Numbers accepted as homeless and in priority need total: int 54430 7 30 117 191 87 152 160 86 764 ...
##  $ 2015/16 Numbers accepted as homeless and in priority need total: chr "57740" "16" "32" "101" ...
summary(data_trim)
##  Local authority area
##  Length:327
##  Class :character
##  Mode  :character
##
##
##
##  2009/10 Numbers accepted as homeless and in priority need total
##  Min.   :    1.0
##  1st Qu.:   30.0
##  Median :   63.0
##  Mean   :  244.8
##  3rd Qu.:  136.0
##  Max.   :40020.0
##  2010/11 Numbers accepted as homeless and in priority need total
##  Min.   :    1.0
##  1st Qu.:   36.5
##  Median :   73.0
##  Mean   :  270.1
##  3rd Qu.:  149.0
##  Max.   :44160.0
##  2011/12 Numbers accepted as homeless and in priority need total
##  Min.   :    0.0
##  1st Qu.:   41.0
##  Median :   85.0
##  Mean   :  307.6
##  3rd Qu.:  168.0
##  Max.   :50290.0
##  2012/13 Numbers accepted as homeless and in priority need total
##  Min.   :    0.0
##  1st Qu.:   38.0
##  Median :   78.0
##  Mean   :  326.4
##  3rd Qu.:  178.5
##  Max.   :53770.0
##  2013/14 Numbers accepted as homeless and in priority need total
##  Min.   :    0.0
##  1st Qu.:   38.5
##  Median :   82.0
##  Mean   :  319.8
##  3rd Qu.:  174.5
##  Max.   :52290.0
##  2014/15 Numbers accepted as homeless and in priority need total
##  Min.   :    0.0
##  1st Qu.:   39.0
##  Median :   87.0
##  Mean   :  332.9
##  3rd Qu.:  185.0
##  Max.   :54430.0
##  2015/16 Numbers accepted as homeless and in priority need total
##  Length:327
##  Class :character
##  Mode  :character
##
##
##
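As an aside, the same column selection can be sketched with a tidyselect helper instead of a logical index. The toy tibble below (toy, toy_trim) is a made-up stand-in: only the column-name pattern mirrors the real data, and the per-1000 values are invented for illustration.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the real data: realistic column names, illustrative values
toy <- tibble(
  `Local authority area` = c("Adur", "Allerdale"),
  `2009/10 Numbers accepted as homeless and in priority need total` = c(71L, 102L),
  `2009/10 Number per 1000 households` = c(2.6, 2.4),
  `2010/11 Numbers accepted as homeless and in priority need total` = c(90L, 104L)
)

# contains() selects every column whose name includes the literal substring
toy_trim <- toy %>%
  select(`Local authority area`, contains("priority need total"))

names(toy_trim)
```

Selecting the area column by name rather than by position also means the code keeps working if the column order in the source file ever changes.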
I can see that apart from the annoyingly long column names, I seem to have the totals for the whole of England in the first row. So let’s fix these issues:
# Drop the ENGLAND totals row and shorten the column names
data_trim <- data_trim %>%
  slice(-1) %>%
  set_names("LAA", 2009:2015)
head(data_trim, 20)
## # A tibble: 20 x 8
##    LAA                     `2009` `2010` `2011` `2012` `2013` `2014` `2015`
##    <chr>                    <int>  <int>  <int>  <int>  <int>  <int> <chr>
##  1 Adur                        71     90     58     37     10      7 16
##  2 Allerdale                  102    104     63     41     26     30 32
##  3 Amber Valley                30     46     53     61     64    117 101
##  4 Arun                        52     79    100    129    109    191 228
##  5 Ashfield                    42     25     16     26     85     87 93
##  6 Ashford                    178    194    161    199    166    152 154
##  7 Aylesbury Vale              93    112    126    133    116    160 177
##  8 Babergh                     37     46     78    100     86     86 94
##  9 Barking and Dagenham       232    221    199    664    853    764 941
## 10 Barnet                     232    251    339    595    674    677 422
## 11 Barnsley                    95     56     38     23     14     13 14
## 12 Barrow-in-Furness           40     26     29     29     19     17 18
## 13 Basildon                   191    232    255    282    302    351 208
## 14 Basingstoke and Deane        1      1      2     11     22     54 46
## 15 Bassetlaw                   18     27     48     75     41     91 65
## 16 Bath and North East S~      68    100     86     86     65     48 68
## 17 Bedford UA                 141    107    211    242    174    164 287
## 18 Bexley                     128    204    346    349    420    498 483
## 19 Birmingham                3371   4207   3929   3957   3160   3140 3524
## 20 Blaby                        2      7      2      1      0      6 11
That’s looking a bit better. I notice that there seems to be a stray “UA” at the end of some LAAs. From the output of the summary() function above, I can also see that the 2015/16 column has been parsed as character, so there’s probably a non-numeric value in there somewhere. Let’s see how many places these issues affect:
# Areas whose name carries the " UA" suffix
data_trim %>%
  filter(str_detect(LAA, " UA")) %>%
  select(LAA)
## # A tibble: 56 x 1
##    LAA
##    <chr>
##  1 Bath and North East Somerset UA
##  2 Bedford UA
##  3 Blackburn with Darwen UA
##  4 Blackpool UA
##  5 Bournemouth UA
##  6 Bracknell Forest UA
##  7 Brighton and Hove UA
##  8 Bristol City of UA
##  9 Central Bedfordshire UA
## 10 Cheshire East UA
## # ... with 46 more rows

# Rows where the 2015 value contains anything other than digits
data_trim %>%
  filter(str_detect(`2015`, "[^0-9]+")) %>%
  select(LAA, `2015`)
## # A tibble: 5 x 2
##   LAA                `2015`
##   <chr>              <chr>
## 1 Chorley            -
## 2 Eden               -
## 3 Hyndburn           -
## 4 Isles of Scilly UA -
## 5 Waverley           -
56 place names ending in “UA” and five places without data in 2015! Let’s update our trimmed data to fix these issues, and make the data tidy by gathering the year headers into their own column:
data_tidy <- data_trim %>%
  mutate(LAA = str_replace(LAA, " UA", "")) %>%
  # "-" marks missing data, so turn it into NA before converting to integer
  mutate(`2015` = str_replace(`2015`, "-", NA_character_) %>% as.integer()) %>%
  gather(year, num_homeless, -LAA) %>%
  mutate(year = as.integer(year))
str(data_tidy)
## Classes 'tbl_df', 'tbl' and 'data.frame': 2282 obs. of 3 variables:
##  $ LAA         : chr "Adur" "Allerdale" "Amber Valley" "Arun" ...
##  $ year        : int 2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
##  $ num_homeless: int 71 102 30 52 42 178 93 37 232 232 ...
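gather() still works, but newer versions of tidyr recommend pivot_longer() for this kind of reshaping. Here is a minimal sketch of the same clean-up steps on a made-up three-row tibble, assuming tidyr 1.0.0 or later and that "-" is the only missing-value marker. The names toy_trim and toy_tidy are hypothetical, and the Chorley 2014 figure is invented.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(readr)
library(tibble)

# Toy version of data_trim: one clean row, one " UA" suffix, one "-" placeholder
toy_trim <- tibble(
  LAA    = c("Adur", "Bedford UA", "Chorley"),
  `2014` = c(7L, 164L, 20L),
  `2015` = c("16", "287", "-")
)

toy_tidy <- toy_trim %>%
  mutate(
    LAA    = str_remove(LAA, " UA$"),                      # strip the suffix only at the end
    `2015` = parse_integer(`2015`, na = c("", "NA", "-"))  # read "-" as missing
  ) %>%
  pivot_longer(-LAA, names_to = "year", values_to = "num_homeless") %>%
  mutate(year = as.integer(year))

toy_tidy
```

One difference to be aware of: pivot_longer() keeps each area's years together (Adur 2014, Adur 2015, ...), whereas gather() stacks one whole year after another, so any later step that depends on row order should sort explicitly.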
Initial analysis
Now I have the data in a more manageable format, let’s quickly plot the top 6 homelessness figures in each year:
data_tidy %>%
  group_by(year) %>%
  arrange(year, desc(num_homeless)) %>%
  top_n(6) %>%
  ggplot(aes(x = LAA, y = num_homeless)) +
  geom_col() +  # same as geom_bar(stat = "identity")
  coord_flip() +
  facet_wrap(~ year, ncol = 2, scales = "free_y")
## Selecting by num_homeless
We can see that Birmingham is by far the worst affected. I’m not sure how accurate these figures are, but if they are right, that is truly horrifying, and things did not seem to have improved by 2015. Which areas have seen the most drastic improvement or deterioration over the 7 years?
extremes <- data_tidy %>%
  drop_na() %>%
  filter(year %in% c(2009, 2015)) %>%
  group_by(LAA) %>%
  # Within each area the 2009 row comes first, so lag() returns the 2009 figure
  mutate(homeless2009 = lag(num_homeless),
         change = num_homeless - homeless2009) %>%
  ungroup() %>%
  drop_na() %>%
  arrange(change)
bind_rows(head(extremes, 8), tail(extremes, 8))
## # A tibble: 16 x 5
##    LAA                   year num_homeless homeless2009 change
##    <chr>                <int>        <int>        <int>  <int>
##  1 Sheffield             2015          421          946   -525
##  2 Coventry              2015          129          538   -409
##  3 North Tyneside        2015          149          502   -353
##  4 Derby                 2015           28          321   -293
##  5 Croydon               2015          222          425   -203
##  6 Durham                2015           70          264   -194
##  7 Cornwall              2015          250          419   -169
##  8 Tower Hamlets         2015          522          690   -168
##  9 Craven                2015          560            8    552
## 10 Milton Keynes         2015          789           84    705
## 11 Barking and Dagenham  2015          941          232    709
## 12 Bristol City of       2015         1006          285    721
## 13 Waltham Forest        2015         1087          286    801
## 14 Enfield               2015         1131          241    890
## 15 Dacorum               2015         1006           14    992
## 16 Newham                2015         1345           97   1248
Sheffield was the most improved with a reduction of over 500, with Newham seeing a massive increase of over 1200.
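The lag() approach relies on the rows for each area being in year order. An alternative sketch that makes the two years explicit is to spread them into columns first. This assumes tidyr 1.0.0 or later for pivot_wider(). The Sheffield and Newham figures are copied from the table above, while the Chorley 2009 value is invented just to show how an area with no 2015 figure drops out.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy long-format data: two complete areas and one with a missing 2015 value
toy_tidy <- tibble(
  LAA          = rep(c("Sheffield", "Newham", "Chorley"), each = 2),
  year         = rep(c(2009L, 2015L), times = 3),
  num_homeless = c(946L, 421L, 97L, 1345L, 50L, NA)
)

toy_change <- toy_tidy %>%
  filter(year %in% c(2009, 2015)) %>%
  # One row per area, with columns homeless2009 and homeless2015
  pivot_wider(names_from = year, values_from = num_homeless,
              names_prefix = "homeless") %>%
  mutate(change = homeless2015 - homeless2009) %>%
  drop_na(change) %>%   # areas missing either year cannot be compared
  arrange(change)

toy_change
```

With the years side by side, the subtraction no longer depends on how the rows happen to be sorted.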
The painful part
So having never done any geospatial analysis or mapping before, I tried doing some Google searches to see if I could find any code I could use. I quickly discovered that if I was going to do any mapping of UK regions, I was going to need to access some shape files.
I managed to download some from the UK Data Service website. I also had enormous trouble getting the function to read the data from within this blog post, but I managed to make it work using the here package, which I’ve since heard good things about on Twitter.
# here::here() builds the path from the project root, whatever the working directory
shapes <- st_read(dsn = here::here("data", "homelessness", "BoundaryData"),
                  layer = "infuse_dist_lyr_2011") %>%
  arrange(name)
## Reading layer `infuse_dist_lyr_2011' from data source `C:\Users\J\Documents\r-house\data\homelessness\BoundaryData' using driver `ESRI Shapefile'
## Simple feature collection with 324 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 82643.6 ymin: 5333.602 xmax: 655989 ymax: 657599.5
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
str(shapes)
## Classes 'sf' and 'data.frame': 324 obs. of 6 variables:
##  $ name      : Factor w/ 324 levels "Adur","Allerdale",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ label     : Factor w/ 324 levels "E92000001E06000001",..: 243 64 70 244 195 136 55 220 292 293 ...
##  $ geo_labelw: Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
##  $ geo_label : Factor w/ 324 levels "Adur","Allerdale",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ geo_code  : Factor w/ 324 levels "E06000001","E06000002",..: 243 64 70 244 195 136 55 220 292 293 ...
##  $ geometry  :sfc_MULTIPOLYGON of length 324; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:2718, 1:2] 515970 515951 515901 515901 515855 ...
##   ..- attr(*, "class")= chr "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
##   ..- attr(*, "names")= chr "name" "label" "geo_labelw" "geo_label" ...
With the intent of joining my dataframes together, I identified an inconsistency in the areas given in each table (setdiff() is a very handy function!):
n_distinct(data_tidy$LAA)
## [1] 326
n_distinct(shapes$name)
## [1] 324

# Names present in one table but not the other
data_diff <- setdiff(data_tidy$LAA, shapes$name)
shapes_diff <- setdiff(shapes$name, data_tidy$LAA)

# Pad the shorter vector so the two can sit side by side
# (tibble() replaces the deprecated data_frame())
tibble(data = data_diff, shapes = c(shapes_diff, "", ""))
## # A tibble: 11 x 2
##    data                       shapes
##    <chr>                      <chr>
##  1 Bristol City of            Bristol, City of
##  2 City of London             City of London,Westminster
##  3 Cornwall                   Cornwall,Isles of Scilly
##  4 Durham                     County Durham
##  5 Herefordshire County of    Herefordshire, County of
##  6 Isles of Scilly            Kingston upon Hull, City of
##  7 Kingston upon Hull City of St Albans
##  8 St Helens                  St Edmundsbury
##  9 St. Albans                 St. Helens
## 10 St. Edmundsbury            ""
## 11 Westminster                ""
You can see from the output above that my homelessness data has split out Westminster from the City of London, and the Isles of Scilly from Cornwall. There are also some inconsistencies in punctuation and naming (commas, full stops, “Durham” versus “County Durham”) that need to be sorted out. Let’s clean it up by renaming the mismatched areas and combining rows:
data_final <- data_tidy %>%
  #mutate_at(vars("year", "num_homeless"), as.numeric) %>%
  # Merge the areas that the shapefile treats as a single region
  mutate(LAA = ifelse(LAA %in% c("City of London", "Westminster"), "City of London,Westminster", LAA)) %>%
  mutate(LAA = ifelse(LAA %in% c("Cornwall", "Isles of Scilly"), "Cornwall,Isles of Scilly", LAA)) %>%
  # Rename the remaining areas to match the shapefile spelling
  mutate(LAA = ifelse(LAA == "Bristol City of", "Bristol, City of", LAA)) %>%
  mutate(LAA = ifelse(LAA == "Durham", "County Durham", LAA)) %>%
  mutate(LAA = ifelse(LAA == "Herefordshire County of", "Herefordshire, County of", LAA)) %>%
  mutate(LAA = ifelse(LAA == "Kingston upon Hull City of", "Kingston upon Hull, City of", LAA)) %>%
  mutate(LAA = ifelse(LAA == "St Helens", "St. Helens", LAA)) %>%
  mutate(LAA = ifelse(LAA == "St. Albans", "St Albans", LAA)) %>%
  mutate(LAA = ifelse(LAA == "St. Edmundsbury", "St Edmundsbury", LAA)) %>%
  mutate(LAA = as.factor(LAA)) %>%
  # Sum the rows that now share a name
  group_by(LAA, year) %>%
  summarise(total_homeless = sum(num_homeless)) %>%
  ungroup()
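The chain of ifelse() calls could also be driven by a lookup vector, which keeps the old and new names side by side. The sketch below is one way to do that, not the method used in this post: fixes, toy_tidy, toy_final and shape_names are hypothetical names, most of the toy figures are made up, and the `.groups` argument assumes dplyr 1.0.0 or later. The final setdiff() confirms that no names are left unmatched against a stand-in for shapes$name.

```r
library(dplyr)
library(tibble)

# Hypothetical lookup: name in the homelessness data -> name in the shapefile
fixes <- c(
  "City of London" = "City of London,Westminster",
  "Westminster"    = "City of London,Westminster",
  "Durham"         = "County Durham",
  "St. Albans"     = "St Albans"
)

# Toy data (illustrative values) and a stand-in for shapes$name
toy_tidy <- tibble(
  LAA          = c("City of London", "Westminster", "Durham", "St. Albans", "Adur"),
  year         = 2009L,
  num_homeless = c(10L, 20L, 264L, 30L, 71L)
)
shape_names <- c("Adur", "City of London,Westminster", "County Durham", "St Albans")

toy_final <- toy_tidy %>%
  # fixes[LAA] is NA for names that need no change, so fall back to the original
  mutate(LAA = coalesce(unname(fixes[LAA]), LAA)) %>%
  group_by(LAA, year) %>%
  summarise(total_homeless = sum(num_homeless), .groups = "drop")

# Should be empty if every area now has a matching shape
setdiff(toy_final$LAA, shape_names)
```

Note that sum() returns NA when any of the combined rows is missing, so a merged area inherits the gap of either part; passing na.rm = TRUE would give a partial total instead, if that is acceptable.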
Next, I created a function to take a year and a set of regions and generate a heatmap. This function filters the homelessness data, joins it with the shape data, and then plots the data. I’ve included regions as an argument so that Birmingham can be filtered out, as it dominates the heatmap.
heatmap <- function(inp_year, regions) {
  # right_join keeps every shape, even those with no homelessness figure
  data_joined <- data_final %>%
    filter(year == inp_year) %>%
    filter(LAA %in% regions) %>%
    right_join(shapes, by = c("LAA" = "name"))

  # Use the maximum across all years so every frame shares one colour scale
  max_scale <- max(data_final %>%
                     filter(LAA %in% regions) %>%
                     select(total_homeless),
                   na.rm = TRUE)

  p <- ggplot() +
    geom_sf(data = data_joined, aes(fill = total_homeless), col = "black") +
    theme_void() +
    coord_sf(datum = NA) +
    scale_fill_viridis_c(name = NULL,
                         option = "magma",
                         limits = c(0, max_scale),
                         breaks = c(0, max_scale / 2, max_scale)) +
    labs(title = paste0("Total number of people accepted as homeless and in priority need in England in ", inp_year),
         caption = "Data obtained from http://opendata.cambridgeshireinsight.org.uk/dataset/homelessness-england")
  print(p)
}

# Everything except Birmingham, which would otherwise dominate the colour scale
regions_to_include <- unique(setdiff(data_final$LAA, "Birmingham"))

# Draw one map per year and stitch the frames into a GIF
save_gif(walk(min(data_final$year):max(data_final$year), heatmap, regions = regions_to_include),
         delay = 0.7,
         gif_file = "animation.gif")
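One detail worth spelling out is the right_join(): areas filtered out of the data (Birmingham here) still come back from the shape table with NA in total_homeless, so they stay on the map rather than disappearing from it. A minimal sketch with plain tibbles standing in for the data and the shapefile follows. toy_final, toy_shapes and toy_joined are hypothetical names, toy_shapes has no geometry column, and Blaby is deliberately left without a figure.

```r
library(dplyr)
library(tibble)

# Toy homelessness figures and a stand-in for the shape table (no geometry)
toy_final <- tibble(
  LAA            = c("Adur", "Birmingham"),
  year           = 2009L,
  total_homeless = c(71L, 3371L)
)
toy_shapes <- tibble(name = c("Adur", "Birmingham", "Blaby"))

toy_joined <- toy_final %>%
  filter(year == 2009, LAA != "Birmingham") %>%
  # Every row of toy_shapes is kept; unmatched areas get NA
  right_join(toy_shapes, by = c("LAA" = "name"))

toy_joined
```

In ggplot2, continuous fill scales draw NA values in the scale's na.value colour, which is grey unless changed, so excluded or missing areas remain visible as outlines with a neutral fill.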
I certainly feel this project has been a bit of a hack job. It’s taken me over a month to write because it’s been so challenging, and I’ve had to leave it and come back to it so many times. I’m not proud of it, mainly because I rushed it at the end just to get it done.
I’ve since used Tableau, which seems to make heatmaps a bit easier. If I were to do it again in R, however, I think I’d take the courses on DataCamp first!