52vis Week 2 Challenge — Australian Version

[This article was first published on R – Irregularly Scheduled Programming, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I mapped out the USA homelessness rate in my last post as a challenge and noted at the end that it would be interesting to do the same for Australia. That was the first comment I received in person, too. “Let’s do it!” I said. What I found may shock you (click-bait title; check).

Most of the code carried over. Of course, lacking hrbrmstr’s neat albersusa equivalent I had to obtain and process the shapefile myself. Thankfully, the ABS have me covered. Here’s the whole script;

## 52vis challenges, week 2 -- adapted for Australian statistics
## otherwise follows procedures found at
## github.com/jonocarroll/2016-14
## http://jcarroll.com.au/2016/04/10/52vis-week-2-challenge/
## this version blogged @ http://jcarroll.com.au/2016/04/12/australian-homeless/
## This script produces a chloropleth for the Australian homeless population
## as a per mille proportion of each state's population, with the
## colorscale set to the same as the USA homeless graph (white at the national
## median, blue at the lowest value, and capped at 3x the national median in red).
## The graph is repeated with a scale more suitable for the Australian statistics.
## The yearly data is looped over in a .gif
## load relevant packages
pacman::p_load(magrittr, dplyr, tidyr, ggplot2, httr, readxl, purrr)
pacman::p_load(data.table, maptools, broom, ggthemes, ggalt, viridis, httr)
pacman::p_load_gh("hrbrmstr/albersusa")
pacman::p_load_gh("dgrtwo/gganimate")
pacman::p_load_gh("hrbrmstr/ggalt")
## and new folder created
setwd("jonocarroll")
## load the data from abs.gov.au (download once)
## http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/2049.02011?OpenDocument
# URL <- "http://abs.gov.au/AUSSTATS/subscriber.nsf/log?openagent&20490do001_2011.xls&2049.0&Data%20Cubes&4B192F075234A583CA257AB1001709B0&0&2011&12.11.2012&Previous"
# GET(URL, write_disk("homeless_AUS.xls", overwrite=TRUE))
## load the sheets into a list (by sheet name, a.k.a. year) of data.frames via map
AUSdata <- read_excel("homeless_AUS.xls", sheet="Table_1", skip=5)[27:35,] %>% setNames(c("State",
"n2001", "pc2001", "p10k2001",
"n2006", "pc2006", "p10k2006",
"n2011", "pc2011", "p10k2011"))
## gather, and add a year column
AUSdataDF <- AUSdata %>%
gather(Year, nHomeless, c(n2001, n2006, n2011)) %>%
dplyr::select(-c(pc2001, pc2006, pc2011, p10k2001, p10k2006, p10k2011)) %>%
mutate(Year=as.integer(substr(Year,2,5)))
## save a copy so we don't have to do that again
save(AUSdataDF, file="AUSdata_data.frame_2016-14.RData")
## merge with abs.gov.au statep opulation statistics
## http://blog.id.com.au/2012/population/australian-population/australian-2011-census-based-population-estimates/
auspop <- read.csv("auspop.csv", stringsAsFactors=FALSE)
auspop %<>% gather(Year, Population, -State) %>% mutate(Year=as.integer(substr(Year,2,5)))
AUSdataDF %<>% merge(auspop, by=c("State", "Year"))
## normalise the total homeless population as a proportion of 1000 persons (per mille) in each state population
AUSdataDF %<>% group_by(Year, State) %>% mutate(HomelessProp=1e3L*nHomeless/Population)
## http://www.abs.gov.au/ausstats/subscriber.nsf/log?openagent&1270055001_ste_2011_aust_shape.zip&1270.0.55.001&Data%20Cubes&1D26EC44E6ABD911CA257801000D8779&0&July%202011&23.12.2010&Latest
pacman::p_load(rgdal, maptools, ggplot2)
aus <- readOGR(dsn=".", layer="STE_2011_AUST")
aus@data$id = rownames(aus@data)
aus_map <- tidy(aus, region="id")
aus_map.df = merge(aus_map, aus@data, by="id")
## merge the aus_map data with ours
map_with_data <- AUSdataDF %>% merge(aus_map.df, by.x="State", by.y="STE_NAME11")
## make more than 3x the US national median value as a 'plus' group
map_with_data$HomelessProp[map_with_data$HomelessProp > 4.89] <- 4.89
## save a copy so we don't have to do that again
save(map_with_data, file="map_with_data_aus_2016-14.RData")
## build the animated plot
gg <- ggplot(map_with_data)
gg <- gg + labs(subtitle=paste0("AUS Homeless population scaled by state population,\ncapped at 3x USA national median (4.89/1000)"),
caption="Data: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/2049.02011?OpenDocument ")
gg <- gg + geom_map(map=aus_map,
aes(x=long, y=lat, map_id=id, fill=HomelessProp, frame=Year),
color="#2b2b2b", size=0.1)
gg <- gg + theme_map()
gg <- gg + scale_fill_gradient2(name="Homeless\npopulation \u2030",
low="steelblue", high="firebrick",
midpoint=4.89/3,
limits=c(0,5),
breaks=c(0,1,2,3,4),
labels=c("0","1","2","3","4+"))
gg <- gg + theme(legend.position=c(0.86, 0.3), legend.key.size=unit(2,"cm"))
gg <- gg + theme(text=element_text(size=30, family="Arial Narrow"))
## view the animation
gg_animate(gg)
## output the animation
gg_animate(gg, interval=1, ani.width=1600, ani.height=1200, file="HomelessPopulation_AUSversion_USscale.gif")
## optimise the gif using Imagemagick
system("convert HomelessPopulation_AUSversion_USscale.gif -fuzz 10% -layers OptimizePlus -layers OptimizeTransparency HomelessPopulation_optim_AUSversion_USscale.gif")
##
## now repeat, but for our own median
##
## merge the aus_map data with ours
map_with_data <- AUSdataDF %>% merge(aus_map.df, by.x="State", by.y="STE_NAME11")
## make more than 3x the US national median value as a 'plus' group
map_with_data$HomelessProp[map_with_data$HomelessProp > 3*median(map_with_data$HomelessProp, na.rm=TRUE)] <- 3*median(map_with_data$HomelessProp, na.rm=TRUE)
## save a copy so we don't have to do that again
save(map_with_data, file="map_with_data_aus2_2016-14.RData")
## build the animated plot
gg <- ggplot(map_with_data)
gg <- gg + labs(subtitle=paste0("AUS Homeless population scaled by state population,\ncapped at 3x AUS national median (10.57/1000)"),
caption="Data: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/2049.02011?OpenDocument ")
gg <- gg + geom_map(map=aus_map,
aes(x=long, y=lat, map_id=id, fill=HomelessProp, frame=Year),
color="#2b2b2b", size=0.1)
gg <- gg + theme_map()
gg <- gg + scale_fill_gradient2(name="Homeless\npopulation \u2030",
low="steelblue", high="firebrick",
midpoint=median(map_with_data$HomelessProp, na.rm=TRUE),
limits=c(0,12),
breaks=seq(0,10,1),
labels=c("0","1","2","3","4","5","6","7","8","9","10+"))
gg <- gg + theme(legend.position=c(0.86, 0.3), legend.key.size=unit(2,"cm"))
gg <- gg + theme(text=element_text(size=30, family="Arial Narrow"))
## view the animation
gg_animate(gg)
## output the animation
gg_animate(gg, interval=1, ani.width=1600, ani.height=1200, file="HomelessPopulation_AUSversion_AUSscale.gif")
## optimise the gif using Imagemagick
system("convert HomelessPopulation_AUSversion_AUSscale.gif -fuzz 10% -layers OptimizePlus -layers OptimizeTransparency HomelessPopulation_optim_AUSversion_AUSscale.gif")
## for clarity, add a lollipop graph (courtesy of hrbrmstr)
devtools::install_github("hrbrmstr/ggalt")
library(ggalt)
AUSdataDF %<>% mutate(StateYear=paste0(State,", ",Year))
gg2 <- ggplot(AUSdataDF %>% ungroup %>% filter(State!="Northern Territory") %>% arrange(HomelessProp), aes(y=reorder(StateYear, HomelessProp), x=HomelessProp))
gg2 <- gg2 + geom_lollipop(aes(color=factor(Year)), horizontal=TRUE)
gg2 <- gg2 + labs(x="Homeless population \u2030",
y="State, Year",
title="Australian Homeless Population",
subtitle="Excluding Northern Territory",
caption="Data: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/2049.02011?OpenDocument ")
gg2 <- gg2 + theme_bw()
gg2 <- gg2 + theme(legend.key=element_blank())
gg2 <- gg2 + scale_color_discrete(guide=FALSE)
gg2 <- gg2 + facet_wrap(~Year, nrow=1)
gg2 <- gg2 + theme(text=element_text(family="Arial Narrow"))
gg2
ggsave(gg2, filename="HomelessPopulation_AUSbyStateNONT.png", width=7, height=7, units="in", dpi=600)
gg2NT <- ggplot(AUSdataDF %>% ungroup %>% arrange(HomelessProp), aes(y=reorder(StateYear, HomelessProp), x=HomelessProp))
gg2NT <- gg2NT + geom_lollipop(aes(color=factor(Year)), horizontal=TRUE)
gg2NT <- gg2NT + labs(x="Homeless population \u2030",
y="State, Year",
title="Australian Homeless Population",
# subtitle="Excluding Northern Territory",
caption="Data: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/2049.02011?OpenDocument ")
gg2NT <- gg2NT + theme_bw()
gg2NT <- gg2NT + theme(legend.key=element_blank())
gg2NT <- gg2NT + scale_color_discrete(guide=FALSE)
gg2NT <- gg2NT + theme(text=element_text(family="Arial Narrow"))
gg2NT
ggsave(gg2NT, filename="HomelessPopulation_AUSbyState.png", width=7, height=7, units="in", dpi=600)

For starters, I compared the Australian statistics on the same scale as the USA (median 1.63‰, capped at 3x that value) and was shocked

HomelessPopulation_optim_AUSversion_USscale

Yep, it appears we’re worse than the USA for homelessness. That sucks. What if we put it back on our own scale, how do our states do relatively? Well, for starters, the median goes up to 3.5‰ (I’ve again capped at 3x that value) but a lot of that seems to be coming from NT. Looking at the data itself, our lowest value is indeed higher than the USA median, so we’ve nothing to be proud of. That said, some states are doing better than our own median. TAS looks to be nicely below, while SA seems to be sitting around the median.

HomelessPopulation_optim_AUSversion_AUSscale

If we drill down to the data itself, we can see what the actual figures look like. I’ve had a go at hrbrmstr’s geom_lollipop (from the dev version of ggalt) and it works nicely, as expected. I’ve left NT off this first graph so that the others stand a fighting chance at the scale.

And here’s what happens if you include the Northern Territory

Ouch. It looks odd, but it’s correct. The number of people in the NT in 2011 was around 231,331 but the census-estimated homeless population was 15,479, which means that 6.7% of the population (i.e. 67‰) were homeless. What? Have I made a mistake? No, it’s just horrifyingly true.

Aw, man. I came here for data analysis, not feels. Clearly this is a national shame, and something needs to be done about it.

To leave a comment for the author, please follow the link and comment on their blog: R – Irregularly Scheduled Programming.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)