Why 52Vis?
In case folks are wondering why I’m doing this, it’s pretty simple. We need a society that has high data literacy and we need folks who are capable of making awesome, truthful data visualizations. The only way to do that is by working with data over, and over, and over, and over again.
Directed projects with some reward are one of the best Pavlovian ways to accomplish that 🙂
This week’s challenge
The Data is Plural folks have done it again and there’s a neat and important data set in this week’s vis challenge.
From their newsletter:
Every January, at the behest of the U.S. Department of Housing and Urban Development, volunteers across the country attempt to count the homeless in their communities. The result: HUD’s “point in time” estimates, which are currently available for 2007–2015. The most recent estimates found 564,708 homeless people nationwide, with 75,323 of that count (more than 13%) living in New York City.
I decided to take a look at this data by seeing which states had the worst homeless problem per-capita (i.e. per 100K population). I’ve included the population data along with some ready-made wrangling of the HUD data.
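Here's a quick sketch of what "per 100K" normalization means, using completely made-up numbers (not HUD or census figures) just to show the arithmetic:

# Minimal sketch of per-100K normalization -- all values below are invented,
# purely for illustration; they are not actual HUD counts or census populations
library(dplyr)

ex <- data.frame(
  state      = c("AA", "BB"),   # placeholder state codes
  homeless   = c(5000, 20000),  # made-up counts
  population = c(1e6, 1e7),     # made-up populations
  stringsAsFactors = FALSE
)

mutate(ex, homeless_per_100k = (homeless / population) * 100000)

The full wrangling further down does the same thing with the real HUD counts and the state population data.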
But, before we do that…
RULES UPDATE + Last week’s winner
I’ll be announcing the winner on Thursday since I:
- am horribly sick, having been exposed to who knows what after rOpenSci last week in SFO 🙂
- have been traveling like mad this week
- need to wrangle all the answers into the github repo and get @laneharrison (and his students) to validate my choice for winner (I have picked a winner)
Given how hard the wrangling has been, I'm going to need to request that folks both leave a blog comment and file a PR to the github repo for this week. Please include the code you used as well as the vis (or a link to a working interactive vis).
PRIZES UPDATE
Not only can I offer Data-Driven Security, but Hadley Wickham has offered signed copies of his books as well, and I'll keep the Amazon gift card in the mix as a catch-all in case you already have those titles (NOTE: if any other authors want to offer up their tomes, shoot me a note!).
No place to roam
Be warned: this was a pretty depressing data set. I went in wanting to know which "states" had the worst problem, assuming it'd be California or New York. I had no idea what the answer would turn out to be, and the exercise shattered some of my assumptions.
NOTE: I’ve included U.S. population data for the necessary time period.
library(readxl)
library(purrr)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(scales)
library(grid)
library(hrbrmisc)

# grab the HUD homeless data
URL <- "https://www.hudexchange.info/resources/documents/2007-2015-PIT-Counts-by-CoC.xlsx"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil, mode="wb")

# turn the excel tabs into a long data.frame
# (sheet 1 holds the 2015 estimates, sheet 2 holds 2014, ..., sheet 9 holds 2007)
yrs <- 2015:2007
names(yrs) <- 1:9

homeless <- map_df(names(yrs), function(i) {
  df <- suppressWarnings(read_excel(fil, as.numeric(i)))
  df[,3:ncol(df)] <- suppressWarnings(lapply(df[,3:ncol(df)], as.numeric))
  new_names <- tolower(make.names(colnames(df)))
  new_names <- str_replace_all(new_names, "\\.+", "_")
  df <- setNames(df, str_replace_all(new_names, "_[[:digit:]]+$", ""))
  bind_cols(df, data_frame(year=rep(yrs[i], nrow(df))))
})

# clean it up a bit
homeless <- mutate(homeless,
                   state=str_match(coc_number, "^([[:alpha:]]{2})")[,2],
                   coc_name=str_replace(coc_name, " CoC$", ""))
homeless <- select(homeless, year, state, everything())
homeless <- filter(homeless, !is.na(state))

# read in the us population data
uspop <- read.csv("uspop.csv", stringsAsFactors=FALSE)
uspop_long <- gather(uspop, year, population, -name, -iso_3166_2)
uspop_long$year <- sub("X", "", uspop_long$year)

# normalize the values
states <- count(homeless, year, state, wt=total_homeless)
states <- left_join(states, albersusa::usa_composite()@data[,3:4], by=c("state"="iso_3166_2"))
states <- ungroup(filter(states, !is.na(name)))
states$year <- as.character(states$year)
states <- mutate(left_join(states, uspop_long), homeless_per_100k=(n/population)*100000)

# we want to order from worst to best
group_by(states, name) %>%
  summarise(mean=mean(homeless_per_100k, na.rm=TRUE)) %>%
  arrange(desc(mean)) -> ordr

states$year <- factor(states$year, levels=as.character(2006:2016))
states$name <- factor(states$name, levels=ordr$name)

# plot
#+ fig.retina=2, fig.width=10, fig.height=15
gg <- ggplot(states, aes(x=year, y=homeless_per_100k))
gg <- gg + geom_segment(aes(xend=year, yend=0), size=0.33)
gg <- gg + geom_point(size=0.5)
gg <- gg + scale_x_discrete(expand=c(0,0),
                            breaks=seq(2007, 2015, length.out=5),
                            labels=c("2007", "", "2011", "", "2015"),
                            drop=FALSE)
gg <- gg + scale_y_continuous(expand=c(0,0), labels=comma, limits=c(0,1400))
gg <- gg + labs(x=NULL, y=NULL,
                title="US Department of Housing & Urban Development (HUD) Total (Estimated) Homeless Population",
                subtitle="Counts aggregated from HUD Communities of Care Regional Surveys (normalized per 100K population)",
                caption="Data from: https://www.hudexchange.info/resource/4832/2015-ahar-part-1-pit-estimates-of-homelessness/")
gg <- gg + facet_wrap(~name, scales="free", ncol=6)
gg <- gg + theme_hrbrmstr_an(grid="Y", axis="", strip_text_size=9)
gg <- gg + theme(axis.text.x=element_text(size=8))
gg <- gg + theme(axis.text.y=element_text(size=7))
gg <- gg + theme(panel.margin=unit(c(10, 10), "pt"))
gg <- gg + theme(panel.background=element_rect(color="#97cbdc44", fill="#97cbdc44"))
gg <- gg + theme(plot.margin=margin(10, 20, 10, 15))
gg
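One note on uspop.csv: the gather() call above assumes the file is wide, with a name column, an iso_3166_2 column, and one column per year (read.csv() prefixes numeric headers with an X, hence the sub() that strips it). A small sketch of that assumed shape, with made-up values:

# Sketch of the wide layout `uspop.csv` is assumed to have (values are invented);
# read.csv() turns year headers like "2007" into "X2007", so sub() strips the X
uspop <- data.frame(
  name       = c("StateA", "StateB"),
  iso_3166_2 = c("AA", "BB"),
  X2007      = c(1000000, 2000000),  # made-up populations
  X2015      = c(1100000, 2100000),  # made-up populations
  stringsAsFactors = FALSE
)

library(tidyr)

# melt the X2007..X2015 columns into long year/population pairs
uspop_long <- gather(uspop, year, population, -name, -iso_3166_2)
uspop_long$year <- sub("X", "", uspop_long$year)
uspop_long

If your copy of the population file is laid out differently, adjust the gather() call accordingly.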
I used one of the alternate colors from HUD's official color palette for the panel backgrounds.
Remember, this challenge is language/tool-agnostic. Go in with a good question or two, augment the data as you see fit, and show us your vis!
Week 2's contest closes 2016-04-12 23:59 EDT
Contest GitHub Repo: https://github.com/52vis/2016-14