24 Days of R: Day 5
Some time back, I started a project on GitHub wherein I would explore the efficacy of financial literacy efforts in the area where I live. This is done with the support of a local non-profit organization.
As a first step, I tried to draw a picture of the area at a relatively fine level of detail. This relies on the UScensus suite of packages that I wrote about a couple days ago. Today, we'll be looking at data for five counties in North Carolina at the level of a US census tract. First, we'll load up the data and see what levels of homeownership are.
library("UScensus2010")
library("UScensus2010tract")

# Pull tract-level data for each of the five counties
durham = county(name = "durham", state = "nc", level = "tract")
orange = county(name = "orange", state = "nc", level = "tract")
wake = county(name = "wake", state = "nc", level = "tract")
johnston = county(name = "johnston", state = "nc", level = "tract")
chatham = county(name = "chatham", state = "nc", level = "tract")

# Bind the counties into a single spatial object
uwgt = spRbind(orange, durham)
uwgt = spRbind(uwgt, wake)
uwgt = spRbind(uwgt, johnston)
uwgt = spRbind(uwgt, chatham)

rm(durham, orange, wake, johnston, chatham)
Whether or not someone owns their home is a strong indicator of economic stability and of the potential to retain and accumulate wealth. What percentage of folks own their own home?
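# Description of codes can be found in the documentation for the
# UScensus2010 package
# NOTE: in the 2010 SF1 tables, H3 is occupancy status and H4 is tenure, so
# H0030002 looks like occupied housing units (not persons) and H0040004 like
# renter-occupied units; verify against the package documentation.
uwgt$TotalPopulation = uwgt$H0030002
uwgt$pctHomeowner = 1 - uwgt$H0040004/uwgt$H0030002

# Sorted homeownership share, one point per tract
plot(uwgt$pctHomeowner[order(uwgt$pctHomeowner)], pch = 19)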
plot(uwgt$TotalPopulation, uwgt$pctHomeowner, pch = 19, xlab = "Total population", ylab = "% Homeownership")
We see that it runs the gamut from zero to 100% homeownership. We might assume that areas of higher population have lower percentages of homeownership. Such areas may be more densely populated and urbanized, where people are more likely to rent. However, there doesn't appear to be any relationship between the total population and homeownership. The construction of a census tract may have something to do with this: tracts are drawn to hold broadly comparable numbers of residents, so the total for a tract says little about how dense it is.
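The "no relationship" reading of the scatterplot can also be checked numerically with a rank correlation. This is only a minimal sketch on made-up numbers: the `tracts` data frame and its values are hypothetical stand-ins, not census data. With the real object, `uwgt$TotalPopulation` and `uwgt$pctHomeowner` would take their place.

```r
# Hypothetical toy data standing in for uwgt@data; the values are made up.
tracts <- data.frame(
  TotalPopulation = c(1200, 1850, 2400, 3100, 3900, 4600),
  pctHomeowner    = c(0.62, 0.15, 0.88, 0.40, 0.71, 0.33)
)

# Spearman's rho measures monotone association on ranks.
# use = "complete.obs" drops rows where either value is missing.
rho <- cor(tracts$TotalPopulation, tracts$pctHomeowner,
           method = "spearman", use = "complete.obs")

# cor.test() adds a p-value for the null hypothesis of no association.
test <- cor.test(tracts$TotalPopulation, tracts$pctHomeowner,
                 method = "spearman")

print(rho)
print(test$p.value)
```

A rho near zero with a large p-value would be consistent with what the plot suggests, though it would not rule out a non-monotone pattern.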
We'll recreate the choropleth helper function from two days ago so that we can map this data. We'll then draw a map that shows high and low concentrations of homeowners.
library(RColorBrewer)
library(classInt)

MyChoropleth = function(sp, dem, palette, ...) {
  df = sp@data
  # One quantile class per palette colour
  brks = classIntervals(df[, dem], n = length(palette), style = "quantile")
  brks = brks$brks
  # Assign each polygon the colour of the class its value falls in
  sp$MyColor = palette[findInterval(df[, dem], brks, all.inside = TRUE)]
  plot(sp, col = sp$MyColor, axes = FALSE, ...)
}

myPalette = brewer.pal(9, "Blues")
MyChoropleth(uwgt, "pctHomeowner", myPalette, border = "transparent")
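The core of the helper is the step that turns values into colours: compute quantile breaks, then look up which bin each value falls in. The sketch below isolates that step in base R, using `quantile()` as a stand-in for `classIntervals()`. The vector `x` and the three-colour `palette3` are hypothetical and exist only for illustration.

```r
# Toy values standing in for a tract-level share such as pctHomeowner.
x <- c(0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95, 0.10, 0.90)

# Hypothetical three-colour palette; the post uses brewer.pal(9, "Blues").
palette3 <- c("lightblue", "steelblue", "navy")

# Quantile breaks: n colours need n + 1 break points.
brks <- quantile(x, probs = seq(0, 1, length.out = length(palette3) + 1))

# findInterval() returns the bin index of each value. all.inside = TRUE
# folds the maximum into the top bin instead of returning n + 1.
bin <- findInterval(x, brks, all.inside = TRUE)

colours <- palette3[bin]
print(data.frame(x, bin, colours))
```

Without `all.inside = TRUE`, the largest value would land in bin `n + 1`, index past the end of the palette, and come back as `NA`, which would leave that tract uncoloured.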
# Attach a name and a plotting colour to each county code
dfCountyColor = data.frame(county = c("135", "063", "183", "101", "037"),
                           countyName = c("Orange", "Durham", "Wake", "Johnston", "Chatham"),
                           color = c("orange", "blue", "red", "green", "yellow"))
uwgt = merge(uwgt, dfCountyColor)
There's a clear geographic distribution at work. In the central part of the map, the area between Durham and Raleigh has lower levels of homeownership. These are more urbanized areas, which means they may have more young or transient residents. However, these are also areas of low wealth. We can see this when we load in data from the American Community Survey.
setwd("~/GitHub/FinancialLiteracy/Data/ACS_11_5YR_B17005")
dfCensus = read.csv("ACS_11_5YR_B17005.csv", skip = 1)

# Drop the margin-of-error columns
marginOfError = grep("margin", colnames(dfCensus), ignore.case = TRUE)
dfCensus = dfCensus[, -marginOfError]
rm(marginOfError)

# Strip dots and the "Estimate" prefix from the column names
colnames(dfCensus) = gsub(".", "", colnames(dfCensus), fixed = TRUE)
colnames(dfCensus) = gsub("Estimate", "", colnames(dfCensus), fixed = TRUE)

uwgtACS = merge(uwgt, dfCensus, by.x = "fips", by.y = "Id2", all.x = TRUE)
uwgtACS$pctNonPoverty = 1 - uwgtACS$Incomeinthepast12monthsbelowpovertylevel/uwgtACS$Total

# Side-by-side maps of homeownership and the share above the poverty level
par(mfrow = c(1, 2))
MyChoropleth(uwgtACS, "pctHomeowner", myPalette, border = "transparent")
title("% Homeowners")
MyChoropleth(uwgtACS, "pctNonPoverty", myPalette, border = "transparent")
title("% Above Poverty")
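The column-name cleanup above depends on `fixed = TRUE`, and it is worth seeing why. The sketch below walks two header strings through the same steps. The `raw` strings are an assumption about what the ACS header row looks like, chosen so that the result matches the column names used in the post; the real file may word them differently.

```r
# Assumed raw header text; the real ACS file may word it differently.
raw <- c("Estimate; Total:",
         "Estimate; Income in the past 12 months below poverty level:")

# read.csv() passes headers through make.names(), which turns every
# character that is invalid in an R name into a dot.
nm <- make.names(raw)
print(nm)

# Without fixed = TRUE the pattern "." is a regex wildcard that matches
# every character, so the whole name is deleted.
wiped <- gsub(".", "", nm)

# With fixed = TRUE only literal dots are removed.
clean <- gsub(".", "", nm, fixed = TRUE)
clean <- gsub("Estimate", "", clean, fixed = TRUE)
print(clean)
```

Escaping the dot as `"\\."` would work just as well; `fixed = TRUE` simply avoids having to think about regex metacharacters.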
Although there are some exceptions (e.g. folks in Research Triangle Park, the RTP), there's visual evidence of a relationship. We can quantify this with a simple linear model.
plot(uwgtACS$pctNonPoverty, uwgtACS$pctHomeowner, pch = 19,
     xlab = "% Above Poverty", ylab = "% Homeowners")
fit = lm(pctHomeowner ~ pctNonPoverty, data = uwgtACS)
summary(fit)
## 
## Call:
## lm(formula = pctHomeowner ~ pctNonPoverty, data = uwgtACS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7773 -0.1106  0.0246  0.1382  0.4664 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -0.4019     0.0631   -6.37  6.9e-10 ***
## pctNonPoverty   1.1792     0.0715   16.49  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.178 on 310 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.467,  Adjusted R-squared:  0.466 
## F-statistic:  272 on 1 and 310 DF,  p-value: <2e-16

# Overlay the fitted line, skipping the tract with a missing predictor
lines(uwgtACS$pctNonPoverty[!is.na(uwgtACS$pctNonPoverty)], predict(fit))
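The `lines()` call above has to filter out the missing predictor by hand because `lm()` silently drops incomplete rows, which leaves `predict(fit)` shorter than the data. The sketch below shows that behaviour and two alternatives, `na.action = na.exclude` and `abline()`, on a toy data frame. The data frame `d` and its values are hypothetical; this is an alternative to the approach in the post, not a correction of its results.

```r
# Hypothetical toy data with one missing predictor value.
d <- data.frame(
  x = c(0.60, 0.70, NA, 0.80, 0.90, 0.95),
  y = c(0.30, 0.45, 0.50, 0.55, 0.65, 0.75)
)

# Default na.action (na.omit): the incomplete row is dropped, so
# predict() returns one value fewer than there are rows.
fit_omit <- lm(y ~ x, data = d)
length(predict(fit_omit))  # 5

# na.exclude pads predictions with NA so they line up with the rows.
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)
pred <- predict(fit_excl)

plot(d$x, d$y, pch = 19)
# abline() draws the fitted line from the coefficients directly,
# so no alignment between x and the predictions is needed.
abline(fit_excl)
```

Both fits have identical coefficients; only the bookkeeping of the missing row differs. Filtering on the predictor alone, as in the post, works when that is the only column with missing values.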
Obviously, there are many other factors at play: marital status, available housing stock, zoning laws, family size and type of employment, to name but a few. One thing I'd like to explore is the influence of county government on various statistics. Here's the same plot, with sample points color coded by county:
# Colours must be character strings, not factor levels, to plot correctly
uwgtACS$color = as.character(uwgtACS$color)
par(mfrow = c(1, 1))
plot(uwgtACS$pctNonPoverty, uwgtACS$pctHomeowner, pch = 19,
     xlab = "% Above Poverty", ylab = "% Homeowners", col = uwgtACS$color)
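A colour-coded scatterplot is easier to read with a legend. The sketch below reuses the county lookup table from earlier in the post and adds one with `legend()`. The points in `pts` are invented for illustration, one per county, and `match()` is used here in place of `merge()` purely to keep the toy example short.

```r
# Lookup table copied from the post.
dfCountyColor <- data.frame(
  county     = c("135", "063", "183", "101", "037"),
  countyName = c("Orange", "Durham", "Wake", "Johnston", "Chatham"),
  color      = c("orange", "blue", "red", "green", "yellow"),
  stringsAsFactors = FALSE
)

# Hypothetical points: one per county, values invented for illustration.
pts <- data.frame(
  county = c("063", "183", "135", "037", "101"),
  x = c(0.78, 0.92, 0.85, 0.88, 0.83),
  y = c(0.45, 0.70, 0.60, 0.75, 0.68),
  stringsAsFactors = FALSE
)

# match() looks up each point's colour without reordering the rows.
pts$color <- dfCountyColor$color[match(pts$county, dfCountyColor$county)]

plot(pts$x, pts$y, pch = 19, col = pts$color,
     xlab = "% Above Poverty", ylab = "% Homeowners")
legend("topleft", legend = dfCountyColor$countyName,
       col = dfCountyColor$color, pch = 19, bty = "n")
```

Setting `stringsAsFactors = FALSE` when building the lookup table also removes the need for the `as.character()` conversion on R versions where data frames default to factors.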
I'll explore that in a later post.
Tomorrow: not sure what I'll write about! Possibly the PISA testing results that were released this week.
citation("UScensus2010tract")
## 
## To cite UScensus2000 in publications use:
## 
##   Zack W. Almquist (2010). US Census Spatial and Demographic Data
##   in R: The UScensus2000 Suite of Packages. Journal of Statistical
##   Software, 37(6), 1-31. URL http://www.jstatsoft.org/v37/i06/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {US Census Spatial and Demographic Data in {R}: The {UScensus2000} Suite of Packages},
##     author = {Zack W. Almquist},
##     journal = {Journal of Statistical Software},
##     year = {2010},
##     volume = {37},
##     number = {6},
##     pages = {1--31},
##     url = {http://www.jstatsoft.org/v37/i06/},
##   }

sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.4.1            RWordPress_0.2-3       UScensus2010tract_1.00
## [4] UScensus2010_0.11      foreign_0.8-55         maptools_0.8-27       
## [7] classInt_0.1-21        RColorBrewer_1.0-5     sp_1.0-13             
## 
## loaded via a namespace (and not attached):
##  [1] class_7.3-9     digest_0.6.3    e1071_1.6-1     evaluate_0.4.7 
##  [5] formatR_0.9     grid_3.0.2      lattice_0.20-23 markdown_0.6.3 
##  [9] RCurl_1.95-4.1  stringr_0.6.2   tools_3.0.2     XML_3.98-1.1   
## [13] XMLRPC_0.3-0