I wrote a previous post about the salaries of data scientists, but now I'm going to look at one of the downsides of all the high salaries generated by the tech field in the Bay Area: real estate prices. A cursory look at San Francisco real estate prices convinced me that my best options for affordable housing lay elsewhere. I checked the South Bay and found that prices there were not much better. Luckily, housing prices in the East Bay, while not really reasonable, are at least significantly cheaper than anything found in SF or the South Bay. I started zeroing in on two locations, San Leandro and Hayward. A friendly broker agreed to send me some data on recent sales in both areas. What follows is a brief exploratory data analysis of recent housing sales in San Leandro and Hayward. First, I create boxplots of sale prices by bedroom/bathroom combination:
###### Settings
options(scipen=10)
setwd("C:/Blog/SFHousing")

###### Loading data
sl<-read.csv("SanLeandro.csv")
hay<-read.csv("Hayward.csv")

###### Formatting data
# Strip "$" and "," so the price columns can be converted to numbers
sl$Sold.Price<-as.numeric(gsub('[[:punct:]]','',sl$Sold.Price))
sl$List.Price<-as.numeric(gsub('[[:punct:]]','',sl$List.Price))
hay$Sold.Price<-as.numeric(gsub('[[:punct:]]','',hay$Sold.Price))
hay$List.Price<-as.numeric(gsub('[[:punct:]]','',hay$List.Price))

# San Leandro: count each partial bath as half a bath and build a "3BD,1.5BA" label
sl$Baths.Partial[is.na(sl$Baths.Partial)]<-0
sl$Baths2<-sl$Baths+sl$Baths.Partial*.5
sl<-sl[order(sl$Bedrooms,sl$Baths2),]
sl$Title<-paste0(sl$Bedrooms,"BD,",sl$Baths2,"BA")
# Remove house types only listed once
sl<-sl[sl$Title %in% names(table(sl$Title))[as.numeric(which(table(sl$Title)>1))],]
# Keep the factor levels in bedroom/bath order for plotting
sllev<-unique(sl$Title)
sl$Title<-factor(sl$Title,levels=sllev)

# Hayward: same steps
hay$Baths.Partial[is.na(hay$Baths.Partial)]<-0
hay$Baths2<-hay$Baths+hay$Baths.Partial*.5
hay<-hay[order(hay$Bedrooms,hay$Baths2),]
hay$Title<-paste0(hay$Bedrooms,"BD,",hay$Baths2,"BA")
# Remove house types only listed once
hay<-hay[hay$Title %in% names(table(hay$Title))[as.numeric(which(table(hay$Title)>1))],]
haylev<-unique(hay$Title)
hay$Title<-factor(hay$Title,levels=haylev)

###### San Leandro boxplot
# Round the axis range outward to the nearest $50,000
minmin<-floor(min(sl$Sold.Price)/50000)*50000
maxmax<-ceiling(max(sl$Sold.Price)/50000)*50000
par(mar=c(6,5,5,5))
boxplot(sl$Sold.Price~sl$Title,main="San Leandro - Sold Price",col="skyblue",ylim=c(minmin,maxmax), yaxt="n")
axis(2,at=seq(minmin,maxmax,by=50000),labels=paste0("$",prettyNum(seq(minmin,maxmax,by=50000),big.mark=",")),las=2)
axis(4,at=seq(minmin,maxmax,by=50000),labels=paste0("$",prettyNum(seq(minmin,maxmax,by=50000),big.mark=",")),las=2)
for (i in seq(minmin,maxmax,by=25000)) {abline(h=i,lty=3,col="lightgray")}

###### Hayward boxplot
# Use Hayward's own range for limits, ticks, and labels so that
# the tick positions and their labels always have matching lengths
minmin2<-floor(min(hay$Sold.Price)/50000)*50000
maxmax2<-ceiling(max(hay$Sold.Price)/50000)*50000
par(mar=c(6,5,5,5))
boxplot(hay$Sold.Price~hay$Title,main="Hayward - Sold Price",col="lightgreen",ylim=c(minmin2,maxmax2), yaxt="n")
axis(2,at=seq(minmin2,maxmax2,by=50000),labels=paste0("$",prettyNum(seq(minmin2,maxmax2,by=50000),big.mark=",")),las=2)
axis(4,at=seq(minmin2,maxmax2,by=50000),labels=paste0("$",prettyNum(seq(minmin2,maxmax2,by=50000),big.mark=",")),las=2)
for (i in seq(minmin2,maxmax2,by=25000)) {abline(h=i,lty=3,col="lightgray")}
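To make the formatting steps above a bit more concrete, here is a minimal sketch on a few made-up rows. The values are invented for illustration and are not from the broker's data; the only things borrowed from the real code are the column names and the logic. It shows what the price cleanup, the half-bath arithmetic, the "listed only once" filter, and the $50,000 axis rounding each produce.

```r
# Toy data: invented values, using the same column names as the CSVs above
toy <- data.frame(
  Sold.Price    = c("$450,000", "$512,500", "$610,000", "$398,000"),
  Bedrooms      = c(3, 3, 4, 3),
  Baths         = c(1, 1, 2, 1),
  Baths.Partial = c(1, 1, NA, 1),
  stringsAsFactors = FALSE
)

# Strip "$" and "," so the price can be converted to a number
toy$Sold.Price <- as.numeric(gsub("[[:punct:]]", "", toy$Sold.Price))

# Treat a missing partial-bath count as zero, then count each partial bath as 0.5
toy$Baths.Partial[is.na(toy$Baths.Partial)] <- 0
toy$Baths2 <- toy$Baths + toy$Baths.Partial * 0.5

# Build the "3BD,1.5BA" style label
toy$Title <- paste0(toy$Bedrooms, "BD,", toy$Baths2, "BA")

# Keep only labels that appear more than once (the lone 4BD,2BA row is dropped)
counts <- table(toy$Title)
toy <- toy[toy$Title %in% names(counts)[counts > 1], ]

# Round the plotting range outward to the nearest $50,000
axis_min <- floor(min(toy$Sold.Price) / 50000) * 50000
axis_max <- ceiling(max(toy$Sold.Price) / 50000) * 50000

print(toy)
print(c(axis_min, axis_max))
```

One thing to watch: `[[:punct:]]` also matches the decimal point, so this cleanup is only safe for whole-dollar prices. If a file contained cents, a narrower pattern such as `"[$,]"` would be a safer choice.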