I’ve done a previous post about the salaries of data scientists, but now I’m going to look at one of the negative sides of all the high salaries generated by the tech field in the Bay Area – real estate prices.
A cursory look at San Francisco real estate prices convinced me that my best options for affordable housing lay elsewhere. I checked into the South Bay and found that prices there were not much better. Luckily, housing prices in the East Bay (while not really reasonable) are at least significantly cheaper than anything found in SF or the South Bay. I started zeroing in on two locations – San Leandro and Hayward. A friendly broker agreed to send me some data on recent sales in both areas. What follows is a brief exploratory data analysis of recent housing sales in San Leandro and Hayward.
First, I start by creating boxplots of sale prices by bedroom/bath combination:
In Hayward we can see that 3 bedroom houses cost approximately $50K – $75K more than 2 bedroom houses in the same area. Interestingly, 4 bedroom houses were generally cheaper than 3 bedroom houses. I will look into this more later, but it is likely due to earlier construction or less desirable immediate neighborhoods. It is also worth noting that there is almost no price difference between 2BD,1.5BA and 2BD,2BA (the same goes for 3BD,1.5BA and 3BD,2BA). People don’t seem to place much value on the difference between 1.5 baths and 2 baths.
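A quick way to put numbers on comparisons like these is to compute the median sale price per house type. The following is only a minimal sketch: the `sales` data frame and its prices are made up for illustration (they are not the broker's data), though the `Title` and `Sold.Price` column names mirror the ones used in the plotting code.

```r
# Toy data for illustration only; these are NOT the broker's figures.
sales <- data.frame(
  Title      = c("2BD,1BA", "2BD,1BA", "3BD,2BA", "3BD,2BA", "3BD,2BA"),
  Sold.Price = c(400000, 420000, 470000, 480000, 500000)
)

# Median sale price for each bedroom/bath combination
med <- tapply(sales$Sold.Price, sales$Title, median)
print(med)

# Gap in median price between two house types
gap <- unname(med["3BD,2BA"] - med["2BD,1BA"])
print(gap)
```

On the real data, the same `tapply()` call could be applied to `hay$Sold.Price` and `hay$Title` to check how large the gap between house types actually is.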
Finally, it is fairly obvious that there is more range in house sale prices in Hayward (compared to San Leandro). There is a lot more analysis to do, but this is a good start for now! Here’s the R code:
    ###### Settings
    options(scipen=10)
    setwd("C:/Blog/SFHousing")

    ###### Loading data
    sl<-read.csv("SanLeandro.csv")
    hay<-read.csv("Hayward.csv")

    ###### Formatting data
    # Strip "$" and "," so prices parse as numbers.
    # Note: [[:punct:]] also strips ".", so this assumes whole-dollar prices.
    sl$Sold.Price<-as.numeric(gsub('[[:punct:]]','',sl$Sold.Price))
    sl$List.Price<-as.numeric(gsub('[[:punct:]]','',sl$List.Price))
    hay$Sold.Price<-as.numeric(gsub('[[:punct:]]','',hay$Sold.Price))
    hay$List.Price<-as.numeric(gsub('[[:punct:]]','',hay$List.Price))

    # San Leandro: count each partial bath as half a bath and build a label such as "3BD,1.5BA"
    sl$Baths.Partial[is.na(sl$Baths.Partial)]<-0
    sl$Baths2<-sl$Baths+sl$Baths.Partial*.5
    sl<-sl[order(sl$Bedrooms,sl$Baths2),]
    sl$Title<-paste0(sl$Bedrooms,"BD,",sl$Baths2,"BA")
    # Remove house types only listed once
    sl<-sl[sl$Title %in% names(table(sl$Title))[as.numeric(which(table(sl$Title)>1))],]
    # Keep factor levels in sorted (bedrooms, baths) order for the plot
    sllev<-unique(sl$Title)
    sl$Title<-factor(sl$Title,levels=sllev)

    # Hayward: same steps
    hay$Baths.Partial[is.na(hay$Baths.Partial)]<-0
    hay$Baths2<-hay$Baths+hay$Baths.Partial*.5
    hay<-hay[order(hay$Bedrooms,hay$Baths2),]
    hay$Title<-paste0(hay$Bedrooms,"BD,",hay$Baths2,"BA")
    # Remove house types only listed once
    hay<-hay[hay$Title %in% names(table(hay$Title))[as.numeric(which(table(hay$Title)>1))],]
    haylev<-unique(hay$Title)
    hay$Title<-factor(hay$Title,levels=haylev)

    ###### San Leandro boxplot
    # Round the axis limits outward to the nearest $50K
    minmin<-floor(min(sl$Sold.Price)/50000)*50000
    maxmax<-ceiling(max(sl$Sold.Price)/50000)*50000
    par(mar=c(6,5,5,5))
    boxplot(sl$Sold.Price~sl$Title,main="San Leandro - Sold Price",col="skyblue",ylim=c(minmin,maxmax), yaxt="n")
    axis(2,at=seq(minmin,maxmax,by=50000),labels=paste0("$",prettyNum(seq(minmin,maxmax,by=50000),big.mark=",")),las=2)
    axis(4,at=seq(minmin,maxmax,by=50000),labels=paste0("$",prettyNum(seq(minmin,maxmax,by=50000),big.mark=",")),las=2)
    for (i in seq(minmin,maxmax,by=25000)) {abline(h=i,lty=3,col="lightgray")}

    ###### Hayward boxplot
    # Use Hayward's own limits (minmin2/maxmax2) throughout so that
    # the tick positions and their labels have the same length
    minmin2<-floor(min(hay$Sold.Price)/50000)*50000
    maxmax2<-ceiling(max(hay$Sold.Price)/50000)*50000
    par(mar=c(6,5,5,5))
    boxplot(hay$Sold.Price~hay$Title,main="Hayward - Sold Price",col="lightgreen",ylim=c(minmin2,maxmax2), yaxt="n")
    axis(2,at=seq(minmin2,maxmax2,by=50000),labels=paste0("$",prettyNum(seq(minmin2,maxmax2,by=50000),big.mark=",")),las=2)
    axis(4,at=seq(minmin2,maxmax2,by=50000),labels=paste0("$",prettyNum(seq(minmin2,maxmax2,by=50000),big.mark=",")),las=2)
    for (i in seq(minmin2,maxmax2,by=25000)) {abline(h=i,lty=3,col="lightgray")}
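The observation that Hayward prices are more spread out than San Leandro prices can also be checked numerically rather than read off the boxplots. Here is a rough sketch of one way to do that, comparing the full range and the interquartile range per city. The `prices` data frame is hypothetical and its values are invented for illustration; they are not taken from the broker's files.

```r
# Hypothetical prices for illustration only; NOT the actual sales data.
prices <- data.frame(
  City       = c(rep("San Leandro", 5), rep("Hayward", 5)),
  Sold.Price = c(430000, 450000, 460000, 480000, 500000,
                 300000, 380000, 450000, 520000, 600000)
)

# For each city, compute the full range (max - min) and the interquartile range
spread <- sapply(split(prices$Sold.Price, prices$City), function(x) {
  c(range = diff(range(x)), iqr = IQR(x))
})
print(spread)
```

The interquartile range is less sensitive to a handful of unusually cheap or expensive sales than the full range, so looking at both gives a fairer picture of how much prices vary within each city.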