Data Mining the California Solar Statistics with R: Part V
Building a Shiny App to explore the model and the data
About the Shiny App
In my previous post I built several models to try to predict the amount of residential solar installed per county by quarter as a function of solar insolation, price of solar electricity, county population and county median income. To explore the data and model predictions I’ve built a Shiny app in RStudio. I could not install RStudio on the hosting service I have now, so if you want to check it out, you’ll need to head over to
https://beyondmaxwell.shinyapps.io/CaSolar
The app lets you compare actual installations with predicted installations by year and quarter on a map. I also included a bar plot of the same data as the map, which makes it a bit easier to see, and an interactive scatter plot where you can see the effect of the different predictors on the total installed residential solar by county. Rather than post the code for the Shiny app here, I have hosted it on my GitHub at
https://github.com/JohnRenshaw/CaSolarShinyApp
Examining the effect of solar subsidies on the total amount of residential solar installed from 2009-2013
In the previous posts, I used the subsidized cost of solar installs as a predictor, but now I would like to predict how much residential solar would have been installed if no CA subsidies had been given. To do this I first need to create a variable for the actual upfront cost paid for solar.
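library(plyr)
library(randomForest)

## load previous data
load("CaSolarIV.RData")

## create variable for cost/watt, averaged within each year and quarter
## (use the per-group CEC.PTC.Rating column, not solarData$CEC.PTC.Rating,
## which would pull in the whole column for every group)
actualCostByQuarter = ddply(solarData, .(year, quarter), summarise,
    cost = mean(Total.Cost / (CEC.PTC.Rating * 1000), na.rm = TRUE))

## rename the existing (subsidized) cost column, then merge the new cost into the data set
colnames(installsByYearCountyQuarter)[8] = 'subsidCost'
installsByYearCountyQuarter = merge(installsByYearCountyQuarter, actualCostByQuarter,
    by = c("year", "quarter"))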
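To make the cost-per-watt step concrete, here is a minimal base-R sketch on a made-up table. The data frames `toy`, `toyCost` and `toyCounty` and every number in them are invented for illustration. Only the column names `Total.Cost` and `CEC.PTC.Rating` come from the code above, and the rating is assumed to be in kW, as the factor of 1000 suggests. It uses `aggregate` rather than `ddply` so that it runs without extra packages.

```r
# Toy stand-in for solarData; every value here is invented.
toy <- data.frame(
  year           = c(2009, 2009, 2009, 2010),
  quarter        = c(1, 1, 2, 1),
  Total.Cost     = c(40000, 30000, 24000, 18000),  # dollars
  CEC.PTC.Rating = c(5, 5, 4, 4)                   # assumed to be kW
)

# Cost per watt for each install: dollars / (kW * 1000).
toy$costPerWatt <- toy$Total.Cost / (toy$CEC.PTC.Rating * 1000)

# Mean cost per watt within each year/quarter group.
toyCost <- aggregate(costPerWatt ~ year + quarter, data = toy, FUN = mean)
names(toyCost)[names(toyCost) == "costPerWatt"] <- "cost"

# Hypothetical county-level table, merged on the shared keys.
toyCounty <- data.frame(
  year     = c(2009, 2009, 2010),
  quarter  = c(1, 2, 1),
  County   = c("A", "A", "A"),
  Total.kW = c(10, 4, 4)
)
merged <- merge(toyCounty, toyCost, by = c("year", "quarter"))
print(merged)
```

Each row of the county table picks up the average cost for its own year and quarter, which is the shape the model needs for prediction.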
Now that I have the subsidy free cost as a predictor, I can use the random forest model to predict how much residential solar would have been installed if there had been no subsidies.
noSubPreds = round(sum(predict(solarForest, newdata = installsByYearCountyQuarter))) ## predictions with no subsidies
SubPreds = round(sum(solarForest$predicted)) ## predictions with subsidies
actual = round(sum(installsByYearCountyQuarter$Total.kW)) ## actual residential installs from 2009-2013

TotalSolarInstalled = c(noSubPreds, SubPreds, actual)
type = c('Without CA subsidies (predicted)', 'With CA subsidies (predicted)', 'With CA subsidies (actual)')
finale = data.frame(TotalSolarInstalled, type)

library(ggplot2)
ggplot(finale, aes(type, TotalSolarInstalled)) +
    geom_bar(stat = "identity") +
    theme_bw() +
    ylab('Total residential solar installed from 2009-2013 (kW)') +
    geom_text(aes(label = TotalSolarInstalled), vjust = -0.5,
              position = position_dodge(0.9), size = 4) ##add labels
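The general pattern here is a counterfactual prediction: fit a model with one version of a predictor, then overwrite that predictor with its "what if" values and predict again. Below is a small self-contained sketch of that pattern. It is not the model from this series: it swaps in a plain `lm` for the random forest so it runs in base R, and the data, the `population` column and the flat $2/W subsidy are all invented for illustration.

```r
set.seed(1)
n <- 40

# Invented training data: installs fall with cost and rise with population.
train <- data.frame(
  cost       = runif(n, 3, 6),    # subsidized cost, $/W (made up)
  population = runif(n, 1, 10)    # arbitrary units (made up)
)
train$Total.kW <- 100 - 10 * train$cost + 5 * train$population + rnorm(n)

# Stand-in model; the post uses a random forest instead.
fit <- lm(Total.kW ~ cost + population, data = train)

# Counterfactual data: same rows, but 'cost' now holds the unsubsidized
# price. The $2/W subsidy is a hypothetical figure.
counterfactual <- train
counterfactual$cost <- train$cost + 2

withSubsidy    <- sum(predict(fit, newdata = train))
withoutSubsidy <- sum(predict(fit, newdata = counterfactual))

round(c(withSubsidy = withSubsidy, withoutSubsidy = withoutSubsidy))
```

The key detail is that the counterfactual column must carry the same name the model was trained on (`cost` here), which is why the post renames the old column before merging in the new one.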
This bar chart really surprised me. The random forest model predicts that without CA subsidies, about 25% less residential solar would have been installed between 2009 and 2013. That is quite a dramatic effect. Judging by my analysis, the Go Solar California program has helped California take a big step towards a future that is less dependent on fossil fuels.
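For reference, a percentage like this can be computed from the two predicted totals. The post does not spell out which total the figure is measured against, so the sketch below assumes the with-subsidy prediction as the baseline. The two totals are placeholders, not the values from the chart.

```r
# Placeholder totals in kW; NOT the values from the post's chart.
withSubsidyPred    <- 1000
withoutSubsidyPred <- 750

# Reduction relative to the with-subsidy prediction (assumed baseline).
pctLess <- 100 * (withSubsidyPred - withoutSubsidyPred) / withSubsidyPred
pctLess
```

With the variables from the code above, the same expression would use `SubPreds` and `noSubPreds` in place of the placeholders.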
Well, I think that about wraps things up for this project. If you have any ideas for analysis I didn’t think of, or any comments or questions about what I’ve done, please don’t hesitate to reach out to me.