Varian on big data
Last week my research group discussed Hal Varian’s interesting new paper, “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28.
It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but OK if you’ve no idea what they are.
It was more disappointing on boosting (completely ignoring the fact that boosting can be applied in a regression context as well as a classification context), and his comments on causality seemed curiously naive. His suggested approach involved forecasting using all variables but the one that is considered causal, and then comparing the results against what actually happened. That seems at least as likely to lead to false conclusions on causality as instrumental variables or differences-in-differences. Although Varian cites Pearl’s work approvingly, I doubt that Pearl would return the favour.
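To make the forecasting approach described above concrete, here is a minimal sketch in Python. It is not Varian's code or any package's API: the function names, the single control variable, the simple least-squares forecaster and the synthetic data are all assumptions chosen for illustration. The model is fitted on the period before the intervention using only the non-causal variable, then used to forecast the later period, and the gap between actual and forecast values is read as the effect.

```python
import random


def fit_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimated_effect(control, outcome, n_pre):
    """Fit on the first n_pre observations, forecast the rest,
    and return the mean of (actual - forecast) over the later period."""
    intercept, slope = fit_ols(control[:n_pre], outcome[:n_pre])
    forecast = [intercept + slope * c for c in control[n_pre:]]
    gaps = [a - f for a, f in zip(outcome[n_pre:], forecast)]
    return sum(gaps) / len(gaps)


# Synthetic data (an assumption for illustration): the outcome depends
# linearly on one control variable, and an intervention adds a constant
# true_effect to every observation after the first n_pre.
rng = random.Random(0)
n_pre, n_post, true_effect = 60, 20, 5.0
control = [rng.gauss(50, 10) for _ in range(n_pre + n_post)]
outcome = [10 + 2 * c + rng.gauss(0, 1) for c in control]
for t in range(n_pre, n_pre + n_post):
    outcome[t] += true_effect

effect = estimated_effect(control, outcome, n_pre)
print(f"estimated effect: {effect:.2f} (true effect: {true_effect})")
```

The sketch recovers the effect here only because nothing else changes after the intervention. If an unmodelled variable also shifted in the later period, the same gap would appear and would be attributed to the intervention, which is the weakness noted above.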
On a positive note, his Bayesian Structural Time Series model (which I heard him speak about in Rome 12 months ago) seems interesting and very useful. I wonder when the promised R package will appear?