What Data Science can learn from small-data Statistics

David Smith

8 years ago

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]


Last month I joined Gregory Piatetsky (KDnuggets editor) for a webinar presentation, Data Science: Not Just for Big Data, hosted by Kalido. In my portion of the presentation (you can see my slides below), I wanted to react to the Big Data focus that is so much a part of the Data Science movement today, and to highlight issues that affect all data sets, which statisticians have learned about from working with smaller data sets over the last 200 years. These include observational bias (an often-overlooked issue with Big Data), confounding and overfitting (which can mess up any model if care isn't taken), and the need to move the discussion away from predictions (means) and towards risk (variance).
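As a small illustration of confounding (a sketch of my own here, not code from the webinar slides), the Python below uses illustrative success counts for two treatments, split by a third variable, case severity. Within each severity group treatment A does better, yet pooling the data makes B look better, because A was given mostly to the harder, severe cases. The names `counts`, `stratified_rates` and `pooled_rates` and the severity labels are hypothetical, chosen only for this example.

```python
# Illustrative (hypothetical) counts for two treatments, split by a
# confounding variable: case severity. Each entry is (successes, trials).
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def rate(successes, trials):
    """Return the success rate as a float."""
    return successes / trials


def stratified_rates(data):
    """Success rate for each treatment within each severity stratum."""
    return {
        treatment: {stratum: rate(*sc) for stratum, sc in strata.items()}
        for treatment, strata in data.items()
    }


def pooled_rates(data):
    """Success rate for each treatment, ignoring severity entirely."""
    pooled = {}
    for treatment, strata in data.items():
        successes = sum(s for s, _ in strata.values())
        trials = sum(n for _, n in strata.values())
        pooled[treatment] = rate(successes, trials)
    return pooled


if __name__ == "__main__":
    by_stratum = stratified_rates(counts)
    overall = pooled_rates(counts)
    # Within each stratum, A beats B...
    for stratum in ("mild", "severe"):
        a, b = by_stratum["A"][stratum], by_stratum["B"][stratum]
        print(f"{stratum:>7}: A={a:.3f}  B={b:.3f}")
    # ...but the pooled comparison reverses, because A got mostly severe cases.
    print(f"{'pooled':>7}: A={overall['A']:.3f}  B={overall['B']:.3f}")
```

A larger data set would not fix this on its own: more rows make the pooled comparison more precise, but no less misleading, unless the confounder is measured and accounted for.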

I still firmly believe that Big Data is important: there's so much we can do today that simply wasn't possible before we had the variety and volume of data sources available now. But the data science community has much to learn from the realm of smaller data. Several of my examples come from the excellent ComputerWorld article, 12 predictive analytics screw-ups. You can watch the webinar replay below.

Kalido Webinars: Data Science: Not Just For Big Data
