Become an effective data hacker with the R-Hadoop stack
In discussion with several data scientists, Will Stanton (a data scientist with Return Path) learned that a common concern is: what software should I be using? There are many options out there, but what is the best platform to be an effective “data hacker”?
Will recommends using a technology stack with R and Hadoop, which allows data scientists “to do almost anything you need to for data hacking”. With this platform, you have all the tools you need for:
- Statistical Programming
- Machine Learning
- Visualization
- Reporting / Dashboarding
- Databases
- Big Data
- Data Munging
On the other hand, Will says the stack works best on Unix- or Linux-based systems (Windows is possible, but tricky) and isn't ideally suited for text mining or web-based applications. But if this is something you want to try, a good start is the RHadoop project, a collection of R packages that connect R and Hadoop.
For more on being a data hacker with the R-Hadoop stack, check out Will's complete blog post linked below.
Will Stanton's Data Science blog: Becoming a data “hacker” (via Joaquim Coll)