
Big-data Naive Bayes and Classification Trees with R and Netezza

[This article was first published on Revolutions, and kindly contributed to R-bloggers].

The IBM Netezza analytics appliances combine high-capacity storage for Big Data with a massively-parallel processing platform for high-performance computing. With the addition of Revolution R Enterprise for IBM Netezza, you can use the power of the R language to build predictive models on Big Data.

In the demonstration below, Revolution Analytics' Derek Norton analyzes loan approval data stored on the IBM appliance. You'll see the R code used to:

  • Explore the raw data (with summary statistics and charts)
  • Prepare the data for statistical analysis, and create training and test sets
  • Create predictive models using classification trees and Naïve Bayes
  • Predict using the models, and evaluate model performance using confusion matrices
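The four steps above can be sketched in ordinary, in-memory R. This is only an illustration under stated assumptions: it does not use the Netezza in-database functions shown in the demo, the loan data is synthetic, and the column names (`income`, `debt_ratio`, `credit_score`, `approved`) are hypothetical. To avoid extra packages, Naïve Bayes is written by hand as a small Gaussian variant; the tree uses `rpart`, a recommended package that ships with R.

```r
library(rpart)  # classification trees; a recommended package shipped with R

set.seed(42)

# Synthetic stand-in for loan approval data (all columns are hypothetical).
n <- 1000
loans <- data.frame(
  income       = round(rlnorm(n, meanlog = 10.8, sdlog = 0.4)),
  debt_ratio   = runif(n, 0, 0.8),
  credit_score = round(rnorm(n, mean = 680, sd = 60))
)
score <- 0.00002 * loans$income - 4 * loans$debt_ratio +
  0.02 * (loans$credit_score - 680)
loans$approved <- factor(ifelse(score + rnorm(n) > 0, "yes", "no"),
                         levels = c("no", "yes"))

# 1. Explore the raw data.
summary(loans)
table(loans$approved)

# 2. Create training and test sets (70/30 split).
train_idx <- sample(n, size = round(0.7 * n))
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]

# 3a. Classification tree.
tree_fit <- rpart(approved ~ ., data = train, method = "class")

# 3b. Hand-written Gaussian Naive Bayes (numeric predictors only).
nb_fit <- function(X, y) {
  classes <- levels(y)
  list(
    classes = classes,
    prior   = as.numeric(table(y)[classes]) / length(y),
    mu      = sapply(classes, function(k) colMeans(X[y == k, , drop = FALSE])),
    sd      = sapply(classes, function(k) apply(X[y == k, , drop = FALSE], 2, sd))
  )
}

nb_predict <- function(fit, X) {
  # Log posterior (up to a constant) for every row and every class.
  loglik <- sapply(seq_along(fit$classes), function(j) {
    ll <- log(fit$prior[j])
    for (v in colnames(X)) {
      ll <- ll + dnorm(X[[v]], fit$mu[v, j], fit$sd[v, j], log = TRUE)
    }
    ll
  })
  factor(fit$classes[max.col(loglik, ties.method = "first")],
         levels = fit$classes)
}

predictors <- c("income", "debt_ratio", "credit_score")
nb_model <- nb_fit(train[, predictors], train$approved)

# 4. Predict on the test set and build confusion matrices.
tree_pred <- predict(tree_fit, newdata = test, type = "class")
nb_pred   <- nb_predict(nb_model, test[, predictors])

tree_cm <- table(actual = test$approved, predicted = tree_pred)
nb_cm   <- table(actual = test$approved, predicted = nb_pred)

accuracy <- function(cm) sum(diag(cm)) / sum(cm)
print(tree_cm); print(accuracy(tree_cm))
print(nb_cm);   print(accuracy(nb_cm))
```

In each confusion matrix, rows are the actual outcomes and columns are the predictions, so the diagonal counts the correctly classified loans.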

Video: https://www.youtube.com/v/EO4EOBZ-t6w?version=3&hl=en_US

Note that while the R code runs on Derek's laptop, the raw data never leaves the appliance: the analytic computations take place "in-database" within the appliance itself (where the Revolution R Enterprise engine is also running on each parallel core).

This demo was included in the recent webinar, Turbo-Charge Your Analytics with IBM Netezza, for which you can find slides and a replay at the link below.

Revolution Analytics Webinars: Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A Step-by-Step Approach for Acceleration and Innovation

