
Announcing Revolution R Enterprise 5.0

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We're proud to announce the latest update to the enhanced, commercial-grade distribution of R: Revolution R Enterprise 5.0. With each new release, Revolution R Enterprise adds capabilities to open-source R that make R users more productive, improve the performance of R programs, support Big Data analytics, and provide servers and APIs for enterprise deployment.

New features in Revolution R Enterprise 5.0 include:

  • Distributed/Parallel Computing: Automatically distribute statistical analyses from a desktop across the nodes of a cluster through Windows HPC Server, and distribute R function calls across nodes.
  • Scalable Data Management: Increase flexibility in data analysis with new data import and cleaning/manipulation tools.
  • Integration with Hadoop: Support MapReduce programming in R and integrate with HDFS and HBase using Cloudera Certified Technology.
  • Expanded Scalable Analytics Functionality: Apply new big data statistics algorithms including principal components analysis, factor analysis, contingency table analysis and more.
  • Enhanced R Productivity Environment: Create and build R packages directly from the RPE Solution Explorer.
  • Enhanced RevoDeployR server: Add multiple compute nodes to serve more users, run large analysis jobs in batch, and use LDAP for enterprise security.
  • Upgraded Open Source R: Revolution R Enterprise 5.0 includes the fully patched R 2.13.2, which features a new byte compiler to improve the performance of user-written functions and packages.

We're particularly excited about the new capabilities for parallel programming and statistical analysis on an HPC Server cluster. Here's a quick overview, including an example that uses a 5-node cluster to run a billion-row regression in less than a minute:

(Video: https://www.youtube.com/v/yHpW1v97A3M?version=3&hl=en_US&hd=1)
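Why can a billion-row regression be spread over a cluster at all? Ordinary least squares depends on the data only through a few running sums (for a single predictor: n, Σx, Σy, Σx², Σxy), so each node can compute those sums over its own slice of rows, and the head node only has to add them up and solve a small system. The following is a minimal pure-Python sketch of that general technique for a one-predictor model, assuming the rows have already been partitioned into chunks. The function names are hypothetical, and this illustrates the idea only; it is not RevoScaleR's implementation of rxLinMod.

```python
def chunk_stats(xs, ys):
    """Sufficient statistics for simple OLS on one chunk (e.g. one node's rows)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n, sx, sy, sxx, sxy)


def combine(stats_list):
    """Add per-chunk statistics element-wise (the 'reduce' step on the head node)."""
    return tuple(sum(col) for col in zip(*stats_list))


def solve_ols(stats):
    """Solve the 2x2 normal equations for y = intercept + slope * x."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("predictor has zero variance")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def distributed_ols(chunks):
    """chunks: iterable of (xs, ys) pairs; each chunk_stats call could run on its own node."""
    per_chunk = [chunk_stats(xs, ys) for xs, ys in chunks]
    return solve_ols(combine(per_chunk))


# Data lying exactly on y = 1 + 2x, split across three hypothetical "nodes".
chunks = [([0, 1], [1, 3]), ([2, 3], [5, 7]), ([4, 5], [9, 11])]
print(distributed_ols(chunks))  # (1.0, 2.0)
```

Because combining chunk statistics is plain addition, the answer does not depend on how rows are split across nodes. With several predictors the same approach accumulates the full X'X matrix and X'y vector instead of five scalars.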

The detailed list of new features follows; you can find more about Revolution R Enterprise 5.0 at the link below. Existing subscribers will be notified with download instructions for the update in the next couple of days, and Revolution R Enterprise 5.0 is (as always) available free of charge to academic users.

Revolution Analytics: Revolution R Enterprise 5.0 overview 

What’s New in Revolution R Enterprise 5.0

Distributed/parallel computing

  • Automatically distribute statistical analyses from your desktop across the nodes of a cluster (currently supported for Windows HPC Server). Analyses include summary statistics, crosstabs, linear regression, logistic regression, covariance matrix computations for factor analysis and principal components, and k-means clustering. Binning computations for histograms are also distributed.
  • Distribute R function calls, including data manipulation functions, across nodes. With the new rxExec function, easily distribute “embarrassingly parallel” computations across the nodes or cores of a Microsoft HPC cluster, or across the cores of your desktop or laptop.
  • Compute in parallel with foreach, using the new doRSR backend for RevoScaleR.
  • New RevoScaleR Distributed Computing Guide (choose Help/R Manuals (PDF) in the RPE).
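A computation is “embarrassingly parallel” when it is the same function applied to many independent inputs, so the calls can be handed to separate cores or nodes and the results simply collected. rxExec does this for R functions; purely to illustrate the pattern, here is a sketch using Python's standard concurrent.futures module. The Monte Carlo task and the run_replicates helper are hypothetical examples, not part of RevoScaleR.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(seed, n_draws=10_000):
    """One independent task: a Monte Carlo estimate of pi from its own RNG stream."""
    rng = random.Random(seed)  # a separately seeded generator per task keeps results reproducible
    inside = sum(1 for _ in range(n_draws) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n_draws


def run_replicates(func, args, max_workers=4):
    """Apply func to each argument independently and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, args))


if __name__ == "__main__":
    estimates = run_replicates(estimate_pi, range(8))
    print(sum(estimates) / len(estimates))  # close to 3.14
```

Threads keep the sketch simple and portable; for CPU-bound work in CPython you would typically swap in ProcessPoolExecutor, which offers the same map interface. Giving each task its own seed means the result does not depend on which worker happens to run it.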

Scalable data management: Data Import

  • New versatile rxImport function for using external data with R (delimited and fixed-format text, SAS, SPSS, or ODBC). Bring smaller data sets directly into an R data frame; store larger data sets in the native .xdf file format, which is very efficient for storing and accessing large data sets. The rxImport function returns a data frame or an RxXdfData object representing the created .xdf file. Either can be used in subsequent data analysis functions.
  • Two alternative modes of delimited-text import, and two alternative modes of ODBC import (one of which is supported on Linux)
  • Ability to keep or drop variables on import
  • Ability to specify start row and number of rows of data to import
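The import options listed above (keeping only some variables, and choosing a start row and a number of rows) amount to reading a window of a delimited file and only some of its columns. The sketch below shows that idea with Python's standard csv module. The function name import_window and its parameters are hypothetical and only loosely mirror the options described; they are not rxImport's actual arguments.

```python
import csv
import io
from itertools import islice


def import_window(text_stream, keep=None, start_row=1, num_rows=None):
    """Read a delimited file that has a header row, returning a list of dicts.

    keep      -- column names to keep (None keeps every column)
    start_row -- 1-based index of the first data row to return
    num_rows  -- maximum number of rows to return (None reads to the end)
    """
    reader = csv.DictReader(text_stream)
    stop = None if num_rows is None else start_row - 1 + num_rows
    rows = islice(reader, start_row - 1, stop)  # earlier rows are skipped lazily
    if keep is None:
        return [dict(r) for r in rows]
    return [{name: r[name] for name in keep} for r in rows]


data = "id,age,income\n1,34,50000\n2,28,42000\n3,45,61000\n4,39,58000\n"
print(import_window(io.StringIO(data), keep=["id", "income"], start_row=2, num_rows=2))
# [{'id': '2', 'income': '42000'}, {'id': '3', 'income': '61000'}]
```

Values come back as strings here; a real importer would also infer or accept column types.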

Scalable data management: Data Cleaning and Manipulation

  • New versatile rxDataStep function allows you to perform data transformations on big data using the power and flexibility of the R language. Experiment with a small data frame, then apply the same code to a huge data set.
    • Returns a data frame or an RxXdfData object representing an .xdf file that can be used in subsequent scalable analyses.
    • Works with data frames or .xdf files (as input data or output), making it easy to convert from one type to another.
    • Ability to “re-block” xdf files with a user-specified number of rows.
    • Improved evaluation environments for user-defined transforms and transform functions, and new internal variable, .rxNumRows (containing the number of rows in the current block) for use within transformations.
  • Big data merge with the new rxMerge function. Merge two large data files, or merge a smaller in-memory data set into a large data file.
  • Improved performance for big data sort. New general-purpose rxSort function works on data frames or .xdf files
  • Ability to create and recode factors in .xdf files and data frames using new rxFactors function
  • Split an .xdf file into multiple files by number of rows, blocks, or levels of a factor variable using new rxSplitXdf function.
  • Support for additional data types in .xdf files: ordered factors and POSIXct, and improved support for Date data type.
  • New functions rxGetVarInfo, rxGetInfo, and rxSetVarInfo work for both data frames and xdf files
  • New examples in the RevoScaleR User’s Guide for big data import and data step operations.
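rxDataStep is described above as running the same transformation code on a small data frame or on a huge .xdf file processed block by block, with .rxNumRows holding the size of the current block. The general pattern is a chunked map: read a block, apply a user transform, pass the block on. Here's a minimal Python sketch of that pattern, including re-blocking to a chosen number of rows. The names reblock, data_step, and add_bmi, and the num_rows argument passed to the transform, are hypothetical stand-ins, not the RevoScaleR API.

```python
from itertools import islice


def reblock(rows, rows_per_block):
    """Regroup any iterable of rows into lists of at most rows_per_block rows."""
    it = iter(rows)
    while True:
        block = list(islice(it, rows_per_block))
        if not block:
            return
        yield block


def data_step(rows, transform, rows_per_block=100_000):
    """Apply a block-level transform and yield the transformed blocks.

    The transform receives the block plus its row count, loosely analogous
    to the .rxNumRows variable described above.
    """
    for block in reblock(rows, rows_per_block):
        yield transform(block, num_rows=len(block))


def add_bmi(block, num_rows):
    # User-defined transform: add a derived column; num_rows is available but unused here.
    return [dict(r, bmi=round(r["weight_kg"] / r["height_m"] ** 2, 1)) for r in block]


people = [
    {"weight_kg": 70, "height_m": 1.75},
    {"weight_kg": 90, "height_m": 1.80},
    {"weight_kg": 55, "height_m": 1.60},
]
for block in data_step(people, add_bmi, rows_per_block=2):
    print(len(block), [r["bmi"] for r in block])
```

Because data_step only iterates over its input, the same transform works on an in-memory list or on a generator streaming rows from a file that never fits in memory.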

Expanded scalable statistical functionality

  • New functions that use the output of rxCrossTabs:
    • rxChiSquaredTest: Chi-squared Test
    • rxFisherTest: Fisher's Exact Test
    • rxKendallCor: Kendall's Tau Rank Correlation Coefficient
    • rxPairwiseCrossTab: Apply a function 'FUN' to all pairwise combinations of the rows and columns of an xtabs object, stratifying by higher dimensions
    • rxRiskRatio: Calculate the relative risk (risk ratio) for a two-by-two table
    • rxOddsRatio: Calculate the odds ratio for a two-by-two table
    • rxMultiTest: Collects a list of tests for variable independence into a table.
  • A new rxResultsDF method for rxCrossTabs, rxSummary, and rxLinMod extracts a data frame from results objects
  • Improved performance for scalable analysis functions operating on data frames.
  • Option in rxPredict and rxKmeans to write out model variables in addition to predictions/cluster number.
  • Option in rxSummary to remove missing values by term.
  • Option in rxLinMod and rxLogit to drop first or last factor levels, and ability to set starting parameter values in rxLogit.
  • rxHistogram now supports logical data and frequency weights with continuous data, and accepts transforms and related arguments.
  • New examples in the RevoScaleR User’s Guide for factor analysis and principal components analysis.
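Several of the new rxCrossTabs companions (rxRiskRatio, rxOddsRatio, rxChiSquaredTest) compute standard statistics on a two-by-two table. For reference, with cell counts a, b (exposed: event, no event) and c, d (unexposed: event, no event), the risk ratio is [a/(a+b)] / [c/(c+d)], the odds ratio is ad/(bc), and Pearson's chi-squared statistic without Yates's continuity correction is n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)]. The Python sketch below only evaluates those textbook formulas; it is not how RevoScaleR implements them, it omits confidence intervals and p-values, and whether rxChiSquaredTest applies a continuity correction is not stated here.

```python
def two_by_two_stats(a, b, c, d):
    """Textbook statistics for the 2x2 table

                  event   no event
       exposed      a        b
       unexposed    c        d

    Zero cells in a denominator raise ZeroDivisionError; real implementations
    handle that case explicitly.
    """
    n = a + b + c + d
    risk_ratio = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-squared with 1 degree of freedom, no continuity correction.
    chi_sq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return {"risk_ratio": risk_ratio, "odds_ratio": odds_ratio, "chi_squared": chi_sq}


print(two_by_two_stats(20, 80, 10, 90))
# risk ratio 2.0, odds ratio 2.25, chi-squared about 3.92
```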

Enhanced RPE

  • Support for creating and building R packages:
    • R Package Project type in RPE Solution Explorer to create the directory structure for a new R package.
    • Create an Rd help file template for a user-created function from the Solution Explorer by adding a new item and specifying the function name.
    • Build an R package from the Solution Explorer.
  • Support for Windows HPC Server:
    • Access the HPC job scheduler directly from the Windows R Productivity Environment (RPE).
    • View the status of pending jobs in the RPE Object Browser.
    • Code snippets for distributed computing with HPC Server
  • Option to load last-loaded solution on startup
  • New projects now start by default in release mode instead of debug mode

Update to Open Source R 2.13.2 
