Announcing Revolution R Enterprise 5.0

Posted on November 15, 2011 by David Smith in R bloggers

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]


We're proud to announce the latest update to the enhanced, commercial-grade distribution of R, Revolution R Enterprise 5.0. With each new release, Revolution R Enterprise adds more capabilities to open-source R, to make R users more productive, to improve performance of R programs, to support Big Data analytics, and to provide servers and APIs for enterprise deployment.

New features in Revolution R Enterprise 5.0 include:

Distributed/Parallel Computing: Automatically distribute statistical analyses from a desktop across the nodes of a cluster through Windows HPC Server, and distribute R function calls across nodes.
Scalable Data Management: Increase flexibility in data analysis with new data import and cleaning/manipulation tools.
Integration with Hadoop: Support MapReduce programming in R and integration with HDFS and HBase with Cloudera Certified Technology.
Expanded Scalable Analytics Functionality: Apply new big data statistics algorithms including principal components analysis, factor analysis, contingency table analysis and more.
Enhanced R Productivity Environment: Create and build R packages with expanded support features.
Enhanced RevoDeployR server: Add multiple compute nodes to support more users, batch execution of large analysis jobs, and LDAP enterprise security support.
Upgraded Open Source R: Revolution R 5.0 includes the fully-patched R 2.13.2, which features a new byte-compiler to improve performance of user-written functions and packages.

We're particularly excited about the new capabilities to do parallel programming and statistical analysis on an HPC Server cluster. Here's a quick overview and an example of using a 5-node cluster to do a billion-row regression in less than a minute:

The detailed list of new features is below (after the jump), and you can find more about Revolution R Enterprise 5.0 at the link below. Existing subscribers will be notified with download instructions for the update in the next couple of days, and Revolution R Enterprise 5.0 is (as always) available free of charge to academic users.

Revolution Analytics: Revolution R Enterprise 5.0 overview

What’s New in Revolution R Enterprise 5.0

Distributed/parallel computing

Automatically distribute statistical analyses from your desktop across nodes of a cluster (currently supported for Windows HPC Server). Analyses include summary statistics, crosstabs, linear regression, logistic regression, covariance matrix computations for factor analysis and principal components, and k-means clustering. Binning computations for histograms are also distributed.
Distribute R function calls, including data manipulation functions, across nodes. Easily distribute “embarrassingly parallel” computations across the nodes or cores of a Microsoft HPC cluster, or across the cores of your desktop or laptop, using the new rxExec function.
Compute in parallel with foreach on top of RevoScaleR, using the new doRSR backend.
New RevoScaleR Distributed Computing Guide (choose Help/R Manuals (PDF) in the RPE).
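
One reason linear regression distributes well is that the least-squares estimate depends on the data only through the sums X'X and X'y, which can be accumulated chunk by chunk and then combined. The base R sketch below illustrates that general idea on simulated data. It is not RevoScaleR's implementation: the five-way split, the simulated variables, and the helper name chunk_stats are assumptions made purely for illustration.

```r
# Sketch: least squares from per-chunk sufficient statistics (base R only).
set.seed(42)
n <- 10000
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 1 + 2 * dat$x1 - 3 * dat$x2 + rnorm(n)

# Hypothetical helper: what a single node would compute on its own chunk.
chunk_stats <- function(chunk) {
  X <- cbind(Intercept = 1, x1 = chunk$x1, x2 = chunk$x2)
  list(xtx = crossprod(X), xty = crossprod(X, chunk$y))
}

# Split the rows into 5 chunks, standing in for 5 cluster nodes.
chunks <- split(dat, rep(1:5, length.out = n))
stats <- lapply(chunks, chunk_stats)

# Combine: the per-chunk pieces sum to the full-data X'X and X'y.
xtx <- Reduce(`+`, lapply(stats, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(stats, `[[`, "xty"))
beta_chunked <- drop(solve(xtx, xty))

# Reference fit on the whole data set at once.
beta_full <- coef(lm(y ~ x1 + x2, data = dat))

# The two sets of coefficients should agree up to numerical tolerance.
print(all.equal(unname(beta_chunked), unname(beta_full)))
```

Only the small 3-by-3 and 3-by-1 summaries travel between chunks, never the rows themselves, which is what makes this kind of computation cheap to combine across nodes.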

Scalable data management: Data Import

New versatile rxImport function for using external data with R (delimited and fixed-format text, SAS, SPSS, or ODBC). Bring smaller data sets directly into an R data frame; store larger data sets in the native .xdf file format, which is very efficient for storing and accessing large data sets. The rxImport function returns a data frame or an RxXdfData object representing the created .xdf file. Either can be used in subsequent data analysis functions.
Two alternative modes of delimited text import, and two alternative modes of ODBC import, one of which is supported on Linux.
Ability to keep or drop variables on import.
Ability to specify the start row and the number of rows of data to import.

Scalable data management: Data Cleaning and Manipulation

New versatile rxDataStep function allows you to perform data transformations on big data using the power and flexibility of the R language. Experiment with a small data frame, then apply the same code to a huge data set.
- Returns data frame or RxXdfData object representing an .xdf file that can be used in subsequent scalable analyses.
- Works with data frames or .xdf files (as input data or output), making it easy to convert from one type to another.
- Ability to “re-block” .xdf files with a user-specified number of rows.
- Improved evaluation environments for user-defined transforms and transform functions, and a new internal variable, .rxNumRows (containing the number of rows in the current block), for use within transformations.
Big data merge with the new rxMerge function. Merge two large data files, or merge a smaller in-memory data set into a large data file.
Improved performance for big data sort. New general rxSort function works on data frames or .xdf files.
Ability to create and recode factors in .xdf files and data frames using the new rxFactors function.
Split an .xdf file into multiple files by number of rows, blocks, or levels of a factor variable using the new rxSplitXdf function.
Support for additional data types in .xdf files: ordered factors and POSIXct, and improved support for Date data type.
New functions rxGetVarInfo, rxGetInfo, and rxSetVarInfo work for both data frames and .xdf files.
New examples in the RevoScaleR User’s Guide for the big data data step and import.
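
The "experiment with a small data frame, then apply the same code to a huge data set" workflow can be pictured with plain data frames: write the transformation once as a function of a block of rows, then apply it either to the whole frame or block by block. The base R sketch below only mimics that pattern and does not call rxDataStep. The function add_log_income, the block_size value, and the income column are hypothetical names chosen for illustration.

```r
# Sketch: one transform, applied to a whole data frame or block by block.
add_log_income <- function(block) {
  block$log_income <- log(block$income)
  block$high <- block$income > 50000
  block
}

small <- data.frame(id = 1:10, income = seq(20000, 110000, by = 10000))

# 1. Experiment directly on the small in-memory data frame.
whole <- add_log_income(small)

# 2. Apply the same function to blocks of rows, as a block-wise data step would.
block_size <- 4
block_id <- ceiling(seq_len(nrow(small)) / block_size)
blocks <- lapply(split(small, block_id), add_log_income)
blockwise <- do.call(rbind, blocks)
rownames(blockwise) <- NULL

# Both routes should produce the same result.
print(all.equal(whole, blockwise))
```

The equivalence holds only because this transform works row by row. Anything that needs a whole-column quantity, such as an overall mean, has to be computed in a separate pass before the block-wise step.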

Expanded scalable statistical functionality

New functions utilizing output from rxCrossTabs objects:
- rxChiSquaredTest: Chi-squared Test
- rxFisherTest: Fisher's Exact Test
- rxKendallCor: Kendall's Tau Rank Correlation Coefficient
- rxPairwiseCrossTab: Apply a function 'FUN' to all pairwise combinations of the rows and columns of an xtabs object, stratifying by higher dimensions
- rxRiskRatio: Calculate the relative risk ratio on a two-by-two table
- rxOddsRatio: Calculate the relative odds ratio on a two-by-two table
- rxMultiTest: Collects a list of tests for variable independence into a table.
Also new: an rxResultsDF method for rxCrossTabs, rxSummary, and rxLinMod, for extracting a data frame from results objects.
Improved performance for scalable analysis functions operating on data frames.
Option in rxPredict and rxKmeans to write out model variables in addition to predictions/cluster number.
Option in rxSummary to remove missing values by term.
Option in rxLinMod and rxLogit to drop first or last factor levels, and ability to set starting parameter values in rxLogit.
rxHistogram now supports logical data and frequency weights with continuous data, and has transforms and related arguments.
New examples in the RevoScaleR User’s Guide for factor analysis and principal components analysis.
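
For readers less familiar with the contingency-table tests listed above, the base R sketch below runs their in-memory analogues on a made-up two-by-two table: chisq.test and fisher.test from the stats package, plus the risk ratio and odds ratio computed by hand. It does not use the rx functions, whose arguments are not described in this post, and the counts are invented for illustration.

```r
# Made-up counts: rows are exposure groups, columns are outcomes.
tab <- matrix(c(30, 70,
                10, 90),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("exposed", "unexposed"),
                              outcome  = c("event", "no_event")))

# Tests of independence between exposure and outcome.
chi <- chisq.test(tab, correct = FALSE)  # Pearson chi-squared, no continuity correction
fisher <- fisher.test(tab)               # Fisher's exact test

# Risk ratio: risk of the event among exposed relative to unexposed.
risk_exposed   <- tab["exposed", "event"] / sum(tab["exposed", ])
risk_unexposed <- tab["unexposed", "event"] / sum(tab["unexposed", ])
risk_ratio <- risk_exposed / risk_unexposed

# Odds ratio: cross-product ratio of the table.
odds_ratio <- (tab["exposed", "event"] * tab["unexposed", "no_event"]) /
              (tab["exposed", "no_event"] * tab["unexposed", "event"])

print(chi$statistic)
print(fisher$p.value)
print(c(risk_ratio = risk_ratio, odds_ratio = odds_ratio))
```

Note that fisher.test reports a conditional maximum-likelihood estimate of the odds ratio, so its estimate will not match the hand-computed cross-product ratio exactly.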

Enhanced RPE

Support for creating and building R packages:
- R Package Project type in RPE Solution Explorer to create the directory structure for a new R package.
- Create an Rd Help file template for a user-created function from the Solution Explorer by adding a new item and specifying the function name.
- Build an R package from the Solution Explorer.
Support for Windows HPC Server:
- Access the HPC job scheduler directly from the Windows R Productivity Environment (RPE).
- View the status of pending jobs in the RPE Object Browser.
- Code snippets for distributed computing with HPC Server.
Option to load the last-loaded solution on startup.
New projects now start by default in release mode instead of debug mode.

Update to Open Source R 2.13.2
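
The byte compiler mentioned earlier in this post ships in the compiler package that comes with R from 2.13.0 onward. As a small sketch, cmpfun returns a byte-compiled copy of a function that gives the same results as the original. The function sum_of_squares below is a made-up example, and any speed-up depends on the code, the machine, and the R version.

```r
library(compiler)

# A deliberately loop-heavy, made-up function.
sum_of_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2
  }
  total
}

# Byte-compiled copy of the same function.
sum_of_squares_c <- cmpfun(sum_of_squares)

# Both versions should return the same value.
print(identical(sum_of_squares(1000), sum_of_squares_c(1000)))

# Compare timings; results vary by machine and R version.
print(system.time(sum_of_squares(1e6)))
print(system.time(sum_of_squares_c(1e6)))
```

In recent R versions the just-in-time compiler is enabled by default, so the two timings may be nearly identical there; the difference is most visible on R releases of the 2.13 era.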
