The recently announced Revolution Analytics / Zementis partnership goes a long way towards demonstrating how R fits into big-league production environments. A frequent complaint against R is that although it is a fine prototyping tool, it cannot handle production environments. Well, that's just not true. In fact, it is straightforward to build a model in R, translate it into PMML (the Predictive Model Markup Language) using the standard pmml package, and then send the PMML file off to Zementis' ADAPA scoring engine, where the model it describes can be used to score a new data set. Moreover, with Revolution's RevoDeployR web services technology it is relatively easy to set up an infrastructure in which Revolution R runs on one server (on site or in the cloud), the ADAPA scoring engine runs on another, and users access both through a light client, a browser, or any BI tool.
The following code gives a simple example of splitting a data set into training and test sets, building a logistic regression model, and translating it into PMML.
# Load the required R libraries
library(pmml)
library(XML)

# Read in the audit data and split it into a training set and a test set
auditDF <- read.csv("http://rattle.togaware.com/audit.csv")
auditDF <- na.omit(auditDF)  # remove NAs to make things easy
target <- auditDF$TARGET_Adjusted

# Get the number of observations; hold out 500 rows for testing
N <- length(target)
M <- N - 500
set.seed(42)              # make the random split reproducible
i.train <- sample(N, M)   # random sample of row indices for training
audit.train <- auditDF[i.train, ]
audit.test <- auditDF[-i.train, ]

# Build a logistic regression model. Name the target by its column name so
# that "." expands to the remaining columns only (writing
# audit.train$TARGET_Adjusted on the left would let "." pull the target in
# as a predictor).
glm.model <- glm(TARGET_Adjusted ~ ., data = audit.train, family = "binomial")

# Describe the model in PMML and save it in an XML file
glm.pmml <- pmml(glm.model, name = "glm model", data = audit.train)
xmlFile <- file.path(getwd(), "audit-glm.xml")
saveXML(glm.pmml, xmlFile)
The first few lines of the generated PMML should look something like this:
<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-3_2 http://www.dmg.org/v3-2/pmml-3-2.xsd">
<Header copyright="Copyright (c) 2011 Joseph" description="Linear Regression Model">
<Extension name="user" value="Joseph" extender="Rattle"/>
<Application name="Rattle/PMML" version="1.2.26"/>
<Timestamp>2011-02-28 14:41:54</Timestamp>
</Header>
<DataDictionary numberOfFields="13">
<DataField name="audit.train$TARGET_Adjusted" optype="continuous" dataType="double"/>
<DataField name="ID" optype="continuous" dataType="double"/>
<DataField name="Age" optype="continuous" dataType="double"/>
<DataField name="Employment" optype="categorical" dataType="string">
<Value value="Consultant"/>
<Value value="Private"/>
<Value value="PSFederal"/>
<Value value="PSLocal"/>
<Value value="PSState"/>
<Value value="SelfEmp"/>
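The DataDictionary is where PMML declares every input field the model expects, along with whether it is continuous or categorical. As a sketch (assuming the XML package is installed), the snippet below pulls the field names and optypes out of a trimmed copy of the fragment above, closed off by hand so that it parses; it is not a complete model file. For the file saved earlier you would call `xmlParse(xmlFile)` instead. The variable names `pmml_text`, `ns`, and `field_info` are illustrative only.

```r
# Sketch: list the input fields declared in a PMML DataDictionary.
# Assumption: the XML package is installed; pmml_text is a trimmed,
# hand-closed copy of the fragment shown above, not a full PMML model.
library(XML)

pmml_text <- '<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <DataDictionary numberOfFields="3">
    <DataField name="ID" optype="continuous" dataType="double"/>
    <DataField name="Age" optype="continuous" dataType="double"/>
    <DataField name="Employment" optype="categorical" dataType="string">
      <Value value="Consultant"/>
      <Value value="Private"/>
    </DataField>
  </DataDictionary>
</PMML>'

doc <- xmlParse(pmml_text, asText = TRUE)

# PMML elements live in the DMG namespace, so XPath needs a prefix bound to it.
ns <- c(p = "http://www.dmg.org/PMML-3_2")
fields <- getNodeSet(doc, "//p:DataDictionary/p:DataField", namespaces = ns)

# One row per declared input field: its name and its operational type.
field_info <- data.frame(
  name   = sapply(fields, xmlGetAttr, "name"),
  optype = sapply(fields, xmlGetAttr, "optype"),
  stringsAsFactors = FALSE
)
print(field_info)
```

Checking this list against the columns of the file you plan to score is a cheap way to catch a missing or misnamed field before the file ever reaches the scoring engine.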
Once the PMML file is built, it can be submitted to the ADAPA engine and used to score a new data set.
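For a binomial glm with a logit link, "scoring" a record comes down to computing the linear predictor and passing it through the inverse logit, which is the calculation a PMML consumer is expected to carry out for this kind of model. The sketch below does that by hand and checks it against R's own `predict()`. The data are simulated stand-ins (the `Age`/`Hours` predictors and the coefficients used to generate them are assumptions), not the audit data, and this is not ADAPA's implementation.

```r
# Sketch: scoring a logistic regression model by hand.
# eta = b0 + b1*Age + b2*Hours, then p = 1 / (1 + exp(-eta)).
# Simulated data only; column names and true coefficients are hypothetical.
set.seed(1)
n <- 200
train <- data.frame(Age = runif(n, 20, 70), Hours = runif(n, 10, 60))
train$TARGET <- rbinom(n, 1, plogis(-4 + 0.05 * train$Age + 0.03 * train$Hours))

fit <- glm(TARGET ~ Age + Hours, data = train, family = "binomial")

# New records to score
new_data <- data.frame(Age = c(25, 45, 65), Hours = c(20, 40, 50))

# By hand: design matrix (intercept, Age, Hours) times fitted coefficients,
# then the inverse logit to turn the linear predictor into a probability.
X <- model.matrix(~ Age + Hours, data = new_data)
eta <- drop(X %*% coef(fit))
p_manual <- 1 / (1 + exp(-eta))

# R's built-in scoring path, for comparison
p_predict <- predict(fit, newdata = new_data, type = "response")
print(cbind(p_manual, p_predict))
```

The two columns should agree to numerical precision, which is also a handy sanity check when comparing an external engine's scores with R's.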
The interactive demo on the Revolution site pulls all of this together and exercises the key moving parts of a production-level scoring application.
Follow these steps to walk through the demo:
- Click the appropriate link in the Example: Audit Data section to download the file audit_scoring.csv to your disk.
- In the Build Predictive model box on the left:
  - Select a name for the model.
  - Choose a data set (there is only one choice: Audit Data).
  - Select a modeling technique.
  - Select the explanatory variables for your model.
  - Press the Train Model button.
- In the Evaluate Performance box on the right, press the Deploy Model button to have RevoDeployR send the PMML code over to the ADAPA engine.
- In the CSV Batch Scoring box:
  - Select your model.
  - Upload the audit_scoring.csv file (or any other file appropriate for the model you just built).
  - Watch for the results.
Revolution Analytics: Using ADAPA & Revolution R Enterprise—Audit Data Demo
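The CSV Batch Scoring step can be mimicked locally in R, which is useful for checking what a batch of scores should look like before comparing them with the demo's output. This is a sketch under stated assumptions: the model is trained on simulated data with hypothetical columns, the incoming CSV is written to a temporary file to stand in for audit_scoring.csv, and `score_csv()` is a hypothetical helper, not part of RevoDeployR or ADAPA.

```r
# Sketch: a local analogue of batch-scoring a CSV file.
# Simulated training data; column names follow the audit data but values are made up.
set.seed(2)
n <- 300
train <- data.frame(
  Age = round(runif(n, 20, 70)),
  Employment = sample(c("Private", "PSLocal", "SelfEmp"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
train$TARGET_Adjusted <- rbinom(n, 1, plogis(-3 + 0.04 * train$Age))
fit <- glm(TARGET_Adjusted ~ Age + Employment, data = train, family = "binomial")

# Write a small "incoming" file to play the role of audit_scoring.csv
in_file <- tempfile(fileext = ".csv")
write.csv(data.frame(Age = c(30, 50), Employment = c("Private", "SelfEmp")),
          in_file, row.names = FALSE)

# Hypothetical helper: read records, append a predicted probability, write them out
score_csv <- function(model, in_path, out_path) {
  new_data <- read.csv(in_path, stringsAsFactors = FALSE)
  new_data$score <- predict(model, newdata = new_data, type = "response")
  write.csv(new_data, out_path, row.names = FALSE)
  invisible(new_data)
}

out_file <- tempfile(fileext = ".csv")
scored <- score_csv(fit, in_file, out_file)
print(scored)
```

Categorical values in the incoming file must be levels the model saw during training; the same constraint applies to the Value lists in the PMML DataDictionary.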