Saving R Objects in Oracle Database using Oracle R Enterprise 1.3 Datastore
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
R allows users to save R objects to disk. The whole workspace of R objects can be saved to a file, and reloaded across R sessions, which allows users to return to their previous R environment even after quitting R or to avoid recreating objects needed in the future. One such type of object includes predictive models, which can be built in one session, and saved for scoring in another R session, or even multiple, possibly parallel R sessions.
R provides the save and load functions where objects in memory are serialized and unserialized, respectively. Figure 1 depicts an example where two R objects, a linear model and data.frame are saved to a file, and then reloaded. When objects are restored, they have the same names as when they were saved.
Figure 1: Using R save() and load() functions
Oracle R Enterprise (ORE) 1.3 supports object persistence using an R datastore in the database. Now, ORE proxy objects, as well as any R objects, can be saved and restored across R sessions as a named entity in Oracle Database. Serializing ORE objects, such as ore.frames, and saving them doesn’t work across sessions, since any referenced temporary tables or other database objects are not saved across R sessions. If these ore.frame proxy object references are not maintained, restoring them makes them incomplete and inoperative.
Figure 2 has an example similar to the previous example. The main difference is that we are using ore.save and ore.load, and providing the name of the datastore from which we want to retrieve ORE objects.
Figure 2: Using ORE datastore functions ore.save and ore.load
Each schema has its own datastore table where R objects are saved. By being managed in Oracle Database, ORE provides referential integrity of saved objects such that when otherwise temporary database objects are no longer referenced, they are auto-deleted at the end of the R session. This applies, for example, to tables created via ore.push or Oracle Data Mining models produced using the OREdm package.
Here’s a simple example:
DAT1 <- ore.push(iris)
ore.lm.mod <- ore.lm(Sepal.Length~., DAT1 )
lm.mod <- lm(mpg ~ cyl + disp + hp + wt + gear, mtcars)
nb.mod <- ore.odmNB(YEAR ~ ARRDELAY + DEPDELAY + log(DISTANCE), ONTIME_S)
ore.save(ore.lm.mod, lm.mod, nb.mod, name = “myModels”)
We’re creating four objects: an ore.frame “DAT1” consisting of the iris data set, an ore.lm model that uses DAT1,a standard R lm model using the mtcars dataset, and an ODM naïve Bayes model using ONTIME_S. We then invoke ore.save for the three models and use the datastore name “myModels”.
The R objects are saved in the datastore, and any referenced data tables or ODM models are kept in the database, otherwise these are treated as temporary database objects and dropped when the ORE session ends.
To load these objects, invoke:
ore.load(name = “myModels”)
To see the content of a datastore, you can invoke ore.datastoreSummary with the name of the datastore.
The datastore also makes it easy to access R and ORE objects within ORE embedded R execution functions. Simply, the name of the datastore is
passed as an embedded R function argument. Within the function, ore.load is invoked with the name of
that datastore. We’ll see an example of this using the SQL API below. In
addition, by maintaining persisted R objects in the database, ORE
facilitates application deployment while leveraging existing database
backup, recovery, and security mechanisms.
Consider the following ORE embedded R execution example using the SQL API. We build a simple linear model and store it in a datastore with the name “myDatastore”. This R script will be stored in the database R script repository with the name “BuildModel-1”. To invoke this R script, we use the rqTableEval function, providing the input data from ONTIME_S, passing the parameters that include the name of the datastore, requesting XML output for the result, and specifying the R script by name.
begin
sys.rqScriptCreate(‘BuildModel-1’,
‘function(dat,datastore_name) {
mod <- lm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)
ore.save(mod,name=datastore_name, overwrite=TRUE)
TRUE
}’);
end;
/
select *
from table(rqTableEval(
cursor(select ARRDELAY,DISTANCE,DEPDELAY from ONTIME_S),
cursor(select 1 as “ore.connect”, ‘myDatastore’ as “datastore_name”
from dual),
‘XML’,
‘BuildModel-1’));
To score using this model, we create an R script named “ScoreData-1” that loads the model from the named datastore, invokes predict, and binds the predictions with the predictors to be returned as the result.To invoke this script, we again use the rqTableEval function that takes as parameters a cursor specifying the data to be scored, parameters, a description of the result as a SQL query, and the R script name.
begin
sys.rqScriptCreate(‘ScoreData-1’,
‘function(dat, datastore_name) {
ore.load(datastore_name)
pred <- predict(mod, newdata=dat)
pred[as.integer(rownames(pred))] <- pred
cbind(dat, PRED = pred)
}’);
end;
/
select *
from table(rqTableEval(
cursor(select ARRDELAY, DISTANCE, DEPDELAY from ONTIME_S
where YEAR = 2003 and MONTH = 5 and DAYOFMONTH = 2),
cursor(select 1 as “ore.connect”,
‘myDatastore’ as “datastore_name” from dual),
‘select ARRDELAY, DISTANCE, DEPDELAY, 1 PRED from ontime_s’,
‘ScoreData-1’))
order by 1, 2, 3;
Datastore functionality supports the following interface: ore.save, ore.load, ore.datastore,
ore.datastoreSummary, and ore.delete. See the online documentation for details. A presentation on the ORE transparency layer that provides additional examples is available here.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.