Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
< size="2" face="arial,helvetica,sans-serif">This Oracle R Enterprise (ORE) tutorial, on embedded R execution, is the third in a series to help users get started using ORE. See these links for the first tutorial on the transparency layer and second tutorial on the statistics engine< >< size="2" face="arial,helvetica,sans-serif">. Oracle R Enterprise is a component in the Oracle Advanced Analytics Option of Oracle Database Enterprise Edition. < >
< size="2" face="arial,helvetica,sans-serif">Embedded R Execution refers to the ability to execute an R script at the database server, which provides several benefits: spawning multiple R engines in parallel for data-parallel operations, more efficient data transfer between Oracle Database and the R engine, leverage a likely more powerful server with more CPUs and greater RAM, schedule automated jobs, and take advantage of open source R packages at the database server. < >< size="2" face="arial,helvetica,sans-serif">Data aggregates are computed in parallel, significantly reducing computation time, < >< size="2" face="arial,helvetica,sans-serif">without requiring sophisticated configuration steps.
< >
< size="2" face="arial,helvetica,sans-serif">ORE provides two interfaces for embedded R execution: one for R and one for SQL. The R interface enables interactive execution at the database server from the client R engine, e.g., your laptop. It also has transparency aspects for passing R objects and returning R objects.< > < size="2" face="arial,helvetica,sans-serif">In the R interface, the < face="courier new,courier,monospace">ore.doEval< > schedules execution of the R code with the database-embedded R engine and returns the results back to the desktop for continued analysis. User-defined R functions can run in parallel, either on each row, sets of rows, or on each group of rows given a grouping column. The first two cases are covered by < face="courier new,courier,monospace">ore.rowApply< >, the second by the < face="courier new,courier,monospace">ore.groupApply< > function. < face="courier new,courier,monospace">ore.indexApply< > provides parallel simulations capability by invoking the script the number of times specified by the user. The R interface returns results to the client as R objects that can be passed as arguments to R functions. < >
< size="2" face="arial,helvetica,sans-serif">The SQL interface enables interactive execution from any SQL interface, like SQL*Plus or SQL Developer, but it also enables R scripts to be included in production database-based systems. To enable execution of an R script in the SQL interface, ORE provides variants of < face="courier new,courier,monospace">ore.doEval< >, < face="courier new,courier,monospace">ore.groupApply< > and < face="courier new,courier,monospace">ore.indexApply< > in SQL. These functions are < face="courier new,courier,monospace">rqEva< >l, < face="courier new,courier,monospace">rqTableEval< >, < face="courier new,courier,monospace">rqRowEval < >and < face="courier new,courier,monospace">rqGroupEval.
< face="arial,helvetica,sans-serif">The SQL interface allows for storing results directly in the database.< >< >
< >
< size="2" face="arial,helvetica,sans-serif"> R Interface Function (ore.*)
< > |
< size="2" face="arial,helvetica,sans-serif"> SQL Interface Function (rq*)
< > |
< size="2" face="arial,helvetica,sans-serif"> Purpose< > |
< size="2" face="arial,helvetica,sans-serif"> ore.doEval< > | < size="2" face="arial,helvetica,sans-serif"> rqEval< > | < size="2" face="arial,helvetica,sans-serif"> Invoke stand-alone R script
< > |
< size="2" face="arial,helvetica,sans-serif"> ore.tableApply< > | < size="2" face="arial,helvetica,sans-serif"> rqTableEval< > | < size="2" face="arial,helvetica,sans-serif"> Invoke R script with full table input
< > |
< size="2" face="arial,helvetica,sans-serif"> ore.rowApply< > | < size="2" face="arial,helvetica,sans-serif"> rqRowEval< > | < size="2" face="arial,helvetica,sans-serif"> Invoke R script one row at a time, or multiple rows in "chunks"
< > |
< size="2" face="arial,helvetica,sans-serif"> ore.groupApply< > | < size="2" face="arial,helvetica,sans-serif"> rqGroupEval< > | < size="2" face="arial,helvetica,sans-serif"> Invoke R script on data indexed by grouping column
< > |
< size="2" face="arial,helvetica,sans-serif"> ore.indexApply
< > |
< size="2" face="arial,helvetica,sans-serif">N/A
< > |
< size="2" face="arial,helvetica,sans-serif"> Invoke R script N times
< > |
< >
< size="2" face="arial,helvetica,sans-serif">In addition, the SQL interface enables R results to be stored in a database table for subsequent use in another invocation (think data mining model building and scoring). It enables returning structured R results in a table. Results can also be returned as XML. The XML interface enables both structured data, such as data frames, R objects, and graphs to be returned. The XML capability allows R graphs and structured results to be displayed in Oracle BI Publisher documents and OBIEE dashboards.< >
< size="2" face="arial,helvetica,sans-serif">Embedded R Execution: R Interface < >
< size="2" face="arial,helvetica,sans-serif">The following example uses the function < face="courier new,courier,monospace">ore.groupApply< >, one of several embedded R execution functions, to illustrate how R users can achieve data parallelism < >< size="2" face="arial,helvetica,sans-serif">through the database. This example also illustrates that embedded R execution enables the use of open source packages. Here we see the use of the R package < >< size="2" face="arial,helvetica,sans-serif">< face="courier new,courier,monospace">biglm.< > < >
< size="2" face="arial,helvetica,sans-serif">We specify a column on which to partition the data. Each partition of the data is provided to the function through the first argument, in this case the < >< size="2" face="arial,helvetica,sans-serif">function variable dat. There is no need to send data from the database to R – the R function is sent to the database, which processes them in parallel. Output < >< size="2" face="arial,helvetica,sans-serif">results may be stored directly in the database, or may be downloaded to R. Only when we want to see the results of these models do we need to retrieve them < >< size="2" face="arial,helvetica,sans-serif">into R memory and perform, for example, the < face="courier new,courier,monospace">summary< > function.< >
< size="2" face="courier new,courier,monospace">modList <- ore.groupApply(
X=ONTIME,
INDEX=ONTIME$DEST,
function(dat) {
library(biglm)
biglm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)
});< >
< size="2" face="courier new,courier,monospace">modList_local <- ore.pull(modList)< >
> summary(modList_local$BOS) Large data regression model: biglm(ARRDELAY ~ DISTANCE + DEPDELAY, dat) Sample size = 3928 Coef (95% CI) SE p (Intercept) 0.0638 -0.7418 0.8693 0.4028 0.8742 DISTANCE -0.0014 -0.0021 -0.0006 0.0004 0.0002 DEPDELAY 1.0552 1.0373 1.0731 0.0090 0.0000
< size="2" face="arial,helvetica,sans-serif">Embedded R Execution: SQL Interface < >
< size="2" face="arial,helvetica,sans-serif">Whereas the previous example showed how to use embedded R execution from the R environment, we can also invoke R scripts from SQL. This next example < >< size="2" face="arial,helvetica,sans-serif">illustrates returning a data frame from results computed in Oracle Database. We first create an R script in the database R script repository. The script is < >< size="2" face="arial,helvetica,sans-serif">defined as a function that creates a vector of 10 elements, and returns a data frame with those elements in one column and those elements divided by 100 in a < >< size="2" face="arial,helvetica,sans-serif">second column.< >
< size="2" face="arial,helvetica,sans-serif">Once the script is created, we can invoke it through SQL. One of the SQL embedded R executions table functions available is < face="courier new,courier,monospace">rqEval< >. The first argument is < face="courier new,courier,monospace">NULL< > < >< size="2" face="arial,helvetica,sans-serif">since we have no parameters to pass to the function. The second argument describes the structure of the result. Any valid SQL query that captures the name and < >< size="2" face="arial,helvetica,sans-serif">type of resulting columns will suffice. The third argument is the name of the script to execute. < >
< size="2" face="arial,helvetica,sans-serif">< face="courier new,courier,monospace">begin
sys.rqScriptCreate(‘Example1’,
‘function() {
ID <- 1:10
res <- data.frame(ID = ID, RES = ID / 100)
res}’);
end;
/
select *
from table(rqEval(NULL,
‘select 1 id, 1 res from dual’,
‘Example1’));< >< >
< size="2" face="arial,helvetica,sans-serif">The result is a data frame:< >
< size="2" face="arial,helvetica,sans-serif">
Embedded R scripts may generate any valid R object, including graphs. In addition, embedded R execution enables returning results from an R script as an XML < >< size="2" face="arial,helvetica,sans-serif">string. Consider the following example that creates a vector from the integers 1 to 10, plots 100 random normal points in a graph, and then returns the vector. < >< size="2" face="arial,helvetica,sans-serif">After creating the script in the database R script repository, we invoke the script using < face="courier new,courier,monospace">rqEval< >, but instead of specifying the form of the result in a SQL q< >< size="2" face="arial,helvetica,sans-serif">uery, we specify < face="courier new,courier,monospace">XML< >.< >
< size="2" face="courier new,courier,monospace">begin
sys.rqScriptCreate(‘Example6’,
‘function(){
res <- 1:10
plot( 1:100, rnorm(100), pch = 21,
bg = "red", cex = 2 )
res
}’);
end;
/
select value
from table(rqEval( NULL,’XML’,’Example6′));< >
< size="2" face="arial,helvetica,sans-serif">While the actual graph looks like the following, the output from this query will be an XML string. < >
< size="2" face="arial,helvetica,sans-serif">
In the execution results shown below, the VALUE column returned is a string that contains first the structured data in XML format. Notice the numbers 1 through < >< size="2" face="arial,helvetica,sans-serif">10 set off by the <value> tags. This is followed by the image in PNG base 64 representation. This type of output can be consumed by Oracle Business < >< size="2" face="arial,helvetica,sans-serif">Intelligence Publisher (BIP) to produce documents with R-generated graphs and structured content. Oracle BIP templates can also be used to expose < >< size="2" face="arial,helvetica,sans-serif">R-generated content in Oracle Business Intelligence Enterprise Edition (OBIEE) web browser-based dashboards. < >
< size="2" face="arial,helvetica,sans-serif">You can see additional examples using embedded R execution in action in the Oracle Enterprise Training, session 4, Embedded R Script Execution. These example will run as written in R 2.13.2 after installing Oracle R Enterprise.< >< size="2" face="arial,helvetica,sans-serif"> We’ll be posting more examples using embedded R script execution in the coming months. In the meantime, questions are always welcome on the Oracle R Forum.< >
< size="2" face="arial,helvetica,sans-serif">< >< size="2" face="arial,helvetica,sans-serif">< >
< size="2" face="arial,helvetica,sans-serif">< >< face="arial,helvetica,sans-serif">< size="2">< >
< size="2">< >< >
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.