A preview of using Revolution R Enterprise inside SQL Server

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by Andrie de Vries

The second week of SQLRelay (#SQLRelay) kicked off in London earlier this week. SQLRelay is a series of conferences, spanning 10 cities in the United Kingdom over two weeks. The London agenda included 4 different streams, with tracks for the DBA, BI and Analytics users, as well as a workshop track with two separate tutorials.

My speaking slot was in the afternoon, with the title “In-database analytics using Revolution R and SQL“.

In my talk I covered:

  • A high level overview of R
  • Data science in the cloud
  • Connecting R to SQL
  • Scalable R
  • R in SQL Server
  • Moving your workflow to the cloud
Although the functionality of using R directly inside SQL Server will only be part of SQL Server 2016, Microsoft announced earlier this year that SQL Server 2016 will include Revolution Analytics. I expect that more information will be released during the PASS 2015 summit in Seattle at the end of this month.
 
In my talk I included 5 simple demonstrations.  The first 3 demonstrations appeal to the data scientist coding in R:
  • Connecting R to SQL Server using an RODBC connector
  • Using Revolution R Enterprise (RRE) in a local parallel compute context, reading data from a local file
  • Changing the compute context to SQL Server, and running the R code directly inside the SQL Server machine

The last two demonstrations demonstrate how to run some R code embedded in a SQL stored procedure:

  • Creating a very simple script that calls out to R
  • Using R to generate some data, in this case simply bringing some data in from the famous iris data set that is built into R.

 

Presentation

The presentation is available on SlideShare:

 

Sample code

Here are the code samples I used in the demonstration:

 

### Connect to SQL Server using RODBC
library(RODBC)
library(magrittr)
# Connect to SQL Server using RODBC ------------------
sqlHost <- "DAA136209339.sys-sqlsvr.local"
sqlDatabase <- "RevoTestDB"
dsnString <- "driver={SQL Server};server=%s;database=%s;trusted_connection=true"
dsn <- sprintf(dsnString, sqlHost, sqlDatabase)
dbHandle <- odbcDriverConnect(dsn)
# Send query and return data -------------------------
query <- "SELECT * FROM AirlineDemoSmall"
airDataFrame <- sqlQuery(dbHandle, query)
airDataFrame %>% head
str(airDataFrame)
# Create linear model using lm() ---------------------
day.names <- paste0(c("Mon", "Tues", "Wednes", "Thurs", "Fri", "Satur", "Sun"),
"day")
levels(airDataFrame$DayOfWeek) <- day.names
model.lm <- lm(ArrDelay ~ CRSDepTime + DayOfWeek - 1, airDataFrame)
summary(model.lm)
library(coefplot)
coefplot(model.lm, title = "Linear model using lm()")
# Close connection -----------------------------------
# odbcClose(dbHandle)
view raw 1-RODBC.R hosted with ❤ by GitHub
### Use RevoScaleR RxLocalParallel compute context
day.names <- paste0(c("Mon", "Tues", "Wednes", "Thurs", "Fri", "Satur", "Sun"),
"day")
# Define input object as local csv file --------------
air.csv <- RxTextData(
"airdata.csv",
rowsPerRead = 100e3,
delimiter = ",",
colInfo = list(ArrDelay = list(type = "integer"),
DayOfWeek = list(type = "factor", levels = day.names)
)
)
# Convert to xdf -------------------------------------
xdf <- "airdata.xdf"
rxDataStep(air.csv, outFile = xdf, overwrite = TRUE)
# Set local parallel compute context------------------
rxSetComputeContext(
RxLocalParallel()
)
rxGetInfo(xdf)
rxGetVarInfo(xdf)
# Get data summary and create linear model -----------
rxSummary(~ArrDelay + DayOfWeek, data = xdf, reportProgress = 0)
model.rxLinMod <- rxLinMod(ArrDelay ~ CRSDepTime + DayOfWeek - 1, xdf, reportProgress = 2)
summary(model.rxLinMod)
library(coefplot)
coefplot(model.rxLinMod, title = "Linear model using rxLinMod()")
### Use RevoScaleR RxInSqlServer compute context
sqlHost <- "DAA136209339.sys-sqlsvr.local"
sqlDatabase <- "RevoTestDB"
sqlUser <- "xxxxxxx"
sqlPw <- "xxxxxxxx"
day.names <- paste0(c("Mon", "Tues", "Wednes", "Thurs", "Fri", "Satur", "Sun"),
"day")
# Define SQL Server connection string ----------------
con <- sprintf("Driver=SQL Server;Server=%s;Database=%s;Uid=%s;Pwd=%s",
sqlHost, sqlDatabase, sqlUser, sqlPw)
# Define data object, including database table -------
airData <- RxSqlServerData(
connectionString = con,
table = "AirlineDemoSmall",
colInfo = list(ArrDelay = list(type = "integer"),
DayOfWeek = list(type = "factor", levels = day.names)
)
)
# Define compute context -----------------------------
rxSetComputeContext(
RxInSqlServer(connectionString = con,
autoCleanup = FALSE,
consoleOutput = TRUE
)
)
# Perform analysis inside database -------------------
model.rxLinMod <- rxLinMod(ArrDelay ~ CRSDepTime + DayOfWeek - 1, airData)
summary(model.rxLinMod)
library(coefplot)
coefplot(model.rxLinMod, title = "Linear model using rxLinMod()")
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'OutputDataSet <- InputDataSet'
, @input_data_1 = N'SELECT 1 as Col'
WITH RESULT SETS ((col int not null));
go
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'OutputDataSet <- subset(iris, select=-Species);'
--, @parallel = 0
, @input_data_1 = N'SELECT 1 as Col'
WITH RESULT SETS (("Sepal.Length" float not null, "Sepal.Width" float not null
, "Petal.Length" float not null, "Petal.Width" float not null));
go

 

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)