A preview of using Revolution R Enterprise inside SQL Server
[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
by Andrie de Vries
The second week of SQLRelay (#SQLRelay) kicked off in London earlier this week. SQLRelay is a series of conferences, spanning 10 cities in the United Kingdom over two weeks. The London agenda included 4 different streams, with tracks for the DBA, BI and Analytics users, as well as a workshop track with two separate tutorials.
My speaking slot was in the afternoon, with the title “In-database analytics using Revolution R and SQL“.
In my talk I covered:
- A high level overview of R
- Data science in the cloud
- Connecting R to SQL
- Scalable R
- R in SQL Server
- Moving your workflow to the cloud
Although the functionality of using R directly inside SQL Server will only be part of SQL Server 2016, Microsoft announced earlier this year that SQL Server 2016 will include Revolution Analytics. I expect that more information will be released during the PASS 2015 summit in Seattle at the end of this month.
In my talk I included 5 simple demonstrations. The first 3 demonstrations appeal to the data scientist coding in R:
- Connecting R to SQL Server using an RODBC connector
- Using Revolution R Enterprise (RRE) in a local parallel compute context, reading data from a local file
- Changing the compute context to SQL Server, and running the R code directly inside the SQL Server machine
The last two demonstrations demonstrate how to run some R code embedded in a SQL stored procedure:
- Creating a very simple script that calls out to R
- Using R to generate some data, in this case simply bringing some data in from the famous iris data set that is built into R.
Presentation
The presentation is available on SlideShare:
Sample code
Here are the code samples I used in the demonstration:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
### Connect to SQL Server using RODBC | |
library(RODBC) | |
library(magrittr) | |
# Connect to SQL Server using RODBC ------------------ | |
sqlHost <- "DAA136209339.sys-sqlsvr.local" | |
sqlDatabase <- "RevoTestDB" | |
dsnString <- "driver={SQL Server};server=%s;database=%s;trusted_connection=true" | |
dsn <- sprintf(dsnString, sqlHost, sqlDatabase) | |
dbHandle <- odbcDriverConnect(dsn) | |
# Send query and return data ------------------------- | |
query <- "SELECT * FROM AirlineDemoSmall" | |
airDataFrame <- sqlQuery(dbHandle, query) | |
airDataFrame %>% head | |
str(airDataFrame) | |
# Create linear model using lm() --------------------- | |
day.names <- paste0(c("Mon", "Tues", "Wednes", "Thurs", "Fri", "Satur", "Sun"), | |
"day") | |
levels(airDataFrame$DayOfWeek) <- day.names | |
model.lm <- lm(ArrDelay ~ CRSDepTime + DayOfWeek - 1, airDataFrame) | |
summary(model.lm) | |
library(coefplot) | |
coefplot(model.lm, title = "Linear model using lm()") | |
# Close connection ----------------------------------- | |
# odbcClose(dbHandle) | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
### Use RevoScaleR RxLocalParallel compute context | |
day.names <- paste0(c("Mon", "Tues", "Wednes", "Thurs", "Fri", "Satur", "Sun"), | |
"day") | |
# Define input object as local csv file -------------- | |
air.csv <- RxTextData( | |
"airdata.csv", | |
rowsPerRead = 100e3, | |
delimiter = ",", | |
colInfo = list(ArrDelay = list(type = "integer"), | |
DayOfWeek = list(type = "factor", levels = day.names) | |
) | |
) | |
# Convert to xdf ------------------------------------- | |
xdf <- "airdata.xdf" | |
rxDataStep(air.csv, outFile = xdf, overwrite = TRUE) | |
# Set local parallel compute context------------------ | |
rxSetComputeContext( | |
RxLocalParallel() | |
) | |
rxGetInfo(xdf) | |
rxGetVarInfo(xdf) | |
# Get data summary and create linear model ----------- | |
rxSummary(~ArrDelay + DayOfWeek, data = xdf, reportProgress = 0) | |
model.rxLinMod <- rxLinMod(ArrDelay ~ CRSDepTime + DayOfWeek - 1, xdf, reportProgress = 2) | |
summary(model.rxLinMod) | |
library(coefplot) | |
coefplot(model.rxLinMod, title = "Linear model using rxLinMod()") | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
### Use RevoScaleR RxInSqlServer compute context | |
sqlHost <- "DAA136209339.sys-sqlsvr.local" | |
sqlDatabase <- "RevoTestDB" | |
sqlUser <- "xxxxxxx" | |
sqlPw <- "xxxxxxxx" | |
day.names <- paste0(c("Mon", "Tues", "Wednes", "Thurs", "Fri", "Satur", "Sun"), | |
"day") | |
# Define SQL Server connection string ---------------- | |
con <- sprintf("Driver=SQL Server;Server=%s;Database=%s;Uid=%s;Pwd=%s", | |
sqlHost, sqlDatabase, sqlUser, sqlPw) | |
# Define data object, including database table ------- | |
airData <- RxSqlServerData( | |
connectionString = con, | |
table = "AirlineDemoSmall", | |
colInfo = list(ArrDelay = list(type = "integer"), | |
DayOfWeek = list(type = "factor", levels = day.names) | |
) | |
) | |
# Define compute context ----------------------------- | |
rxSetComputeContext( | |
RxInSqlServer(connectionString = con, | |
autoCleanup = FALSE, | |
consoleOutput = TRUE | |
) | |
) | |
# Perform analysis inside database ------------------- | |
model.rxLinMod <- rxLinMod(ArrDelay ~ CRSDepTime + DayOfWeek - 1, airData) | |
summary(model.rxLinMod) | |
library(coefplot) | |
coefplot(model.rxLinMod, title = "Linear model using rxLinMod()") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
EXECUTE sp_execute_external_script | |
@language = N'R' | |
, @script = N'OutputDataSet <- InputDataSet' | |
, @input_data_1 = N'SELECT 1 as Col' | |
WITH RESULT SETS ((col int not null)); | |
go | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
EXECUTE sp_execute_external_script | |
@language = N'R' | |
, @script = N'OutputDataSet <- subset(iris, select=-Species);' | |
--, @parallel = 0 | |
, @input_data_1 = N'SELECT 1 as Col' | |
WITH RESULT SETS (("Sepal.Length" float not null, "Sepal.Width" float not null | |
, "Petal.Length" float not null, "Petal.Width" float not null)); | |
go |
To leave a comment for the author, please follow the link and comment on their blog: Revolutions.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.