Handling required and missing R packages in Microsoft R Services

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I have seen several time, that execute R code using procedure sp_execute_external_script was not valid due to missing library or library dependencies.
Problem is – in general – not solved out of the box. But can be solved using and maintaining a list of installed libraries used by Microsoft R services or by simply create a “pre-code” R code to do a check for you.
In both cases, user will end up with additional code, but it might be a good check if you are installing library in production and you run such check prior to running any relevant R code.
Let’s start with simple R code:
USE WideWorldImporters; GO EXECUTE sp_execute_external_script @language = N'R' ,@script=N'library(Hmisc) df <- data.frame(rcorr(as.matrix(sp_RStats_query), type="pearson")$P) OutputDataSet<-df' ,@input_data_1 = N'SELECT SupplierID ,UnitPackageID ,OuterPackageID FROM [Warehouse].[StockItems]' ,@input_data_1_name = N'sp_RStats_query' WITH RESULT SETS ((SupplierID NVARCHAR(200) ,UnitPackageID NVARCHAR(200) ,OuterPackageID NVARCHAR(200)));
This code will in my case return error message, that R Service is missing a specific library in order to execute R code – in my case library Hmisc.
So the first step to solve this issue is to declare variable for R script and parametrize @script parameter for procedure sp_execute_external_script.
DECLARE @OutScript NVARCHAR(MAX) SET @OutScript =N'library(Hmisc) df <- data.frame(rcorr(as.matrix(sp_RStats_query), type="pearson")$P) OutputDataSet<-df' EXECUTE sp_execute_external_script @language = N'R' ,@script= @OutScript ,@input_data_1 = N'SELECT SupplierID ,UnitPackageID ,OuterPackageID FROM [Warehouse].[StockItems]' ,@input_data_1_name = N'sp_RStats_query' WITH RESULT SETS (( SupplierID NVARCHAR(200) ,UnitPackageID NVARCHAR(200) ,OuterPackageID NVARCHAR(200) ));
Now we need to do a string search for following patterns:
- library (c(package1,package2))
- library (package1), library(package2)
- library(package1)
All these reserved R words denote the need for particular library/libraries installation.
DECLARE @OutScript NVARCHAR(MAX) SET @OutScript =N'library(Hmisc) library(test123) df <- data.frame(rcorr(as.matrix(sp_RStats_query), type="pearson")$P) OutputDataSet<-df' /* *************************************************************** START: CODE FOR CHECKING AND INSTALLING MISSING R LIBRARIES *************************************************************** */ DECLARE @Tally TABLE (num TINYINT,R_Code NVARCHAR(MAX)) INSERT INTO @Tally VALUES (1,@OutScript) DECLARE @libstatement NVARCHAR(MAX) DECLARE @cmdstatement NVARCHAR(MAX) ;WITH CTE_R(num,R_Code, libname) AS ( SELECT 1 AS num, RIGHT(R_Code, LEN(R_Code) - CHARINDEX(')', R_Code, 0)) AS R_Code, substring(R_Code, CHARINDEX('library(', R_Code, 0) + 0, CHARINDEX(')', R_Code, 0) - CHARINDEX('library(', R_Code, 0) + 1) AS libname FROM @Tally WHERE CHARINDEX('(', R_Code, 0) > 0 AND CHARINDEX('library(',R_Code,0) > 0 UNION ALL SELECT 1 AS num, RIGHT(R_Code, LEN(R_Code) - CHARINDEX(')', R_Code, 0)) AS R_Code, substring(R_Code, CHARINDEX('library(', R_Code, 0) + 0, CHARINDEX(')', R_Code, 0) - CHARINDEX('library(', R_Code, 0) + 1) AS libname FROM CTE_R WHERE CHARINDEX('(', R_Code, 0) > 0 AND CHARINDEX('library(',R_Code,0) > 0 ) , fin AS ( SELECT TOP 1 stuff((SELECT ' install.packages(''''' + REPLACE(REPLACE(REPLACE(c1.libname,'library',''),')',''),'(','') + ''''' , dependencies = T)' FROM CTE_R AS c1 WHERE c1.num = c2.num FOR XML PATH (''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'') AS lib_stat FROM CTE_R AS c2 ) SELECT @libstatement = lib_stat FROM fin SET @cmdstatement = 'EXEC xp_cmdshell ''"C:\Program Files\Microsoft SQL Server\ MSSQL13.MSSQLSERVER\R_SERVICES\bin\R.EXE" cmd -e ' + @libstatement + '''' EXEC SP_EXECUTESQL @cmdstatement /* *************************************************************** END: CODE FOR CHECKING AND INSTALLING MISSING R LIBRARIES *************************************************************** */ EXECUTE sp_execute_external_script @language = N'R' ,@script= @OutScript ,@input_data_1 = N'SELECT SupplierID ,UnitPackageID ,OuterPackageID FROM [Warehouse].[StockItems]' ,@input_data_1_name = N'sp_RStats_query' WITH RESULT SETS (( SupplierID NVARCHAR(200) ,UnitPackageID NVARCHAR(200) ,OuterPackageID NVARCHAR(200) ));
Result in this case will be successful with correct R results and sp_execute_external_script will not return error for missing libraries.
I added a “fake” library called test123 for testing purposes if all the libraries will be installed successfully.
At the end the script generated xp_cmdshell command (in one line):
EXEC xp_cmdshell '"C:\Program Files\Microsoft SQL Server\ MSSQL13.MSSQLSERVER\R_SERVICES\bin\R.EXE" cmd -e install.packages(''Hmisc'') install.packages(''test123'')'
You might also experience the following error in the output of xp_cmdshell command:
In this case, go to the following location and enable write permission for this folder and subfolders.
For the end, you will always find more elegant and easy way to write R code that will inadvertently check the installation (as well as version and dependencies for library) with following R code:
if(!is.element("Hmisc", installed.packages())) {install.packages("Hmisc", dependencies = T) }else{library("Hmisc")}
So the original code can simply be changed to:
USE WideWorldImporters; GO EXECUTE sp_execute_external_script @language = N'R' ,@script=N'if(!is.element("Hmisc", installed.packages())) {install.packages("Hmisc", dependencies = T) }else{library("Hmisc")} df <- data.frame(rcorr(as.matrix(sp_RStats_query), type="pearson")$P) OutputDataSet<-df' ,@input_data_1 = N'SELECT SupplierID ,UnitPackageID ,OuterPackageID FROM [Warehouse].[StockItems]' ,@input_data_1_name = N'sp_RStats_query' WITH RESULT SETS ((SupplierID NVARCHAR(200) ,UnitPackageID NVARCHAR(200) ,OuterPackageID NVARCHAR(200)));
But unfortunately, one can not always count on the consistence of developer or data scientists or author of the code, that they will always add a simple check for library installation.
Code is available at GitHub.
Happy R-TSQLing!

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.