Using R sp_execute_external_script with JSON
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
JSON support became part of SQL Server in the same version as R (SQL Server 2016). Both were highly anticipated by the community.
SQL Server has powerful constructs for converting data to and from JSON when storing it in or reading it from the engine (FOR JSON, JSON_VALUE, etc.). Since JSON is gaining popularity for data exchange, I was curious to try it in combination with R.
I will simply convert a system table into a JSON array using the FOR JSON clause.
SELECT top 10 object_id FROM sys.objects FOR JSON AUTO;
It returns the following result:
[{"object_id":3},{"object_id":5},{"object_id":6},{"object_id":7},{"object_id":8},{"object_id":9},{"object_id":17},{"object_id":18},{"object_id":19},{"object_id":20}]
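For comparison, this is how the same JSON text parses in a plain R session with jsonlite. It is a minimal sketch that assumes the jsonlite package is installed; the JSON string is pasted in by hand from the output above.

```r
# Sketch: parse the FOR JSON AUTO output shown above in a plain R session.
# Assumes jsonlite is installed: install.packages("jsonlite")
library(jsonlite)

# JSON text copied from the SELECT ... FOR JSON AUTO result
js <- '[{"object_id":3},{"object_id":5},{"object_id":6},{"object_id":7},{"object_id":8},{"object_id":9},{"object_id":17},{"object_id":18},{"object_id":19},{"object_id":20}]'

# fromJSON() simplifies an array of flat objects into a data.frame:
# one row per object, one column per key (here: object_id)
df <- fromJSON(js)
str(df)
```

Because fromJSON() receives a character string here, it has no trouble with the input.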
The sp_execute_external_script query without JSON would look like this:
EXECUTE sp_execute_external_script
  @language = N'R'
 ,@script = N'OutputDataSet <- InputDataSet'
 ,@input_data_1 = N'SELECT top 10 object_id FROM sys.objects'
WITH RESULT SETS ((nr INT));
Now suppose we want to pass a JSON result directly to R through sp_execute_external_script. Imagine getting results from an API and wanting to push them immediately into R for analysis. A very straightforward R package for this is jsonlite (rjson is also available). The query would be as follows:
EXECUTE sp_execute_external_script
  @language = N'R'
 ,@script = N'library(jsonlite)
OutputDataSet <- data.frame(fromJSON(InputDataSet))'
 ,@input_data_1 = N'SELECT top 10 object_id FROM sys.objects FOR JSON AUTO'
WITH RESULT SETS ((nr INT));
Nope!
Msg 39004, Level 16, State 20, Line 15
A 'R' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 1, Line 15
An external script error occurred:
Error: Argument 'txt' must be a JSON string, URL or file.
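So the argument ‘txt’ must be a JSON string, URL or file. Khm… a very “useful” error message. The problem is that sp_execute_external_script always hands the result of @input_data_1 to the R environment (through Launchpad) as a data.frame. InputDataSet therefore holds the JSON text inside a data.frame column rather than as a plain character string, and fromJSON() refuses anything that is not a character string, URL or file.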
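The mismatch can be reproduced outside SQL Server. This is a minimal sketch in plain R, assuming jsonlite is installed; InputDataSet is built by hand as a hypothetical stand-in, and its column name json is made up (SQL Server generates its own name for the FOR JSON column).

```r
library(jsonlite)

# Hypothetical stand-in for InputDataSet: one character column holding the
# JSON text produced by FOR JSON AUTO (column name "json" is made up)
InputDataSet <- data.frame(
  json = '[{"object_id":3},{"object_id":5},{"object_id":6}]',
  stringsAsFactors = FALSE
)

# Passing the whole data.frame fails: fromJSON() only accepts a character
# string (JSON text, URL or file path) or a connection
err <- tryCatch(fromJSON(InputDataSet),
                error = function(e) conditionMessage(e))
print(err)

# Extracting the character column first gives fromJSON() what it expects
parsed <- fromJSON(InputDataSet[[1]])
print(parsed)
```

The error text captured in err is the same one SQL Server reports above, which points at the data.frame wrapper rather than the JSON itself.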
Reproducing this in a native R environment at least gives us an idea of where and how to tackle the problem. We need to convert the data.frame to a character string using toJSON and as.character, so the final T-SQL query looks like this:
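EXECUTE sp_execute_external_script
  @language = N'R'
 ,@script = N'
library(jsonlite)
# InputDataSet arrives as a data.frame, not a character string
js <- InputDataSet
# serialize the data.frame to JSON and coerce it to a plain character string
js2 <- as.character(toJSON(js))
OutputDataSet <- data.frame(fromJSON(js2))'
 ,@input_data_1 = N'SELECT top 10 object_id FROM sys.objects FOR JSON AUTO'
WITH RESULT SETS ((nr INT));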
Now we get the correct results, the same as if we had not used JSON.
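A more direct variant, offered as an untested sketch rather than as the method used in this post, is to pull the JSON text out of the first column of InputDataSet and parse it once. The paste(..., collapse = "") call is a defensive assumption: if the text ever arrives split over several rows, joining them keeps it valid JSON. InputDataSet is again simulated by hand (hypothetical column name json); inside @script only the library() call and the OutputDataSet line would be needed.

```r
library(jsonlite)

# Hypothetical InputDataSet: the JSON text split over two rows of a single
# character column, to show that joining the rows restores valid JSON
InputDataSet <- data.frame(
  json = c('[{"object_id":3},{"object_id":5},',
           '{"object_id":6},{"object_id":7}]'),
  stringsAsFactors = FALSE
)

# Join all rows of the first column into one string, then parse it;
# the result is a data.frame with an object_id column
OutputDataSet <- fromJSON(paste(InputDataSet[[1]], collapse = ""))
print(OutputDataSet)
```

This skips the toJSON/fromJSON round trip, at the cost of assuming the JSON text is always in the first column.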
So R is ready for JSON and JSON is also ready for R.
Happy R+JSON+SQLing!