Versioning R model objects in SQL Server
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
High-level info
If you build a model and never update it you’re missing a trick. Behaviours change so your model will tend to perform worse over time. You’ve got to regularly refresh it, whether that’s adjusting the existing model to fit the latest data (recalibration) or building a whole new model (retraining), but this means you’ve got new versions of your model that you have to handle. You need to think about your methodology for versioning R model objects, ideally before you lose any versions.
You could store models with ye olde YYYYMMDD style of versioning but that means regularly changing your code to use the latest model version. I’m too lazy for that!
If we’re storing our R model objects in SQL Server then we can utilise another SQL Server capability, temporal tables, to take the pain out of versioning and make it super simple.
Temporal tables will track changes automatically so you would overwrite the previous model with the new one and it would keep a copy of the old one automagically in a history table. You get to always use the latest version via the main table but you can then write temporal queries to extract any version of the model that’s ever been implemented. Super neat!
For some of you, if you’re not interested in the technical details you can drop off now with the knowledge that you can store your models in a non-destructive but easy to use way in SQL Server if you need to.
If you want to see how it’s done, read on!
The technical info
Below is a working example of how to do versioning R model objects in SQL Server:
- define a versioned model table
- write a model into the model table
- create a new model and overwrite the old
- create a prediction using the latest model on the fly
Note this doesn’t tell you what changed, just that something did change. To identify model changes you will need to load up the models and compare their coefficients or other attributes to identify what exactly changed.
SQL objects
A temporal table
A normal table for storing a model might look like
CREATE TABLE [companyModels] ( [id] int NOT NULL PRIMARY KEY CLUSTERED , [name] varchar(200) NOT NULL , [modelObj] varbinary(max) , CONSTRAINT unique_modelname UNIQUE ([name]))
If we’re turning it into a temporal table we need to add some extra columns, but we won’t worry about these extra columns day-to-day.
CREATE TABLE [companyModels] ( [id] int NOT NULL PRIMARY KEY CLUSTERED , [name] varchar(200) NOT NULL , [modelObj] varbinary(max) , [ValidFrom] datetime2 (2) GENERATED ALWAYS AS ROW START , [ValidTo] datetime2 (2) GENERATED ALWAYS AS ROW END , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo) , CONSTRAINT unique_modelname UNIQUE ([name])) WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.companyModelsHistory));
A stored procedure
As we have the ID and Valid* columns in our table, we can’t use RODBCs simple table functions and the ID column doesn’t play nicely with RevoScaleR’s rxWriteObject()
function as that wants to insert a NULL. It’s not all bad though because we can get around it by using a stored procedure.
This stored procedure will perform an INSERT if it can’t find a model by the given name, and will perform an UPDATE if it does find a match.
CREATE PROCEDURE modelUpsert @modelname varchar(200) , @modelobj varbinary(max) AS WITH MySource as ( select @modelname as modelName, @modelobj as modelObj ) MERGE companymodels AS MyTarget USING MySource ON MySource.modelname = MyTarget.modelname WHEN MATCHED THEN UPDATE SET modelObj = MySource.modelObj WHEN NOT MATCHED THEN INSERT ( modelname, modelObj ) VALUES ( MySource.modelname, MySource.modelObj );
R
Build a model
We need a model to save!
To be able to store the model in the database we need to use the serialize()
function to turn it into some raw character thingies and then combine them together with paste0()
so they go in the same row.
myModelV1<-lm(column1~column2, someData) preppedDF<-data.frame(modelname="myModel", modelobj=paste0(serialize(myModelV1,NULL) ,collapse = "") )
Call the stored procedure
We need RODBC and RODBCext for executing our stored procedure in our database.
library(RODBC) library(RODBCext) dbstring<-'Driver={ODBC Driver 13 for SQL Server};Server=XXX;Database=XXX;Uid=XXX;Pwd=XXX' dbconn<-RODBC::odbcDriverConnect(dbstring) RODBCext::sqlExecute(dbconn, "exec modelUpsert @modelname=? , @modelobj=?", data = preppedDF)
This will now have our model in our database table.
RODBC::sqlQuery(dbconn, "select * from companymodels") # 1 row RODBC::sqlQuery(dbconn, "select * from companymodelshistory") # 0 row
You should get one row in our main table and no rows in ouor history table.
Rinse and repeat
If we make a change to our model and then push the new model with the same name, we’ll still get one row in our core table, but now we’ll get a row in our history table that contains our v1 model.
myModelV2<-lm(column1~column2, someData[-1,]) preppedDF<-data.frame(modelname="myModel", modelobj=paste0(serialize(myModelV2,NULL) ,collapse = "") ) RODBCext::sqlExecute(dbconn, "exec modelUpsert @modelname=? , @modelobj=?", data = preppedDF) RODBC::sqlQuery(dbconn, "select * from companymodels") # 1 row RODBC::sqlQuery(dbconn, "select * from companymodelshistory") # 1 row
Using our model in SQL
If we want to use our model for predictions in SQL, we can now retrieve it from the table along with some input data and get back our input data plus a prediction.
DECLARE @mymodel VARBINARY(MAX)=(SELECT modelobj FROM companymodels WHERE modelname='mymodel' ); EXEC sp_execute_external_script @language = N'R', @script = N' OutputDataSet<-cbind(InputDataSet, predict(unserialize(as.raw(model)), InputDataSet) ) ', @input_data_1 = N'SELECT 42 as column2', @params = N'@model varbinary(max)', @model = @mymodel
The post Versioning R model objects in SQL Server appeared first on Locke Data. Locke Data are a data science consultancy aimed at helping organisations get ready and get started with data science.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.