Site icon R-bloggers

Dynamic schema, table and column names in SQL Server queries as a gateway to functional programming with R

[This article was first published on HighlandR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’m looking into creating some functions to make it easier to carry out quality checks on our database tables.
I’m using SQL Server, where tables are referred to in a [database].[schema].[tablename] format, although you can forgo some of this by using the USE statement –

USE my_data_base
GO

Let’s say you want to run a simple query:

SELECT columns,
you,
want,
to,
select
FROM yourschema.yourtablename;

Easy enough, but you might want to run it over several tables, and vary the columns. You could write dynamic SQL, which can be messy, and usually frowned upon.
Or you could use R, and write code that is less messy, and feels a bit magical.

I’ve seen several examples of setting dynamic WHERE clauses, but I want dynamic schema names, table names, and column names too. How do we do it?

First, you specify your SQL Server connection:

con <- DBI::dbConnect(odbc::odbc(), 
                      Driver = "SQL Server", 
                      Server = "SERVER", 
                      Database = "DB_NAME", 
                      Trusted_Connection = "True", 
                      Port = 1433)

Now let’s set up some variables, for the columns we need to SELECT, and some filtering variables for future use

vars <- c("columns", "you", "want", "to", "select")
date_var <- 'date_col'

start_date <- as.Date('2022-01-01')
today <- Sys.Date()

Now the crux of the problem is the SQL Server naming convention. If you are querying a schema other than dbo, then you may have problems with your queries parsing correctly.

So what can you do?

Option 1 – You can define the schema and table names separately and refer to them using glue_sql from the glue package

tablename <- "yourtablename"
schema_name <- "yourschema"

Your query now looks like this

query <- glue_sql(.con = con, "SELECT TOP(10) {`vars`*} FROM {`schema_name`}.{`tablename`} ")
DBI::dbGetQuery(con, query)

Note the use of {`vars`*}, which collapses the vars vector, separated by commas, so that it resembles the SELECT statement we would normally write in SQL.

Option 2 – Alternately, we can use DBI::SQL

tbl2 <- DBI::SQL("yourschema.yourtablename")
test <- glue::glue_sql(.con = con, "SELECT TOP(10) {`vars`*} FROM {`tbl2`}")
DBI::dbGetQuery(con, test)

The difference, with option 1, we refer to the variable (e.g. schema_name) whereas in option 2 we refer to the value of the variable (e.g. yourschema)

Both work, so take your pick.

With these basic queries working, we can continue to experiment, with the aim of using dplyr and purrr to query, wrangle and visualise the data all from the same tool.

To leave a comment for the author, please follow the link and comment on their blog: HighlandR.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.