Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’m looking into creating some functions to make it easier to carry out quality checks on our database tables.
I’m using SQL Server, where tables are referred to in a [database].[schema].[tablename] format, although you can forgo some of this by using the USE
statement –
USE my_data_base GO
Let’s say you want to run a simple query:
SELECT columns, you, want, to, select FROM yourschema.yourtablename;
Easy enough, but you might want to run it over several tables, and vary the columns. You could write dynamic SQL, which can be messy, and usually frowned upon.
Or you could use R, and write code that is less messy, and feels a bit magical.
I’ve seen several examples of setting dynamic WHERE
clauses, but I want dynamic schema names, table names, and column names too. How do we do it?
First, you specify your SQL Server connection:
con <- DBI::dbConnect(odbc::odbc(), Driver = "SQL Server", Server = "SERVER", Database = "DB_NAME", Trusted_Connection = "True", Port = 1433)
Now let’s set up some variables, for the columns we need to SELECT
, and some filtering variables for future use
vars <- c("columns", "you", "want", "to", "select") date_var <- 'date_col' start_date <- as.Date('2022-01-01') today <- Sys.Date()
Now the crux of the problem is the SQL Server naming convention. If you are querying a schema other than dbo
, then you may have problems with your queries parsing correctly.
So what can you do?
Option 1 – You can define the schema and table names separately and refer to them using glue_sql
from the glue
package
tablename <- "yourtablename" schema_name <- "yourschema"
Your query now looks like this
query <- glue_sql(.con = con, "SELECT TOP(10) {`vars`*} FROM {`schema_name`}.{`tablename`} ") DBI::dbGetQuery(con, query)
Note the use of {`vars`*}
, which collapses the vars
vector, separated by commas, so that it resembles the SELECT
statement we would normally write in SQL.
Option 2 – Alternately, we can use DBI::SQL
tbl2 <- DBI::SQL("yourschema.yourtablename") test <- glue::glue_sql(.con = con, "SELECT TOP(10) {`vars`*} FROM {`tbl2`}") DBI::dbGetQuery(con, test)
The difference, with option 1, we refer to the variable (e.g. schema_name
) whereas in option 2 we refer to the value of the variable (e.g. yourschema
)
Both work, so take your pick.
With these basic queries working, we can continue to experiment, with the aim of using dplyr and purrr to query, wrangle and visualise the data all from the same tool.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.