Site icon R-bloggers

Separate SQL

[This article was first published on HighlandR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I did a tweet earlier which was slightly less nonsense than usual.
Shocker, I know.

I don’t think the idea of having SQL files being kept separate, rather than having a SQL query either embedded or called within an R script, is that controversial. But some folk clearly have not thought to try it before.

So, in R terms, rather than having a connection defined, and then pasting a SQL query into a DBI call, I’d be doing something like this to make the initial connection:

source(here("01-con.R"))

Then, once that ran successfully, I’d run the query with this:

results <- dbGetQuery(con, statement = readr::read_file(here('02-get-results.sql')))

(My query names are more descriptive in real life).

Let’s backtrack – my motivation for doing this was quite simple.

We needed to make a change to query due to a change in reporting requirements. This meant updating a couple of reports, and I got fed up scrolling aound trying to find the queries inside my rmarkdown scripts.

‘What if they were actually outside the script, and called in as needed?’

Perceived benefits of this approach:

If you don’t know SQL, then Power Query almost guides you to load an entire table, then drop columns/ rows you don’t need. Unless, you know to write a query up front and use that in the connection panel. Even though they cannot write the SQL themselves, they can grab the working query, paste it in, safe in the knowledge it works, and limit the needless transfer of data. If that query is available in a standalone file, there is more chance of them using it, and less likelihood of things going wrong.

There are likely to be more benefits, but these are all seem worthwhile.

If, however, you’ve already tried this and it didn’t work for you, I’d be interested to hear what went wrong, and what your final approach was.

N.B.- This is a very early draft post, in line with my ‘publish it now, or it might never get done’ strategy, and so I invite you to check back on my site to see if it has been updated.

But to reiterate, I definitely want to hear about any perceived advantages/ disadvantages of this approach.

To leave a comment for the author, please follow the link and comment on their blog: HighlandR.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.