How to select and merge R data frames with SQL
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The R language provides many features in the language for selecting data from data frames: the “[” operator, logical functions, and utility functions like “subset“. But if you know SQL (the query language ubiquitous in database systems), none of this is necessary. With the sqldf package, you can just pretend that your data frame is a database, and use SQL directly.
The sqldf function supports the full richness of the SQL language, but applied to data frames in R's memory. This includes:
- SELECT … WHERE statements to select rows and columns according to logical criteria
- CASE clauses, for queries with special cases
- ORDER BY statements, to sort the resulting data according to specified columns
- LEFT JOIN and INNER JOIN statements for merging data frames
The sqldf package uses its own internal database engine, so there's no special database configuration you need to do. Just enter the following in R:
install.packages("sqldf") library(sqldf)
and you should be good to go. The SQLDF FAQ is a good resource for getting started, and this sqldf video tutorial from Keystone Solutions shows the sqldf package in action with examples of SQL queries from simple to complex.
Google code: sqldf: SQL select on R data frames
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.