Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
PostgreSQL and R can often be used together for data analysis – PostgreSQL as database engine and R as statistical tool. In this article you will learn how to access data stored in PostgreSQL database and how to write the data back using RPostgreSQL and sqldf packages.
Connect to PostgreSQL Database
RPostgreSQL
To open connection to PostgreSQL database, you use dbConnect() command. The simplest connection to localhost takes just two lines. Of course, you can connect to any PostgreSQL database by specifying all connections options – database name, host, port, user and password.
# Establish connection to PoststgreSQL using RPostgreSQL drv <- dbDriver("PostgreSQL") con <- dbConnect(drv)# Simple version (localhost as default) # Full version of connection seetting # con <- dbConnect(drv, dbname="dbname",host="host",port=1234,user="user",password="password", )
At the end of you script, do not forget to close the connection:
# Close PostgreSQL connection dbDisconnect(con)
sqldf
Connection to PostgreSQL database with sqldf package has to be set in a different way, by setting RPostgreSQL options:
options(sqldf.RPostgreSQL.user ="postgres", sqldf.RPostgreSQL.password ="password", sqldf.RPostgreSQL.dbname ="test", sqldf.RPostgreSQL.host ="localhost", sqldf.RPostgreSQL.port =5432)
Working with the database
I will use the following table for all examples in this article:
create table tmp.test_tbl (a int, b int); insert into tmp.test_tbl values (1,4); insert into tmp.test_tbl values (2,5); insert into tmp.test_tbl values (3,6);
RPostgreSQL
RPostgreSQL has the following main features. Test table existence (note the specific notation for schema identifaction):
dbExistsTable(con, c("tmp","test_tbl"))
Read table from PostgreSQL into R data frame:
myTable <- dbReadTable(con, c("tmp","test_tbl")) # Equals to myTable <- sqldf("select * from tmp.test_tbl")
Write results back to PostgreSQL:
# overwrite=TRUE will change both data and table structure # When row.name=TRUE then column named row.names will be added to the table dbWriteTable(con, c("tmp","test_tbl_out"), value=myTable,overwrite=TRUE,row.names=FALSE)
Append data to table:
dbWriteTable(con, c("tmp","test_tbl_out"), value=myTable,append=TRUE, row.names=FALSE)
sqldf
With sqldf you can easily execute any SQL command. For example, you can drop table, create table, insert new records:
library("sqldf") # We can use sqldf() to issue any command, including drop, create, select, insert sqldf(" /* sql comments can be used*/ drop table if exists tmp.test_tbl; create table tmp.test_tbl (a int, b int); insert into tmp.test_tbl values (1,4); insert into tmp.test_tbl values (2,5); insert into tmp.test_tbl values (3,6); ")
Sometimes you might need to switch back to SQLite driver. This way, we can manipulate R data frames using pure SQL.
myTable <- sqldf("select a, b, a+b as c from myTable", drv="SQLite")
Other resources
Both RPostgreSQL and sqldf have their project homepage:
All the examples are accessible in this gist
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.