CouchDB and R
Here are some quick crib notes on getting R talking to CouchDB using Couch’s ReSTful HTTP API. We’ll do it in two different ways. First, we’ll construct HTTP calls with RCurl, then move on to the R4CouchDB package for a higher-level interface. I’ll assume you’ve already gotten started with CouchDB and are familiar with the basic ReST actions: GET, PUT, POST and DELETE.
First install RCurl and RJSONIO. You’ll have to download the tar.gz’s if you’re on a Mac. For the second part, we’ll need to install R4CouchDB, which depends on the previous two. I checked it out from GitHub and used R CMD INSTALL.
ReST with RCurl
Ping server
library(RCurl)
library(RJSONIO)
getURL("http://localhost:5984/")
[1] "{\"couchdb\":\"Welcome\",\"version\":\"1.0.1\"}\n"
That’s nice, but we want to get the result back as a real R data structure. Try this:
welcome <- fromJSON(getURL("http://localhost:5984/"))
welcome$version
[1] "1.0.1"
Sweet!
PUT
One way to add a new record is with http PUT.
bozo = list(name="Bozo", occupation="clown", shoe.size=100)
getURL("http://localhost:5984/testing123/bozo",
  customrequest="PUT",
  httpheader=c('Content-Type'='application/json'),
  postfields=toJSON(bozo))
[1] "{\"ok\":true,\"id\":\"bozo\",\"rev\":\"1-70f5f59bf227d2d715c214b82330c9e5\"}\n"
Notice that RCurl has no high-level PUT method, so you have to fake it using the customrequest parameter. I'd never have figured that out without an example from R4CouchDB's source. The libcurl API is odd, I have to say, and RCurl mostly just reflects it right into R.
If you don't like the idea of sending a put request with a get function, you could use RCurl's curlPerform. Trouble is, curlPerform returns an integer status code rather than the response body. You're supposed to provide an R function to collect the response body text. Not really worth the bother, unless you're getting into some of the advanced tricks described in the paper, R as a Web Client - the RCurl package.
bim <- list(
  name="Bim",
  occupation="clown",
  tricks=c("juggling", "pratfalls", "mocking Bolsheviks"))
reader = basicTextGatherer()
curlPerform(
  url = "http://localhost:5984/testing123/bim",
  httpheader = c('Content-Type'='application/json'),
  customrequest = "PUT",
  postfields = toJSON(bim),
  writefunction = reader$update
)
reader$value()
GET
Now that there's something in there, how do we get it back? That's super easy.
bozo2 <- fromJSON(getURL("http://localhost:5984/testing123/bozo"))
bozo2
$`_id`
[1] "bozo"

$`_rev`
[1] "1-646331b58ee010e8df39b5874b196c02"

$name
[1] "Bozo"

$occupation
[1] "clown"

$shoe.size
[1] 100
PUT again for updating
Updating is done by using PUT on an existing document. The body you send has to include the document's current _rev, which bozo2 already carries from the GET. For example, let's give Bozo some mad skillz:
getURL(
  "http://localhost:5984/testing123/bozo",
  customrequest="PUT",
  httpheader=c('Content-Type'='application/json'),
  postfields=toJSON(bozo2))
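The call above sends bozo2 back exactly as it was fetched, so the step that actually changes the document isn't shown. Here is a minimal sketch of what that step might look like, assuming a skills field with made-up values. The list is rebuilt by hand so the snippet runs without a server. The point is that _rev travels along with the edited document, since CouchDB rejects an update whose revision doesn't match the current one.

```r
library(RJSONIO)

# Stand-in for the document fetched in the GET step above
bozo2 <- list(
  `_id` = "bozo",
  `_rev` = "1-646331b58ee010e8df39b5874b196c02",
  name = "Bozo",
  occupation = "clown",
  shoe.size = 100)

# Assumed edit: add a skills field (the values are illustrative only)
bozo2$skills <- c("pranks", "honking nose")

# This JSON string is what would be passed as postfields in the PUT
body <- toJSON(bozo2)
```

With a live server, body would take the place of toJSON(bozo2) in the PUT call.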
POST
If you POST to the database, you're adding a document and letting CouchDB assign its _id field.
bender = list(
  name='Bender',
  occupation='bending',
  species='robot')
response <- fromJSON(getURL(
  'http://localhost:5984/testing123/',
  customrequest='POST',
  httpheader=c('Content-Type'='application/json'),
  postfields=toJSON(bender)))
response
$ok
[1] TRUE

$id
[1] "2700b1428455d2d822f855e5fc0013fb"

$rev
[1] "1-d6ab7a690acd3204e0839e1aac01ec7a"
DELETE
For DELETE, you pass the doc's revision number in the query string. Sorry, Bender.
response <- fromJSON(getURL(
  "http://localhost:5984/testing123/2700b1428455d2d822f855e5fc0013fb?rev=1-d6ab7a690acd3204e0839e1aac01ec7a",
  customrequest="DELETE"))
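Pasting the id and rev into the URL by hand gets old quickly. As a rough sketch, a small helper can build the URL from the pieces CouchDB hands back. Note that doc_url is a hypothetical name of my own, not something RCurl or R4CouchDB provides, and the response list below is a stand-in for the parsed POST result so the snippet runs without a server.

```r
# Hypothetical helper (not part of RCurl or R4CouchDB): build a document URL,
# optionally with the rev query parameter that DELETE needs
doc_url <- function(db_url, id, rev = NULL) {
  url <- paste0(db_url, "/", utils::URLencode(id, reserved = TRUE))
  if (!is.null(rev)) {
    url <- paste0(url, "?rev=", utils::URLencode(rev, reserved = TRUE))
  }
  url
}

# Stand-in for the parsed response from the POST step
response <- list(
  ok = TRUE,
  id = "2700b1428455d2d822f855e5fc0013fb",
  rev = "1-d6ab7a690acd3204e0839e1aac01ec7a")

delete_url <- doc_url("http://localhost:5984/testing123", response$id, response$rev)
```

The resulting delete_url can then be handed to getURL with customrequest="DELETE".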
CRUD with R4CouchDB
R4CouchDB provides a layer on top of the techniques we've just described.
R4CouchDB uses a slightly strange idiom. You pass a cdb object, really just a list of parameters, into every R4CouchDB call, and every call returns that object again, possibly modified. Results are returned in cdb$res. Perhaps they did this because R passes arguments by value, so a function can't modify the caller's copy. Here's how you would initialize the object.
cdb <- cdbIni()
cdb$serverName <- "localhost"
cdb$port <- 5984
cdb$DBName <- "testing123"
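To see why the "pass it in, assign it back" idiom matters, here is a tiny base-R sketch. fake_call is a hypothetical stand-in for an R4CouchDB function, not a real one. It only illustrates that a list modified inside a function is a copy, so the result is lost unless you reassign.

```r
# Hypothetical stand-in for an R4CouchDB call: stores a result in $res
# and returns the (modified) parameter list
fake_call <- function(cdb) {
  cdb$res <- list(ok = TRUE)
  cdb
}

params <- list(serverName = "localhost", port = 5984)

# Return value discarded: the caller's list is unchanged
invisible(fake_call(params))
lost <- is.null(params$res)

# Return value assigned back: now the result is available
params <- fake_call(params)
kept <- params$res$ok
```

That's why every R4CouchDB example here has the form cdb <- cdbSomething(cdb).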
Create
fake.data <- list(
  state='WA',
  population=6664195,
  state.bird='Lady GaGa')
cdb$dataList <- fake.data
cdb$id <- 'fake.data'  ## optional, otherwise an ID is generated
cdb <- cdbAddDoc(cdb)
cdb$res
$ok
[1] TRUE

$id
[1] "fake.data"

$rev
[1] "1-14bc025a194e310e79ac20127507185f"
Read
cdb$id <- 'bozo'
cdb <- cdbGetDoc(cdb)
bozo <- cdb$res
bozo
$`_id`
[1] "bozo"
... etc.
Update
First we take the document id and rev from the existing document. Then, save our revised document back to the DB.
cdb$id <- bozo$`_id`
cdb$rev <- bozo$`_rev`
bozo = list(
  name="Bozo",
  occupation="assassin",
  shoe.size=100,
  skills=c(
    'pranks',
    'honking nose',
    'kung fu',
    'high explosives',
    'sniper',
    'lock picking',
    'safe cracking'))
## as with cdbAddDoc, the document goes in cdb$dataList and the call takes cdb
cdb$dataList <- bozo
cdb <- cdbUpdateDoc(cdb)
Delete
Shortly thereafter, Bozo mysteriously disappeared.
cdb$id = bozo$`_id`
cdb <- cdbDeleteDoc(cdb)
More on ReST and CouchDB
- One issue you'll probably run into is that JSON has no way to represent NaN or Infinity. And, of course, only R knows about NAs.
- One-off ReST calls are easy using curl from the command line, as described in REST-esting with cURL.
- I flailed about quite a bit trying to figure out the best way to do HTTP with R.
- I originally thought R4CouchDB was part of a Google summer of code project to support NoSQL DBs in R. Dirk Eddelbuettel clarified that R4CouchDB was developed independently. In any case, the schema-less approach fits nicely with R's philosophy of exploratory data analysis.
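On the NaN, Infinity and NA point in the list above, one possible workaround is to swap the special values for string markers before serializing and swap them back after reading. The sketch below is only one convention among many, and encode_special and decode_special are hypothetical names of my own. It assumes a purely numeric field, since a genuine string "NA" would be indistinguishable from the marker.

```r
# Replace special numeric values with string markers; returns a list so
# numbers and strings can sit side by side
encode_special <- function(x) {
  lapply(x, function(v) {
    if (is.nan(v)) {
      "NaN"
    } else if (is.na(v)) {
      "NA"
    } else if (is.infinite(v)) {
      if (v > 0) "Infinity" else "-Infinity"
    } else {
      v
    }
  })
}

# Reverse the mapping, giving back a plain numeric vector
decode_special <- function(lst) {
  vapply(lst, function(v) {
    if (is.character(v)) {
      switch(v,
        "NaN" = NaN,
        "NA" = NA_real_,
        "Infinity" = Inf,
        "-Infinity" = -Inf,
        NA_real_)  # unknown marker: treat as missing
    } else {
      as.numeric(v)
    }
  }, numeric(1))
}

x <- c(1.5, NA, NaN, Inf, -Inf)
encoded <- encode_special(x)
decoded <- decode_special(encoded)
```

The encoded list is what you would store in the document in place of the raw vector.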