R and MongoDB
MongoDB is a document-based NoSQL database. Unlike a relational database, which stores data in tables with rigid schemas, MongoDB stores data in documents with dynamic schemas. In the demonstration below, I show how to extract data from MongoDB with R.
Before starting the R session, we need to install MongoDB on the local machine and then load the data into the database with the Python code below.
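import pandas
import pymongo

# Read the tab-delimited source file into a DataFrame.
df = pandas.read_table('../data/csdata.txt')

# Convert each row into a dict keyed by column name, one document per row.
lst = [dict(zip(df.columns, row)) for row in df.values]
for i in range(3):
    print(lst[i])

# MongoClient replaces the Connection class, which was removed in PyMongo 3.
con = pymongo.MongoClient('localhost', port=27017)
test = con.db.test  # collection "test" in database "db"
test.drop()
# insert_many replaces the per-document save() calls; save() was removed in PyMongo 4.
test.insert_many(lst)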
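To make the reshaping step concrete, here is a minimal sketch of how each table row becomes one document. It uses only the standard library, and the three-row sample is a hypothetical stand-in for csdata.txt: the column names AGE, LIQ, and IND5A come from the queries later in this post, but the values are made up, and `rows_to_documents` is an illustrative helper rather than part of the loader above.

```python
import csv
import io

# Hypothetical tab-delimited sample standing in for csdata.txt (values are made up).
SAMPLE = "AGE\tLIQ\tIND5A\n8\t0.25\t0\n12\t0.05\t1\n6\t0.40\t0\n"


def rows_to_documents(text):
    """Turn tab-delimited text into a list of dicts, one dict per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # The csv module yields strings, so convert each field to a number.
    return [{name: float(value) for name, value in row.items()} for row in reader]


docs = rows_to_documents(SAMPLE)
for doc in docs:
    print(doc)
```

Each dict in `docs` has the shape that the loader hands to the collection: column names as keys, one row's values as the values.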
To the best of my knowledge, two R packages provide an interface to MongoDB: RMongo and rmongodb. While the RMongo package is straightforward and user-friendly, it took me a while to figure out how to specify a query with the rmongodb package.
RMongo Example
library(RMongo)
mg1 <- mongoDbConnect('db')
print(dbShowCollections(mg1))
query <- dbGetQuery(mg1, 'test', "{'AGE': {'$lt': 10}, 'LIQ': {'$gte': 0.1}, 'IND5A': {'$ne': 1}}")
data1 <- query[c('AGE', 'LIQ', 'IND5A')]
summary(data1)
RMongo Output
Loading required package: rJava
Loading required package: methods
Loading required package: RUnit
[1] "system.indexes" "test"
      AGE             LIQ             IND5A
 Min.   :6.000   Min.   :0.1000   Min.   :0
 1st Qu.:7.000   1st Qu.:0.1831   1st Qu.:0
 Median :8.000   Median :0.2970   Median :0
 Mean   :7.963   Mean   :0.3745   Mean   :0
 3rd Qu.:9.000   3rd Qu.:0.4900   3rd Qu.:0
 Max.   :9.000   Max.   :1.0000   Max.   :0
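The query string in the RMongo example combines three conditions with an implicit AND: AGE below 10, LIQ at least 0.1, and IND5A not equal to 1. That matches the summary, where AGE tops out at 9, LIQ bottoms out at 0.1, and IND5A is always 0. The sketch below mimics those three operators in plain Python to show which documents pass. It is an illustration only, not how MongoDB evaluates queries: `matches` is a hypothetical helper, the sample documents are made up, and every document is assumed to contain all three fields.

```python
import operator

# The three comparison operators used in the query, mapped to Python equivalents.
OPS = {
    "$lt": operator.lt,
    "$gte": operator.ge,
    "$ne": operator.ne,
}


def matches(doc, query):
    """Return True if doc satisfies every condition in query (implicit AND).

    Simplification: assumes each queried field is present in doc.
    """
    return all(
        OPS[op](doc[field], bound)
        for field, conditions in query.items()
        for op, bound in conditions.items()
    )


query = {"AGE": {"$lt": 10}, "LIQ": {"$gte": 0.1}, "IND5A": {"$ne": 1}}

# Made-up documents, one passing and one failing each condition in turn.
docs = [
    {"AGE": 8, "LIQ": 0.25, "IND5A": 0},   # passes all three conditions
    {"AGE": 12, "LIQ": 0.30, "IND5A": 0},  # fails AGE < 10
    {"AGE": 7, "LIQ": 0.05, "IND5A": 0},   # fails LIQ >= 0.1
    {"AGE": 9, "LIQ": 0.50, "IND5A": 1},   # fails IND5A != 1
]

selected = [d for d in docs if matches(d, query)]
print(selected)
```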
rmongodb Example
library(rmongodb)
mg2 <- mongo.create()
print(mongo.get.databases(mg2))
print(mongo.get.database.collections(mg2, 'db'))

# Build the query document field by field: each start/append/finish
# triple adds one nested condition such as AGE: {$lt: 10}.
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, 'AGE')
mongo.bson.buffer.append(buf, '$lt', 10)
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.start.object(buf, 'LIQ')
mongo.bson.buffer.append(buf, '$gte', 0.1)
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.start.object(buf, 'IND5A')
mongo.bson.buffer.append(buf, '$ne', 1)
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)

# Walk the cursor and collect the three fields from each matching document.
cur <- mongo.find(mg2, 'db.test', query = query)
age <- liq <- ind5a <- NULL
while (mongo.cursor.next(cur)) {
  value <- mongo.cursor.value(cur)
  age <- rbind(age, mongo.bson.value(value, 'AGE'))
  liq <- rbind(liq, mongo.bson.value(value, 'LIQ'))
  ind5a <- rbind(ind5a, mongo.bson.value(value, 'IND5A'))
}
mongo.destroy(mg2)

data2 <- data.frame(AGE = age, LIQ = liq, IND5A = ind5a)
summary(data2)
rmongodb Output
rmongodb package (mongo-r-driver) loaded
Use 'help("mongo")' to get started.
[1] "db"
[1] "db.test"
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
NULL
      AGE             LIQ             IND5A
 Min.   :6.000   Min.   :0.1000   Min.   :0
 1st Qu.:7.000   1st Qu.:0.1831   1st Qu.:0
 Median :8.000   Median :0.2970   Median :0
 Mean   :7.963   Mean   :0.3745   Mean   :0
 3rd Qu.:9.000   3rd Qu.:0.4900   3rd Qu.:0
 Max.   :9.000   Max.   :1.0000   Max.   :0
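Both packages return the same summary because the rmongodb buffer calls assemble the same nested query document that RMongo takes as a single string. The sketch below makes that correspondence explicit in plain Python. `QueryBuffer` is a hypothetical stand-in that only imitates the start/append/finish pattern with a dict; it is not the rmongodb API and produces no BSON.

```python
import json


class QueryBuffer:
    """Hypothetical stand-in for a BSON buffer; builds a plain nested dict."""

    def __init__(self):
        self._root = {}
        self._stack = [self._root]  # innermost open object is last

    def start_object(self, name):
        # Open a nested object under `name` and make it the current target.
        child = {}
        self._stack[-1][name] = child
        self._stack.append(child)

    def append(self, name, value):
        # Add a key/value pair to the innermost open object.
        self._stack[-1][name] = value

    def finish_object(self):
        # Close the innermost open object.
        self._stack.pop()

    def to_dict(self):
        return self._root


buf = QueryBuffer()
for field, op, bound in [("AGE", "$lt", 10), ("LIQ", "$gte", 0.1), ("IND5A", "$ne", 1)]:
    buf.start_object(field)
    buf.append(op, bound)
    buf.finish_object()

# The RMongo query string, with single quotes swapped for double quotes
# so that the json module can parse it.
rmongo_query = json.loads(
    "{'AGE': {'$lt': 10}, 'LIQ': {'$gte': 0.1}, 'IND5A': {'$ne': 1}}".replace("'", '"')
)

same = buf.to_dict() == rmongo_query
print(same)
```

Reading the rmongodb code this way, each start/append/finish triple corresponds to one `'FIELD': {'$op': value}` entry in the RMongo string.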