Using Cassandra Through R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In the last couple of years, there has been a lot of buzz around the open source community. New tools are open sourced almost every day. With so many open source tools available, don’t expect drivers to exist for every platform. I am a big fan of open source, mainly because of the huge community behind it.
I came across Cassandra, a NoSQL database, a while ago and was very impressed. Since it is open source, I did not wait a moment to get my hands on it. Being primarily an R user, I was happy to see an R package for connecting to Cassandra. That’s where the problems began. For some reason, I could not connect to the database. After hours and hours of research on Stack Overflow, I eventually managed to connect. The next problem was that the data I queried came back in a very weird format. Guess what: I turned to Stack Overflow again. After a few hours, I gave up and didn’t touch it for a few weeks.
One day, it hit me: why not try it in Python, my second favorite language? It did the job I wanted. So the question became how to replicate this in R. The answer was simple: just run the Python code from an R script, and voilà! It solves my problem for now, and hopefully someone (or I) can eventually rewrite the R package for Cassandra.
# Suppress warnings
options(warn = -1)

# Load the reticulate library to run Python code from R
library(reticulate, quietly = TRUE)

# Query the Cassandra table using the Python driver
py <- py_run_string('
import pandas as pd
from cassandra.cluster import Cluster

cluster = Cluster(["192.168.1.1", "192.168.1.2", "192.168.1.3"])
session = cluster.connect("test")
query = "select * from sample_table;"
df = pd.DataFrame(list(session.execute(query)))
print(df)
cluster.shutdown()
')

# Move the pandas data frame into an R data frame
data <- py$df
The code above uses the reticulate package to run a Python script that connects to Cassandra, executes the query, and puts the results into a pandas data frame. The pandas data frame is then moved into an R data frame.
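To make the row-to-table step more concrete, here is a minimal Python sketch of what happens to the query results before they reach pandas. It assumes the driver's default behaviour of returning each row as a named tuple. The `Row` type, its column names, and the `rows_to_columns` helper are hypothetical stand-ins for illustration and are not part of the Cassandra driver or of the code above. It needs no cluster to run.

```python
from collections import namedtuple

# Hypothetical stand-in for the rows that session.execute(query) yields;
# the column names are made up for illustration.
Row = namedtuple("Row", ["id", "name", "score"])


def rows_to_columns(rows):
    """Turn an iterable of named-tuple rows into a dict of column lists."""
    rows = list(rows)  # materialise the result set, like list(...) in the script
    if not rows:
        return {}
    # One list per field, keyed by the field (column) name.
    return {
        field: [getattr(row, field) for row in rows]
        for field in rows[0]._fields
    }


sample_rows = [Row(1, "alice", 9.5), Row(2, "bob", 7.0)]
columns = rows_to_columns(sample_rows)
print(columns)
```

pandas does the equivalent work when it is handed a list of named tuples, taking the field names as column names, which is why the one-line `pd.DataFrame(list(session.execute(query)))` call is enough in the script.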
More tutorials on the reticulate package are available here.
Hope this helps.
References:
[1] https://blog.rstudio.com/tags/reticulate
[2] http://www.rforge.net/RCassandra
[3] https://cran.r-project.org/web/packages/reticulate/index.html