Turning Data Into Awesome With sqldf and pandasql
[This article was first published on Mathew Analytics » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Both R and Python possess libraries for using SQL statements to interact with data frames. While both languages have native facilities for manipulating data, the sqldf and pandasql provide a simple and elegant interface for conducting tasks using an intuitive framework that’s widely used by analysts.
R and sqldf
sqldf("SELECT COUNT(*) FROM df2 WHERE state = 'CA'") COUNT(*) 1 4 sqldf("SELECT df2.firstname, df2.lastname, df1.var1, df2.state FROM df1 INNER JOIN df2 ON df1.personid = df2.id WHERE df2.state = 'TX'") firstname lastname var1 state 1 David Spade -2.09 TX 2 Joe Montana 1.16 TX sqldf("SELECT df2.state, COUNT(df1.var1) FROM df1 INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0 GROUP BY df2.state") state COUNT(df1.var1) 1 AZ 1 2 CA 1 3 GA 1 4 IL 1 5 NC 1 6 NY 1 7 OK 1 8 SC 1 9 TX 1 10 VT 1
Python and pandasql
import pandasql as ps q1 = """SELECT COUNT(*) FROM df2 WHERE state = 'CA'""" print ps.sqldf(q1, locals()) COUNT(*) 0 4 q2 = """ SELECT df2.firstname, df2.lastname, df1.var1, df2.state FROM df1 INNER JOIN df2 ON df1.personid = df2.id WHERE df2.state = "TX"; """ print ps.sqldf(q2, locals()) firstname lastname var1 state 0 David Spade -2.09 TX 1 Joe Montana 1.16 TX q3 = """SELECT df2.state, COUNT(df1.var1) FROM df1 INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0 GROUP BY df2.state""" print ps.sqldf(q3, locals()) state COUNT(df1.var1) 0 AZ 1 1 CA 1 2 GA 1 3 IL 1 4 NC 1 5 NY 1 6 OK 1 7 SC 1 8 TX 1 9 VT 1
To leave a comment for the author, please follow the link and comment on their blog: Mathew Analytics » R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.