Turning Data Into Awesome With sqldf and pandasql
[This article was first published on Mathew Analytics » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Both R and Python possess libraries for using SQL statements to interact with data frames. While both languages have native facilities for manipulating data, the sqldf and pandasql provide a simple and elegant interface for conducting tasks using an intuitive framework that’s widely used by analysts.
R and sqldf
sqldf("SELECT COUNT(*) FROM df2 WHERE state = 'CA'")
COUNT(*)
1 4
sqldf("SELECT
df2.firstname,
df2.lastname,
df1.var1,
df2.state
FROM df1
INNER JOIN df2 ON df1.personid = df2.id
WHERE df2.state = 'TX'")
firstname lastname var1 state
1 David Spade -2.09 TX
2 Joe Montana 1.16 TX
sqldf("SELECT
df2.state,
COUNT(df1.var1)
FROM df1
INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0
GROUP BY df2.state")
state COUNT(df1.var1)
1 AZ 1
2 CA 1
3 GA 1
4 IL 1
5 NC 1
6 NY 1
7 OK 1
8 SC 1
9 TX 1
10 VT 1
Python and pandasql
import pandasql as ps
q1 = """SELECT COUNT(*) FROM df2 WHERE state = 'CA'"""
print ps.sqldf(q1, locals())
COUNT(*)
0 4
q2 = """
SELECT
df2.firstname,
df2.lastname,
df1.var1,
df2.state
FROM df1 INNER JOIN df2 ON df1.personid = df2.id
WHERE df2.state = "TX";
"""
print ps.sqldf(q2, locals())
firstname lastname var1 state
0 David Spade -2.09 TX
1 Joe Montana 1.16 TX
q3 = """SELECT
df2.state,
COUNT(df1.var1)
FROM df1
INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0
GROUP BY df2.state"""
print ps.sqldf(q3, locals())
state COUNT(df1.var1)
0 AZ 1
1 CA 1
2 GA 1
3 IL 1
4 NC 1
5 NY 1
6 OK 1
7 SC 1
8 TX 1
9 VT 1
To leave a comment for the author, please follow the link and comment on their blog: Mathew Analytics » R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

