Query Pandas DataFrame with SQL
[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Similar to SQLDF package providing a seamless interface between SQL statement and R data.frame, PANDASQL allows python users to use SQL querying Pandas DataFrames.
Below are some examples showing how to use PANDASQL to do SELECT / AGGREGATE / JOIN operations. More information is also available on the GitHub (https://github.com/yhat/pandasql).
In [1]: import sas7bdat as sas In [2]: import pandas as pd In [3]: import pandasql as pdsql In [4]: data = sas.SAS7BDAT("accepts.sas7bdat") In [5]: df = data.toDataFrame() In [6]: pysql = lambda q: pdsql.sqldf(q, globals()) In [7]: ### SELECT ### In [8]: str1 = "select bureau_score, ltv from df where bureau_score < 600 and ltv > 100 limit 3;" In [9]: df1 = pysql(str1) In [10]: df1 Out[10]: bureau_score ltv 0 590 103 1 575 120 2 538 113 In [11]: ### AGGREGATE ### In [12]: str2 = "select ltv, min(bureau_score) as min_score, max(bureau_score) as max_score from df group by ltv order by ltv DESC;" In [13]: df2 = pysql(str2); In [14]: df2.head(3) Out[14]: ltv min_score max_score 0 176 709 709 1 168 723 723 2 167 688 688 In [15]: ### JOIN ### In [16]: str3 = "select b.*, a.bureau_score from df a inner join df2 b on a.ltv = b.ltv order by ltv DESC;" In [17]: df3 = pysql(str3) In [18]: df3.head(3) Out[18]: ltv min_score max_score bureau_score 0 176 709 709 709 1 168 723 723 723 2 167 688 688 688
To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.