Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
On at least a couple of occasions lately, I realized that I may need Python in the near future. While I have amassed some limited experience with the language over the years, I never spent the time to understand Pandas, its de facto standard data-frame library.
Where does one start? For me it’s usually with the data. Simple stuff: loading, wrangling, etc. Rewriting my little R6 helper class for loading futures data looked like a perfect candidate.
There was some frustration, which is to be expected after years of experience with R. Some things were less intuitive, but, surprisingly, pretty much nothing was outright ugly.
Here is a little example of how to use the code, although one can’t do much without the data, which I can’t distribute:
```python
import pandas as pd

import instrumentdb as idb


def main():
    # Create the object for the database
    db = idb.CsiDb()

    # Load the data for three symbols
    # (named `bars` rather than `all`, which would shadow the built-in)
    bars = db.mload_bars(["HO2", "RB2", "CL2"])

    print(bars['HO2'].head())
    print(bars['RB2'].head())

    # Build a list of the closing prices for each series
    closes = []
    for ss in bars.keys():
        closes.append(bars[ss]['close'])

    # Create a single data frame using these series
    all_df = pd.concat(closes, join='inner', axis=1)
    all_df.columns = [xx.lower() for xx in bars.keys()]

    print(all_df.tail())

    # That's the only line that would work without the data.
    print(db.future_list())


if __name__ == "__main__":
    main()
```
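Since the example above cannot run without the database, here is a minimal sketch of just the `pd.concat` step on made-up data. The prices, the dates, and the variable names `ho` and `rb` are invented for illustration; only the `join='inner', axis=1` call mirrors the code above.

```python
import pandas as pd

# Made-up closing prices; the date ranges deliberately overlap only partially.
ho = pd.Series(
    [2.10, 2.12, 2.15],
    index=pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"]),
    name="close",
)
rb = pd.Series(
    [1.60, 1.62, 1.61],
    index=pd.to_datetime(["2017-01-04", "2017-01-05", "2017-01-06"]),
    name="close",
)

# axis=1 places each series in its own column; join='inner' keeps only
# the dates that are present in every series.
closes = pd.concat([ho, rb], join="inner", axis=1)

# Both columns are called 'close' at this point, so rename them.
closes.columns = ["ho2", "rb2"]

print(closes)
```

Only the two dates shared by both series survive the inner join, which is why the column rename afterwards is safe: every row has a value for every symbol.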
The structure of the database is available from Tradelib’s source code (I am using the SQLite version for this test). To bootstrap (create) the database, I use sqlite3.exe’s .read command, to which I pass data.sqlite.sql as a parameter. To be used via the CsiDb class, the database is configured using a TOML configuration file:
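For anyone who would rather stay inside Python, the same bootstrap can probably be done with the standard `sqlite3` module, whose `executescript` method runs a whole SQL script at once. The sketch below is an assumption on my part, not how Tradelib does it: the schema is a hypothetical stand-in for the contents of data.sqlite.sql, and it uses an in-memory database so nothing touches the disk.

```python
import sqlite3

# Hypothetical stand-in for data.sqlite.sql; the real schema lives in
# Tradelib's source code and will differ from this.
SCHEMA_SQL = """
CREATE TABLE csi_bars (
    symbol TEXT NOT NULL,
    ts     TEXT NOT NULL,
    close  REAL,
    PRIMARY KEY (symbol, ts)
);
"""

# ':memory:' keeps the demo self-contained; a file path would create
# the database on disk instead.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_SQL)

# List the tables that the script created.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(tables)

conn.close()
```

With the real script, one would read data.sqlite.sql from disk and pass its text to `executescript` in place of `SCHEMA_SQL`.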
```toml
flavor = "SQLite"
db = "sqlite:///C:/Users/qmoron/Documents/csidata.sqlite"
bars_table = "csi_bars"
```
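How CsiDb reads this file is not shown here, so the following is only a sketch of one way to parse such a configuration. It assumes Python 3.11 or later for the standard `tomllib` module (with the third-party `tomli` package as a fallback on older versions), and it parses the text from a string instead of a file to stay self-contained.

```python
try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# The same three settings as in the configuration file above.
CONFIG_TEXT = """
flavor = "SQLite"
db = "sqlite:///C:/Users/qmoron/Documents/csidata.sqlite"
bars_table = "csi_bars"
"""

# loads() parses TOML text into a plain dictionary.
config = tomllib.loads(CONFIG_TEXT)

print(config["flavor"])
print(config["db"])
print(config["bars_table"])
```

For a file on disk, `tomllib.load` expects a file object opened in binary mode (`"rb"`).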
Now a little rant: in the above code, I tried to create a module, instrumentdb, to hold the source code. This created some problems while developing the module. Apparently, once a module is loaded, it’s pretty hard to reload it properly within the same REPL session. Coming from R, where I am used to reloading files, or even packages, as my development goes, that seemed quite an obstacle. After struggling with the issue for a while, the best I was able to come up with is the above approach of using a full-blown “main” file to drive the execution and some tests. This is unlikely to scale (in the sense of rapid REPL prototyping), so I am open to suggestions.
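One candidate worth trying is the standard library's `importlib.reload`, which re-executes a module's source in the same session. The sketch below is a toy demonstration, not a tested workflow for instrumentdb: the module name `scratch_instrumentdb` and its one-function contents are made up, and the module is written to a temporary directory so the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip writing .pyc files so the demo never depends on bytecode cache
# timestamps when the source is rewritten right after the first import.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    # A made-up stand-in for the module under development.
    module_path = Path(tmp) / "scratch_instrumentdb.py"
    module_path.write_text("def future_list():\n    return ['HO2']\n")

    # Make the temporary directory importable and import the module.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    idb = importlib.import_module("scratch_instrumentdb")
    first = idb.future_list()

    # Simulate editing the source file while the session stays open.
    module_path.write_text(
        "def future_list():\n    return ['HO2', 'RB2', 'CL2']\n"
    )

    # reload() re-runs the module's code inside the existing module object.
    idb = importlib.reload(idb)
    second = idb.future_list()

    sys.path.remove(tmp)

print(first)
print(second)
```

There are caveats. Objects created before the reload keep referring to the old class definitions, and names pulled in with `from module import name` stay bound to the old objects until they are imported again. Accessing everything through the module alias, as in `idb.CsiDb()`, sidesteps the second problem. IPython users can also look at the autoreload extension (`%load_ext autoreload` followed by `%autoreload 2`), which tries to do this automatically.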
The post Loading Data with Pandas appeared first on Quintuitive.