Analyse Quandl data with R – even from the cloud
[This article was first published on rapporter, and kindly contributed to R-bloggers.]
I have recently read two thrilling pieces of news about the really promising time-series data provider called Quandl:
With the help of the Quandl R package* (the development version is hosted on GitHub), it is really easy to fetch a variety of time-series directly from R – so there is no need to deal with the standard file formats that the data provider currently offers (csv, XML, JSON), or to call the otherwise awesome API manually. The Quandl function can automatically “identify” (or, to be more precise: parse from the provided metadata) the frequency of the time-series, and other valuable information can also be fetched with some further hacks. I will try to show a few in this post.

The plethora of available data at Quandl and the endless possibilities for statistical analysis provided by R made us work on a robust time-series reporting module, a so-called template, that can be applied to hopefully any data sequence found on the site.
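As a quick illustration (not taken from the template itself), fetching a series boils down to a single call. The dataset code "FRED/GDP" below is only an example, and recent versions of the package may ask for an API key:

```r
# Minimal sketch of fetching a Quandl time-series from R; the dataset
# code "FRED/GDP" is only an illustrative example.
library(Quandl)

# Quandl.api_key("your-key")       # recent package versions may require this

gdp <- Quandl("FRED/GDP")          # returned as a data.frame by default
str(gdp)

# Ask for a ts object instead, letting the package derive the frequency
# from the dataset's metadata:
gdp_ts <- Quandl("FRED/GDP", type = "ts")
frequency(gdp_ts)
```

The `type` argument also accepts `"zoo"`, `"xts"` and `"timeSeries"` if you prefer those classes.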
Our main intention was also to support supersets by default. This feature is a great way of combining separate time-series with a few clicks, and we now try to provide a simple way to analyse those, e.g. by computing the bivariate cross-correlation between them at different time-lags, and also to let users click on each variable for detailed univariate statistics with a calendar heatmap, seasonal decomposition or automatically identified best-fit ARIMA models, among others.
This may not seem sensational to native R users, as the community has already developed awesome R packages for these tasks, to be found on CRAN, GitHub, R-Forge etc. But please bear in mind that what we present here is a template: a module that compiles these functions along with some dynamic annotations (also known as literate programming), to be run against any time-series data – on your local computer or in the cloud. Long story short:
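The cross-correlation idea is easy to demonstrate with base R alone. This is not the template's actual code, just a sketch on synthetic data where one series lags the other by three periods:

```r
# Bivariate cross-correlation at different time-lags with stats::ccf,
# on synthetic data (not the template's code).
set.seed(42)
x <- arima.sim(list(ar = 0.6), n = 200)
y <- c(rep(0, 3), head(x, -3)) + rnorm(200, sd = 0.5)  # y lags x by 3 steps

cc   <- ccf(x, y, lag.max = 10, plot = FALSE)
best <- cc$lag[which.max(abs(cc$acf))]
best  # expected to be near -3, i.e. x leads y by 3 periods
```

`ccf(x, y)` estimates the correlation between `x[t+k]` and `y[t]` for each lag `k`, so the peak's location tells you which series leads and by how much.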
What do we do in this template?
- Download data from Quandl with the given params [L20] and
- draw some commentary about the meta-data found in the JSON structure [L27].
- As we are not using Quandl’s R package to interact with their servers (so that we can also use the provided meta-data), first we have to transform the data to a data.frame [L34] and also identify the number of variables to be analysed [at the end of L64], to choose from:
- multivariate statistics:
  - overview of the data as a line plot [L74-78],
  - cross-correlation for each pair of variables with an additional line plot [L95-L110],
  - and a short text about the results [L112].
- univariate statistics:
  - descriptive statistics of the data in a table [L122] and also in text [L129 and L136],
  - a histogram [L133] with base::hist (the grid and all other style elements are automatically added by the pander package),
  - a line plot based on an automatically transformed ts object [L153-162], for which the frequency was identified from the original meta-data,
  - a calendar heatmap [L172-178], only for daily data,
  - autocorrelation [L199-L212],
  - seasonal decomposition, only for non-annual data with enough cases [L225-L239],
  - a dummy linear model on year and optionally month, day of month and day of week [L259-L274],
    - with detailed global validation of assumptions based on gvlma [L275-L329],
    - and also with checks for linearity [L335] and residuals [L368],
  - predicted values computed from the linear model [L384-L390],
  - and best-fit ARIMA models for datasets with only a few cases [L403].
- All with references.
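To give a feel for the univariate part, here is a condensed, illustrative sketch of a few of the steps above (not the template's actual code), run on a built-in monthly series:

```r
# Illustrative sketch of some univariate steps on a built-in monthly
# series; the template itself works on the fetched Quandl data instead.
x <- AirPassengers             # a ts object with frequency 12

acf(x)                         # autocorrelation
plot(decompose(x))             # seasonal decomposition (non-annual data only)

# a "dummy" linear model on the time components, validated with gvlma:
df  <- data.frame(y     = as.numeric(x),
                  year  = as.numeric(floor(time(x))),
                  month = factor(cycle(x)))
fit <- lm(y ~ year + month, data = df)
library(gvlma)
summary(gvlma(fit))            # global validation of model assumptions

# best-fit ARIMA model via the forecast package:
library(forecast)
auto.arima(x)
```

The gvlma and forecast packages are the ones named in the list above; everything else here is plain base R.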
We would love to hear your feedback or about an updated version of the file!
Run locally
The template can be run inside Rapporter by any user, or in any local R session after loading our rapport R package. Just download the template and run:

library(rapport)
rapport('quandl.tpl')
Or apply the template to some custom data (Tammer’s Oil, gold and stocks superset):
rapport('quandl.tpl', provider = 'USER_YY', dataset = 'ZH')
And even filter the results by date for only one variable of the above:
rapport('quandl.tpl', provider = 'USER_YY', dataset = 'ZH', from = '2012-01-01', to = '2012-06-30', variable = 'Oil')
And why not check the results on an HTML page instead of the R console?

rapport.html('quandl.tpl', provider = 'USER_YY', dataset = 'ZH', from = '2012-01-01', to = '2012-06-30', variable = 'Oil')
Run in the cloud
We introduced Rapplications a few weeks ago, so that potentially all of our (your) templates can be run by anyone with an Internet connection at our computational expense – even without registration or authentication. We have also uploaded this template to rapporter.net and made a Rapplication for it. Please find below a few links that bring up some real-time generated and/or partially cached reports based on the above example, with GET params:
- S&P 500 Stock Index price without any filters
- Tammer’s superset about oil, gold and S&P 500 Stock Index price (multivariate)
- Oil in Tammer’s superset between 2012-01-01 and 2012-06-30
https://rapporter.net/api/rapplicate/?token=78d7432cba100b39818d0d2821c550e46a2745bf8b6dc6793f40c8c1f8e7439a&provider=USER_YY&dataset=ZH&variable=Oil&from=2012-01-01&to=2012-06-30&output_format=html&new_tab=true
With the following parameters:

- token: the identifier of the Rapplication that stores the HTML/LaTeX/docx/odt stylesheet or reference document to apply to the report. Please use the above referenced token or create your own Rapplication.
- provider: the Quandl internal code (ID) of the data provider.
- dataset: the Quandl internal code (ID) of the dataset.
- variable (optional): the name of a variable from the dataset to analyse with univariate methods.
- from and to (optional): filter by date, in YYYY-MM-DD format.
- output_format (optional): the output format of the report: html, pdf, docx or odt. Defaults to html, so you may simply ignore this.
- new_tab (optional): set this to true so that the HTML file is not forced to be downloaded.
- ignore_cache (optional): set this to true if you want to force the report to be generated from scratch, even if we have it in the cache.
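For reference, the query string above can also be assembled programmatically rather than by hand. A sketch in R, using the token and parameters quoted in this post:

```r
# Assemble the Rapplication GET request from R and open the resulting
# report in the browser; the token is the one referenced in the post.
base_url <- "https://rapporter.net/api/rapplicate/"
params <- c(
  token         = "78d7432cba100b39818d0d2821c550e46a2745bf8b6dc6793f40c8c1f8e7439a",
  provider      = "USER_YY",
  dataset       = "ZH",
  variable      = "Oil",
  from          = "2012-01-01",
  to            = "2012-06-30",
  output_format = "html",
  new_tab       = "true"
)
url <- paste0(base_url, "?",
              paste(names(params), params, sep = "=", collapse = "&"))
browseURL(url)
```

For real-world parameter values you would also want to run them through `utils::URLencode()` before pasting them together.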
Run from a widget
Of course we are aware of the fact that most R users would rather type a few commands into the R console than build a unique URL based on the above instructions, but we can definitely help with that process too: rapporter.net automatically generates an HTML form for each Rapplication, along with some helper iframe code to let you easily integrate it into your home page or blog post.

And of course feel free to download the generated report as a pdf, docx or odt file for further editing (see the bottom of the left sidebar of the generated HTML page), and be sure to register for an account at rapporter.net to make and share similar statistical templates with friends and collaborators effortlessly.
* QuandlR would be a cool name IMHO