Using Quandl in R
Image by Jan Zander
Our mantra here at Quandl is making data easy to find and easy to use. In pursuit of that goal, we (and later the community) have created packages that integrate Quandl’s API into a number of software platforms. Today we’ll take a look at R.
R is a free statistical computing language, created in 1993 as an implementation of the S computing language. Its community has written many packages that keep its methods on the cutting edge of statistical analysis.
Wrapping Quandl’s API in R makes the tedious parts of getting data into your console trivial, so you can get to the real work faster. I’ll give a few examples of how Quandl data can be used in R and, seeing as Yahoo has just acquired Tumblr, it seems fitting to use Yahoo’s stock price as our base data.
To access this data we need to know its Quandl code. In this case it is “GOOG/NASDAQ_YHOO”. An explanation of our codes can be found in Quandl nomenclature.
Make a complicated plot
R is well known for its graphing capabilities. Quandl has recently released the ability to quickly embed a graph of our data on any webpage, but if you need a graph tailored to your precise analytic needs, R can handle that easily.
Using a graphing package such as ggplot2, you can take Quandl data and turn it into a custom graph with a few lines of code.
Decompose a time series
Because R has so many contributors, it has several time series formats, each of which works with a different set of packages. The Quandl package can return data in a number of them. In this example I request the data in R’s native time series format (ts) and pass it to stl(), which decomposes it into seasonal, trend, and remainder components.
library(Quandl)
# Monthly YHOO data as a native ts object; column 4 is the series we decompose
data <- Quandl("GOOG/NASDAQ_YHOO", collapse="monthly", type="ts")
plot(stl(data[,4], s.window="periodic"), main = "YHOO Decomposition")
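To see what the decomposition returns without calling the API, here is a minimal sketch in base R. It assumes a made-up monthly series (the variable names and numbers are mine, not Quandl data) and simply checks that the pieces stl() produces add back up to the original series.

```r
# Synthetic monthly series standing in for a Quandl download
# (assumption: invented numbers, used only to illustrate stl()).
set.seed(1)
n_months <- 120
trend    <- seq(10, 30, length.out = n_months)
seasonal <- 2 * sin(2 * pi * (1:n_months) / 12)
noise    <- rnorm(n_months, sd = 0.5)
prices   <- ts(trend + seasonal + noise, start = c(2003, 1), frequency = 12)

# stl() splits the series into seasonal, trend and remainder columns.
fit        <- stl(prices, s.window = "periodic")
components <- fit$time.series

# The three components sum back to the original series.
reconstructed <- rowSums(components)
max_error     <- max(abs(reconstructed - as.numeric(prices)))
```

With s.window = "periodic" the seasonal component repeats identically every year, which is usually what you want for a first look at the data.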
Calculate Trends
The zoo time series format handles irregularly spaced time series – like daily stock prices. Returning data in this format makes it easy to calculate things that depend on a window of dates, like a 200-day moving average or volatility.
library(zoo)  # provides rollapply()
data <- Quandl("GOOG/NASDAQ_YHOO", type="zoo")
rolling_average <- rollapply(data[,4], 200, mean)
rolling_volatility <- rollapply(data[,4], 200, sd)
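If it helps to see what a rolling calculation does under the hood, here is a rough base R sketch. The helper name `rolling` is hypothetical (it is not part of zoo), and I use a window of 3 on a toy vector instead of 200 days of prices so the result is easy to check by hand.

```r
# Hypothetical helper: apply `fun` to every run of `width` consecutive values.
rolling <- function(x, width, fun) {
  # embed() puts each window of `width` consecutive values in one row
  # (in reverse order, which does not matter for mean or sd).
  windows <- embed(x, width)
  apply(windows, 1, fun)
}

toy_prices <- c(1, 2, 3, 4, 5, 6)   # made-up values for illustration

toy_average    <- rolling(toy_prices, 3, mean)  # one value per complete window
toy_volatility <- rolling(toy_prices, 3, sd)
```

A series of length n yields n - width + 1 values, one for each complete window.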
Calculate Financial Indicators
Using a format like zoo also makes it easy to match two time series along their dates. This makes calculations involving multiple prices over different time periods much easier. We can use this to find the beta of gold prices (from the Bundesbank) against Brent crude oil prices (from the U.S. Department of Energy).
The calculation of beta takes a couple of steps.
- Load gold prices and Brent crude oil prices
- Convert the values to daily returns
- Match price returns along the dates
- Perform Regression
Quandl takes care of the first two steps for you. First I’ll load the datasets into R and apply the “rdiff” transformation (which converts each series to its period-over-period returns) in the same step.
gold <- Quandl("BUNDESBANK/BBK01_WT5511", type="zoo", transformation="rdiff")
oil <- Quandl("DOE/RBRTE", type="zoo", transformation="rdiff")
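For intuition, the “rdiff” transformation is the row-on-row relative change, (y[t] - y[t-1]) / y[t-1]. A minimal base R sketch of that arithmetic, using a hypothetical helper name and made-up prices, might look like this:

```r
# Hypothetical helper mirroring the row-on-row relative change.
rdiff_sketch <- function(x) {
  diff(x) / head(x, -1)   # (x[t] - x[t-1]) / x[t-1]
}

toy_prices  <- c(100, 110, 99, 108.9)   # invented prices
toy_returns <- rdiff_sketch(toy_prices)
```

Note that the result has one fewer observation than the input, because the first price has nothing to be compared against.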
Then I can use the zoo function “merge” to match up the two series along their dates.
beta_data <- merge(gold, oil)
It is worth noting that you can also combine those three lines of code into one using Quandl Supersets. Now all that’s left is to regress gold against oil.
lm(coredata(beta_data[,1]) ~ coredata(beta_data[,2]), na.action=na.omit)$coef[2]
# output: 0.0515
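The last two steps can also be sketched in base R without any downloads. This is only an illustration under stated assumptions: the return series are simulated (with a true slope of 0.5 that I chose), and a data frame merge on a date column stands in for zoo’s merge.

```r
# Simulated returns standing in for the gold and oil series (assumption).
set.seed(42)
all_dates <- as.Date("2013-01-01") + 0:11
oil_all   <- rnorm(12, sd = 0.02)
gold_all  <- 0.5 * oil_all + rnorm(12, sd = 0.005)

# Each series covers a slightly different set of dates.
gold <- data.frame(date = all_dates[1:10], gold = gold_all[1:10])
oil  <- data.frame(date = all_dates[3:12], oil  = oil_all[3:12])

# Outer join on date: rows where one series is missing get NA.
beta_data <- merge(gold, oil, by = "date", all = TRUE)

# Regress gold on oil; na.omit drops dates lacking either value.
fit  <- lm(gold ~ oil, data = beta_data, na.action = na.omit)
beta <- unname(coef(fit)["oil"])

# Cross-check: the slope equals cov(gold, oil) / var(oil) on the shared dates.
complete   <- na.omit(beta_data)
beta_check <- cov(complete$gold, complete$oil) / var(complete$oil)
```

The cross-check is a handy reminder that beta is just the covariance of the two return series scaled by the variance of the one you regress against.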
These are just some very basic things I did in R with one dataset. What can you do with over 6 million?