Quantitative Finance Applications in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
by Daniel Hanson, QA Data Scientist, Revolution Analytics
Extracting Financial Data from Internet Source Using R (first in a series)
Earlier this month, a colleague and I attended a presentation on Computational Finance in R, given by Guy Yollin of the University of Washington Applied Mathematics faculty, at a meeting of the Seattle useR Group. The first among several topics was extracting financial time series from internet sources using two R packages: quantmod, and Quandl. This will be the subject of today’s blog post.
For those of us who do not have access to a Bloomberg terminal but wish to have data to experiment with financial models, these two packages are a real boon. While quantmod also contains plotting and charting functionality, we will look at its capability of downloading stock and index prices from Yahoo Finance and Google Finance. The Quandl package is a bit different in that it is tied in with the Quandl website, a source of financial data itself, as well as a portal to economic and social science data. We will focus on accessing financial futures data on Quandl.
quantmod
Many readers are likely familiar with the finance sites on Yahoo and Google as sources for tracking stock, mutual fund, and exchange traded fund (ETF) prices and returns. Using quantmod, one can easily load this data into R by specifying the the same ticker symbol that is used in these two web sources.
As with any other R package, one must install the quantmod package in the usual way. As an aside, included with the installation is also the xts package for time series data, which we will discuss briefly later on.
After installing the package, using it is quite simple and straightforward. For example, to retrieve all available data for the PowerShares QQQ Nasdaq 100 Index ETF, one enters the following R command:
getSymbols(“QQQ”)
The data set is returned to the R session in the form of an xts object with the name QQQ. To check the contents, the head(.) and tail(.) R functions are useful:
head(QQQ)
gives us
QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
2007-01-03 43.46 44.06 42.52 43.24 167689500 41.27
2007-01-04 43.30 44.21 43.15 44.06 136853500 42.05
2007-01-05 43.95 43.95 43.48 43.85 138958800 41.85
2007-01-08 43.89 44.12 43.64 43.88 106401600 41.88
2007-01-09 44.01 44.29 43.63 44.10 121577500 42.09
2007-01-10 43.96 44.66 43.82 44.62 121070100 42.58
and tail(QQQ)gives us
QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
2013-12-12 85.21 85.42 84.90 84.96 33630800 84.96
2013-12-13 85.36 85.40 84.71 84.85 34072000 84.85
2013-12-16 85.27 85.81 85.21 85.32 24902500 85.32
2013-12-17 85.32 85.53 84.99 85.15 24411800 85.15
2013-12-18 85.25 86.22 84.05 86.14 68270800 86.14
2013-12-19 86.00 86.01 85.52 85.90 44964100 85.90
Now, a couple of remarks about what we have done, and other options that are available. First, note that we are given daily prices. In order to retrieve, say, monthly values, one may use an xts command on the return object; xts also provides functionality to extract a subset of the data based on a range of dates. This is a discussion worth having, given that xts is widely used in financial modeling in R, but we will defer it to a later blog post. Secondly, note that we did not specify the data source; this is because the default is Yahoo.
So, as another example, and employing a little bit of irony, let’s download Yahoo’s stock price history, using Google Finance as the data source:
# use single quotes and specify data source:
getSymbols(“YHOO”, src = “google”) # src = “yahoo” is the default
We can then extract the closing prices to an R vector:
# coerce from an xts object to a standard numerical R vector:
as.vector(YHOO[, “YHOO.Close”])
Finally, we can calculate the daily log returns in one fell swoop:
log.yahoo <- log(yhoo.close[-1]/yhoo.close[-length(yhoo.close])
# Check:
head(log.yahoo)
tail(log.yahoo)
These yield the following:
> head(log.yahoo)
[1] 0.047282921 0.032609594 0.006467863 -0.012252406 0.039806250 0.017271587
> tail(log.yahoo)
[1] 0.004840157 0.009610595 0.000000000 -0.005552765 0.013325150 0.003988041
Quandl
One of the limitations of data available from Yahoo and Google, as may be noticed above, is that it only dates back to January of 2007, while fund tracking and analysis will often require a time series of 20 years or more of historical returns. Furthermore, some of the index-tracking ETF’s didn’t come into existence until the early 2000’s, and new tracking funds for emerging markets and other indices have even shorter histories. To get return data going back farther in time, it is common to look to futures markets, and Quandl provides a rich set of historical futures prices.
As with the quantmod package, it is necessary to install the Quandl package in the same way. The Quandl R package help page gives step by step instructions, followed by a first example of loading futures price data into R, which we also show here:
# Oil futures price data from the National Stock Exchange of India:
mydata = Quandl(“NSE/OIL”)
The default for the return object from Quandl is a standard R dataframe; an xts option is also available, but for this demonstration we will stick with a dataframe. Checking the data, we get:
> head(mydata)
Date Open High Low Last Close Quantity Turnover(Lacs)
1 2013-12-12 462.00 470.65 458.50 461.00 460.05 622936 2903.94
2 2013-12-11 469.10 474.50 462.20 463.60 463.25 398041 1849.26
3 2013-12-10 472.00 479.85 472.00 474.30 473.90 476841 2279.75
5 2013-12-06 461.90 468.00 460.25 462.00 462.05 207715 960.85
6 2013-12-05 473.30 475.00 461.00 461.70 461.90 313943 1462.70
> tail(mydata)
Date Open High Low Last Close Quantity Turnover(Lacs)
2009-10-08 1145.0 1179.8 1142.0 1178.10 1170.20 788173 9138.99
2009-10-07 1153.8 1160.7 1140.0 1141.45 1141.60 698216 8032.98
2009-10-06 1149.8 1157.2 1132.1 1143.30 1144.90 627957 7185.90
2009-10-05 1152.0 1165.9 1136.6 1143.00 1140.55 919832 10581.13
2009-10-01 1102.0 1173.7 1102.0 1167.00 1166.35 3074254 35463.78
2009-09-30 1096.0 1156.7 1090.0 1135.00 1141.20 19748012 223877.07
This gives us a first example, but note that the data only goes back to September of 2009. As a more realistic example, let’s look at futures prices that have a much longer record, and let’s specify that we want monthly rather than daily (the default setting) data. A good example is the price of gold futures, but how would we find the lookup symbol? There is an R search function in the Quandl package, but at least at the time of this writing, it exhibits, shall we say, some need for improvement.
We have found a more reliable approach at this stage is to start with the Futures section of the Quandl website, and then select Metals from the list that appears at left. At the top of the Metals list that is now displayed, click on Gold (Symbol GC/Exchange COMEX). A new Gold Futures page will load, with two contracts (GC1 and GC2) shown under Latest Quotes. Choose Historical Data for GC2, and a page showing a plot of returns and price data will be displayed; however, we want to get the symbol to use in our R command, and to do this, scroll all the way down to the bottom of the page where a box entitled DATASET INFORMATION is displayed, as shown here:
We can now get the lookup symbol from the end of the Permalink line, namely
OFDP/FUTURE_GC2
and pop this into an R command to load the gold futures price data:
# set collapse=”monthly” to retrieve monthly prices
gold <- Quandl("OFDP/FUTURE_GC2", collapse="monthly")
To see how far back the price history goes, we use tail(.), because the data is loaded in reverse order:
> tail(gold)
Date Open High Low Settle Volume Open Interest
463 1975-05-31 169.1 169.5 168.2 169.0 NA 2539
464 1975-04-30 169.6 169.8 168.9 168.9 NA 813
465 1975-03-31 178.2 179.7 178.0 179.5 NA 2058
466 1975-02-28 185.8 185.9 184.0 185.9 NA 504
467 1975-01-31 178.0 178.4 177.0 178.0 NA 2016
468 1974-12-31 195.4 195.4 185.7 185.7 NA 366
Although this example may seem a bit cumbersome, running through it a few times with different types of futures contracts and trying it out, the process of obtaining the lookup symbol eventually becomes somewhat routine.
Finally, suppose we don’t want to go back to 1974, but instead we want a 20 year period of monthly prices from June 1992 through May 2012. In this case, we simply put in start_date and end_date parameters as follows:
gold20 <- Quandl("OFDP/FUTURE_GC2", collapse="monthly",
start_date=”1992-06- 01″, end_date=”2012-05-01″)
This will give us the end-of-month futures price data from June 1992 through May 2012.
To calculate the log returns of monthly settled prices, we again extract this data as a vector, but remembering that the prices are in reverse chronological order, we need to reverse the order of the vector and then do the same calculation as we did for the Yahoo stock returns in the previous section:
gold.settle <- gold20[, "Settle"]
gold.settle <- rev(gold.settle)
log.gold <- log(gold.settle[-1]/gold.settle[-length(gold.settle)])
# Check:
head(log.gold)
tail(log.gold)
These yield the following:
> head(log.gold)
[1] 0.063351108 0.003235085 0.024869116 0.015285152 -0.103421591 0.109093821
> tail(log.gold)
[1] 0.00388234 0.01772572 0.02286972 -0.01180379 0.04002237 -0.03741933
In future posts, we will expand into topics of interest in quantitative finance and risk management. In the meantime, for more examples of using Quandl data in R, take a look at Joseph Rickert’s mini-tutorial on Quandl.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.