Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is the first post of a series that describes how to download and parse specific data sets into R. These kinds of scripts can be functionalized further, but I doubt that these will ever find their way into a formal package. They are intended to be helpful to those facing similar tasks, but as demonstration scripts they will not be supported.
This is a script for downloading and parsing a monthly total return series of the FTSE NAREIT U.S. Real Estate Index. The spreadsheet can be downloaded from their website manually, but as you will see later, we don’t have to. Download the spreadsheet if you want to look at how it is structured.
This script can be found in the /inst/parser
directory of FinancialInstrument
on R-Forge.
After I parse the data, I want to store it and make it available for other R sessions to use. To do that, I’m going to use FinancialInstrument to register the location of the data so that I can retrieve it with getSymbols
, much like I might do to retrieve data from Yahoo.
Let’s take a look at the script…
First, we load the required packages:
require(xts) require(gdata) require(FinancialInstrument) require(quantmod)
Warning: this requires the quantmod version to be a development build after revision 560 or so.
Next, I’ll set the storage directory, where I’ll store the data and access it through
getSymbols
.filesroot = "~/Data/NAREIT"
Create and set the storage directory if it doesn’t exist.
if (!file.exists(filesroot)) dir.create(filesroot, mode="0777") if (!file.exists(paste(filesroot, "/NAREIT.IDX", sep=""))) dir.create(paste(filesroot, "/NAREIT.IDX", sep=""), mode="0777")
The
read.xls
function from the package gdata
will download the xls workbook directly from the web site.x = read.xls("http://returns.reit.com/returns/MonthlyHistoricalReturns.xls", pattern="Date", sheet="Data")
Let’s take a look at what we retrieved:
> head(x) Date Return Index Return.1 Index.1 Return.2 Yield X Return.3 Index.2 Return.4 1 Dec-71 NA 100.00 NA 100.00 NA NA NA NA 100.00 NA 2 Jan-72 1.22 101.22 0.33 100.33 0.89 6.51 NA 1.22 101.22 0.33 3 Feb-72 0.95 102.18 0.92 101.25 0.03 6.39 NA 0.95 102.18 0.92 4 Mar-72 0.25 102.44 -0.44 100.81 0.69 6.32 NA 0.25 102.44 -0.44 5 Apr-72 0.25 102.70 -0.39 100.41 0.65 6.52 NA 0.25 102.70 -0.39 6 May-72 -1.51 101.15 -1.78 98.63 0.27 7.10 NA -1.51 101.15 -1.78 Index.3 Return.5 Yield.1 X.1 Return.6 Index.4 Return.7 Index.5 Return.8 Yield.2 X.2 1 100.00 NA NA NA NA NA NA NA NA NA NA 2 100.33 0.89 6.51 NA NA NA NA NA NA NA NA 3 101.25 0.03 6.39 NA NA NA NA NA NA NA NA 4 100.81 0.69 6.32 NA NA NA NA NA NA NA NA 5 100.41 0.65 6.52 NA NA NA NA NA NA NA NA 6 98.63 0.27 7.10 NA NA NA NA NA NA NA NA Return.9 Index.6 Return.10 Index.7 Return.11 Yield.3 X.3 Return.12 Index.8 Return.13 1 NA 100.00 NA 100.00 NA NA NA NA 100.00 NA 2 0.00 100.00 -1.35 98.65 1.35 6.13 NA 0.00 100.00 -1.35 3 1.74 101.74 1.74 100.37 0.00 5.74 NA 1.74 101.74 1.74 4 -0.32 101.42 -0.74 99.63 0.42 5.92 NA -0.32 101.42 -0.74 5 3.84 105.31 3.24 102.85 0.60 5.72 NA 3.84 105.31 3.24 6 -9.37 95.44 -9.75 92.83 0.38 6.53 NA -9.37 95.44 -9.75 Index.9 Return.14 Yield.4 X.4 Return.15 Index.10 Return.16 Index.11 Return.17 Yield.5 1 100.00 NA NA NA NA 100.00 NA 100.00 NA NA 2 98.65 1.35 6.13 NA 1.24 101.24 0.87 100.87 0.38 6.10 3 100.37 0.00 5.74 NA 0.52 101.76 0.52 101.39 0.00 6.07 4 99.63 0.42 5.92 NA -1.59 100.15 -2.74 98.61 1.16 6.02 5 102.85 0.60 5.72 NA 0.37 100.52 -0.10 98.51 0.47 6.45 6 92.83 0.38 6.53 NA 1.36 101.88 1.22 99.71 0.14 7.49 X.5 1 NA 2 NA 3 NA 4 NA 5 NA 6 NA
I’ll focus on the first three columns: the date, the total return, and the total return index. The next columns divide it into price, income and dividend yield components, and after that are the individual sectors. If you are interested, it would be trivial to loop through those additional columns and create symbols for the rest of the columns in the spreadsheet. But for now, I just want to parse the total return index.
First, I’m going to parse the dates. In the spreadsheet, these are formatted as month and year (strptime
format is “%b-%y”), so they are not associated with a specific day. I want to transform these to reflect the last day of the month so that they are easier to align with other data I might use in a subsequent analysis.
To do that, I first use as.yearmon
to parse the dates from “%b-%y”. Then I can add the last day of the month by transforming that using as.Date
with frac=1
. It’s worth knowing that a value of zero would select the first day of the month – that comes in handy every once in a while.
x.dates = as.Date(as.yearmon(x[,1], format="%b-%y"), frac=1)
I then transform the returns from whole numbers to percentages.
x.returns = xts(x[,2]/100, order.by = x.dates)
I want to capture the price index as well, but I need to get rid of the commas in the string so that they are recognized as numbers.
x.price = as.numeric((sub(",","", x[,3], fixed=TRUE))) x.price = xts(x.price, order.by = x.dates)
Now I combine the data into a single object…
x.xts = cbind(x.price, x.returns) colnames(x.xts) = c("Close", "Returns")
… and save it into an rda file on the filesystem.
save(x.xts, file=paste(filesroot,"NAREIT.IDX/NAREIT.IDX.rda", sep="/"))
Next, I want to add some metadata to the data, so that I remember its attributes, such as the currency it is denominated in (actually, that’s the only required one).
Create the currency the index is demoninated in, if it hasn’t been done already (FinancialInstrument
does create some major currencies when loaded, but it doesn’t hurt to do it again).
currency("USD")
Now I can describe the metadata for the index, using
instrument
.instrument("NAREIT.IDX", currency="USD", multiplier=1, tick_size=.01, start_date="1971-12-31", description="FTSE NAREIT U.S. Real Estate Index", data="CR", source="reit.com", assign_i=TRUE)
Now, whenever you start a new R session you need to register the instruments. This might be a line you put into .Rprofile ( along with the packages and the filesroot definition) so that it happens automatically:
setSymbolLookup.FI(base_dir="~/Data/NAREIT", split_method='common')
Once you’ve done that you should be able to read that data from your local storage using
quantmod
‘s getSymbols
:getSymbols("NAREIT.IDX") chartSeries(Cl(NAREIT.IDX), theme="white") head(NAREIT.IDX)
That gives output that looks something like this:
Close Returns 1971-12-31 100.00 NA 1972-01-31 101.22 0.0122 1972-02-29 102.18 0.0095 1972-03-31 102.44 0.0025 1972-04-30 102.70 0.0025 1972-05-31 101.15 -0.0151
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.