FOMC Dates – Scraping Data From Web Pages

[This article was first published on Return and Risk, and kindly contributed to R-bloggers.]

Before we can do some quant analysis, we need to get some relevant data – and the web is a good place to start. Sometimes the data can be downloaded in a standard format such as .csv files, or is available via an API (e.g. http://www.quandl.com), but often you’ll need to scrape it directly from web pages.

In this post I’ll show how to obtain the US Federal Reserve FOMC announcement dates (i.e. the dates on which a statement is published after a meeting) from the Fed’s web page http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm. At the time of writing, this web page had dates from 2009 onward.

First, install and load the httr and XML R packages.

install.packages(c("httr", "XML"), repos = "http://cran.us.r-project.org")
library(httr)
library(XML)

Next, run the following R code.

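# get and parse web page content
response <- GET("http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm")
stop_for_status(response)  # fail early on an HTTP error
webpage <- content(response, as = "text", encoding = "UTF-8")
xhtmldoc <- htmlParse(webpage, asText = TRUE)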
# get statement urls and sort them
statements <- xpathSApply(xhtmldoc, "//td[@class='statement2']/a", xmlGetAttr,
    "href")
statements <- sort(statements)
# get dates from statement urls (characters 28 to 35 hold the yyyymmdd date)
fomcdates <- substr(statements, 28, 35)
fomcdates <- as.Date(fomcdates, format = "%Y%m%d")
# save results in working directory
save(list = c("statements", "fomcdates"), file = "fomcdates.RData")
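
The fixed character positions passed to substr() only work while every link shares the same prefix. As a hedged alternative, here is a minimal sketch that picks out the eight-digit date with a regular expression instead. The sample URLs are copied from the output shown further down, and the names sample_urls and extract_date are my own illustrations, not part of the original script.

```r
# Sample hrefs in the same format the scraper returns (illustrative input)
sample_urls <- c(
  "/newsevents/press/monetary/20090128a.htm",
  "/newsevents/press/monetary/20090318a.htm",
  "/newsevents/press/monetary/20090429a.htm"
)

# Hypothetical helper: take the first run of 8 digits in each url and
# parse it as a yyyymmdd date. Note that regmatches() silently drops any
# url that contains no such run, so the result can be shorter than the input.
extract_date <- function(urls) {
  digits <- regmatches(urls, regexpr("[0-9]{8}", urls))
  as.Date(digits, format = "%Y%m%d")
}

sample_dates <- extract_date(sample_urls)
print(sample_dates)
```

For these sample links the result matches what substr(x, 28, 35) gives; the regular expression simply keeps working if the path in front of the date changes length.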

Finally, check the results by looking at their structures and first few values.

# check data
str(statements)
head(statements)
str(fomcdates)
head(fomcdates)

You should see output similar to the following.

##  chr [1:49] "/newsevents/press/monetary/20090128a.htm" ...
## [1] "/newsevents/press/monetary/20090128a.htm"
## [2] "/newsevents/press/monetary/20090318a.htm"
## [3] "/newsevents/press/monetary/20090429a.htm"
## [4] "/newsevents/press/monetary/20090624a.htm"
## [5] "/newsevents/press/monetary/20090812a.htm"
## [6] "/newsevents/press/monetary/20090923a.htm"
##  Date[1:49], format: "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" ...
## [1] "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" "2009-08-12"
## [6] "2009-09-23"
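
Beyond eyeballing the output, a few quick assertions can confirm that the dates parsed cleanly. The sketch below reflects my own assumption about what is worth checking and is not part of the original script. It uses the first six dates shown above as stand-in data (check_dates is a hypothetical name), so it runs without a network connection.

```r
# Stand-in data: the first six announcement dates from the output above
check_dates <- as.Date(c("2009-01-28", "2009-03-18", "2009-04-29",
                         "2009-06-24", "2009-08-12", "2009-09-23"))

# Every date parsed (no NA values)
stopifnot(!anyNA(check_dates))

# Dates are strictly increasing, which also rules out duplicates
stopifnot(all(diff(check_dates) > 0))

# Number of days between consecutive announcements
gaps <- as.integer(diff(check_dates))
print(gaps)
```

To run the same checks on the scraped data, replace check_dates with fomcdates.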

So what can we do with this data? Here are a few ideas:

Click here for the R code on GitHub.
