FOMC Dates – Scraping Data From Web Pages
Before we can do some quant analysis, we need to get some relevant data – and the web is a good place to start. Sometimes the data can be downloaded in a standard format such as a .csv file, or accessed via an API (e.g. http://www.quandl.com), but often you'll need to scrape data directly from web pages.
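When a direct download link is available, no scraping is needed at all. A minimal sketch (the URL below is a hypothetical placeholder, not a real data source):

# read a CSV straight from a URL – replace the placeholder with a real link
url <- "http://example.com/some_data.csv"   # hypothetical URL for illustration
dat <- read.csv(url, stringsAsFactors = FALSE)
head(dat)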
In this post I'll show how to obtain the US Federal Reserve FOMC announcement dates (i.e. the dates on which a statement is published after the meeting) from the Fed's web page http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm. At the time of writing, this page had dates from 2009 onward.
First, install and load the httr and XML R packages.
install.packages(c("httr", "XML"), repos = "http://cran.us.r-project.org")
library(httr)
library(XML)
Next, run the following R code.
# get and parse web page content
webpage <- content(GET("http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm"), as = "text")
xhtmldoc <- htmlParse(webpage)

# get statement urls and sort them
statements <- xpathSApply(xhtmldoc, "//td[@class='statement2']/a", xmlGetAttr, "href")
statements <- sort(statements)

# get dates from statement urls
fomcdates <- sapply(statements, function(x) substr(x, 28, 35))
fomcdates <- as.Date(fomcdates, format = "%Y%m%d")

# save results in working directory
save(list = c("statements", "fomcdates"), file = "fomcdates.RData")
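Note that substr(x, 28, 35) relies on every statement URL sharing the same prefix length. If you'd rather not depend on fixed character positions, a regular-expression alternative is sketched below (it assumes each URL contains exactly one 8-digit date):

# extract the first 8-digit run from each statement URL
# instead of relying on fixed character positions
fomcdates <- regmatches(statements, regexpr("[0-9]{8}", statements))
fomcdates <- as.Date(fomcdates, format = "%Y%m%d")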
Finally, check the results by looking at their structures and first few values.
# check data
str(statements)
head(statements)
str(fomcdates)
head(fomcdates)
You should see output similar to the following.
## chr [1:49] "/newsevents/press/monetary/20090128a.htm" ...
## [1] "/newsevents/press/monetary/20090128a.htm"
## [2] "/newsevents/press/monetary/20090318a.htm"
## [3] "/newsevents/press/monetary/20090429a.htm"
## [4] "/newsevents/press/monetary/20090624a.htm"
## [5] "/newsevents/press/monetary/20090812a.htm"
## [6] "/newsevents/press/monetary/20090923a.htm"
## Date[1:49], format: "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" ...
## [1] "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" "2009-08-12"
## [6] "2009-09-23"
So what can we do with this data? Here are a few ideas:
- Go deeper: download the actual statements and use Natural Language Processing (NLP) to analyze them, e.g. scoring positive or negative sentiment. This is quite a complex task, but it is on my list of research topics for 2015…
- Collect price data, e.g. Treasury yields or the S&P 500, and do some visual / initial exploratory analysis around the FOMC announcement dates (see the sketch after this list)
- Conduct an event study like the academics do to identify whether or not there are any statistically significant patterns around these dates
- Incorporate the dates into a trading or investment program and backtest to see whether there are economically significant patterns, i.e. tradeable alpha opportunities
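For the second idea, here is a minimal sketch of the exploratory step, assuming the quantmod package is installed and the fomcdates.RData file created above is in the working directory (the Yahoo Finance symbol ^GSPC is used for the S&P 500):

# load the FOMC dates saved earlier
library(quantmod)
load("fomcdates.RData")

# download S&P 500 prices from Yahoo Finance (from 2009, to match the FOMC dates)
gspc <- getSymbols("^GSPC", src = "yahoo", from = "2009-01-01", auto.assign = FALSE)

# plot closing prices and mark each FOMC announcement date with a vertical line
plot(index(gspc), as.numeric(Cl(gspc)), type = "l",
     xlab = "Date", ylab = "S&P 500 Close",
     main = "S&P 500 with FOMC announcement dates")
abline(v = fomcdates, col = "grey", lty = 2)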
Click here for the R code on GitHub.