Stock Market Data Scenario Set Generation – S&P 100
I love building portfolio optimization models, and such models require a well-defined return scenario set: a matrix in which each row is one joint realisation of the returns of all assets under consideration. The easiest way to obtain one is to use historical data. While relying on historical data is dangerous in many price-based single-asset strategies, it makes sense for portfolio-based analysis, because it captures the empirical dependency between the assets, which works surprisingly well.
We focus on the S&P 100 here, but the code can be extended to any set of assets. We download the data from Yahoo! Finance; ideally this source should be replaced by another provider, e.g. Alpha Vantage.
Speaking of the code: it is also available as a Gist on GitHub, which you can find under this link.
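To make the notion of a scenario set concrete, here is a minimal sketch with made-up numbers (asset names and returns are purely illustrative, not taken from any data source). Each row is one scenario and each column is one asset, so multiplying by a weight vector gives the portfolio return in every scenario.

```r
# Toy scenario set: 4 scenarios (rows) x 3 assets (columns).
# All numbers are invented for illustration only.
scenario_toy <- matrix(
  c( 0.010, -0.020,  0.005,
    -0.015,  0.010,  0.000,
     0.020,  0.015, -0.010,
     0.000, -0.005,  0.010),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), c("A", "B", "C"))
)

# Equal-weight portfolio as the simplest possible example
weights <- rep(1 / ncol(scenario_toy), ncol(scenario_toy))

# Portfolio return in each scenario
portfolio_returns <- as.vector(scenario_toy %*% weights)
names(portfolio_returns) <- rownames(scenario_toy)

# Because rows are joint realisations, the empirical dependency
# between assets is preserved, e.g. in the sample covariance matrix
cov_toy <- cov(scenario_toy)
```

Any scenario-based optimization model would replace the fixed `weights` above with decision variables, while the matrix stays the same.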
We require two libraries to replicate all the code below, i.e.
library(quantmod)
library(rvest)
First we need to get the list of all components of the S&P 100. Because we are lazy we use (and web-scrape) Wikipedia for this purpose. Please note that we hard-coded the fact that the list of assets and ticker symbols is in the third table of this Wikipedia page. If the structure of the page changes this number has to be adapted. Last check: August 5th, 2019.
table_id <- 3
url <- "https://en.wikipedia.org/wiki/S%26P_100"
sp100 <- url %>%
  read_html() %>%
  html_nodes("table") %>%
  .[[table_id]] %>%
  html_table()
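Since the hard-coded table index breaks whenever the page layout changes, one possible alternative is to pick the table by its column names. The following is only a sketch: `find_table_with_columns` is a hypothetical helper (not part of rvest), and it assumes the constituents table still has columns called `Symbol` and `Name`. It is demonstrated on a toy list of data frames so that it runs offline.

```r
# Hypothetical helper (not part of rvest): return the first data frame
# in a list that contains all required column names.
find_table_with_columns <- function(tables, required = c("Symbol", "Name")) {
  has_all <- vapply(
    tables,
    function(tbl) all(required %in% names(tbl)),
    logical(1)
  )
  if (!any(has_all)) {
    stop("No table contains the columns: ", paste(required, collapse = ", "))
  }
  tables[[which(has_all)[1]]]
}

# Toy stand-in for the list of tables parsed from a page
toy_tables <- list(
  data.frame(Info = "S&P 100", Value = "index"),
  data.frame(Symbol = c("AAPL", "MSFT"), Name = c("Apple", "Microsoft"))
)
sp100_toy <- find_table_with_columns(toy_tables)
```

With rvest, the list of tables could be obtained by calling `html_table()` on the full result of `html_nodes("table")` instead of on a single element.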
Next we store the parsed info (i.e. ticker symbols and company names) into two vectors:
sp100_ticker <- sp100$Symbol
sp100_company <- sp100$Name
Now we are ready to download all price data from Yahoo! Finance and store all separate data frames into one large list for easier post-processing:
quantmod::getSymbols(sp100_ticker)
sp100_data_complete <- list()
for(current_ticker in sp100_ticker) {
  sp100_data_complete[[length(sp100_data_complete) + 1]] <- get(current_ticker)
}
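A download loop over roughly one hundred tickers can fail halfway through, so a more defensive variant may be useful. The sketch below rests on two assumptions that should be checked before use: that Yahoo! Finance writes share classes with a hyphen where Wikipedia uses a dot (e.g. `BRK.B` versus `BRK-B`), and that `getSymbols(..., auto.assign = FALSE)` is used to return the data directly instead of assigning it into the workspace. `to_yahoo_symbol` and `fetch_all` are hypothetical helpers, and the fetch function is passed in as an argument so the logic can be tried offline with a fake fetcher.

```r
# Hypothetical helper: Yahoo! Finance is ASSUMED to use "-" where
# Wikipedia uses "." for share classes (e.g. "BRK.B" -> "BRK-B").
to_yahoo_symbol <- function(ticker) gsub(".", "-", ticker, fixed = TRUE)

# Hypothetical helper: download every ticker via `fetch` and return a
# named list. Tickers whose download fails are skipped with a warning
# instead of aborting the whole loop.
fetch_all <- function(tickers,
                      fetch = function(s) quantmod::getSymbols(s, auto.assign = FALSE)) {
  result <- list()
  for (ticker in tickers) {
    data <- tryCatch(
      fetch(to_yahoo_symbol(ticker)),
      error = function(e) {
        warning("Skipping ", ticker, ": ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
    if (!is.null(data)) result[[ticker]] <- data
  }
  result
}

# Offline demonstration with a fake fetcher that fails for one symbol
fake_fetch <- function(s) {
  if (s == "BAD") stop("no data")
  data.frame(symbol = s, close = c(1, 2, 3))
}
demo <- suppressWarnings(fetch_all(c("AAA", "BRK.B", "BAD"), fetch = fake_fetch))
```

Note that the resulting list is named and may be shorter than the ticker vector, so the ticker and company vectors would have to be subset to `names(demo)` before any further processing.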
Next we check whether enough data is available for each stock and drop those stocks for which it is not. In this case we require at least 1500 days of data.
# Number of available rows (trading days) per stock
nrows <- sapply(sp100_data_complete, nrow)
nrow.cutoff <- 1500
# Logical mask, so that nothing is lost when no stock falls below the cutoff
keep <- nrows >= nrow.cutoff
# Show the tickers that are dropped
sp100_ticker[!keep]
sp100_data_complete <- sp100_data_complete[keep]
sp100_company <- sp100_company[keep]
sp100_ticker <- sp100_ticker[keep]
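One detail worth knowing when dropping elements by position in R: negative indexing is fragile when the index vector can be empty. The toy example below (names and numbers invented) shows what happens when no element falls below the cutoff, and how a logical mask avoids the problem.

```r
x <- c("AAA", "BBB", "CCC")
n_obs <- c(2000, 1800, 1600)
cutoff <- 1500

# No element falls below the cutoff, so `pos` is empty
pos <- which(n_obs < cutoff)

# Pitfall: negative indexing with an empty vector selects nothing,
# so every element is lost instead of none
dropped_everything <- x[-pos]

# A logical mask keeps exactly the elements that pass the filter
keep <- n_obs >= cutoff
kept <- x[keep]
```

Here `dropped_everything` is an empty character vector, while `kept` still contains all three names.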
Now we are ready to compute returns. To do so we start by extracting the Adjusted Closing Prices only from the OHLC/VA data we have.
current_pos <- 1
sp100_adjusted <- Ad(sp100_data_complete[[current_pos]])
for(current_pos in 2:length(sp100_ticker)) {
  sp100_adjusted <- merge(sp100_adjusted, Ad(sp100_data_complete[[current_pos]]))
}
names(sp100_adjusted) <- sp100_ticker
Then we may select a certain time frame and compute daily returns from this set of Adjusted Closing Prices:
timeframe <- "2013/2019"
sp100_returns <- dailyReturn(sp100_adjusted[,1])[timeframe]
for(current_pos in 2:length(sp100_ticker)) {
  sp100_returns <- merge(sp100_returns, dailyReturn(sp100_adjusted[,current_pos])[timeframe])
}
names(sp100_returns) <- sp100_ticker
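For readers who want to see what is being computed: `dailyReturn` produces arithmetic returns by default, i.e. P_t / P_{t-1} - 1. The following sketch reproduces that formula in base R on an invented price matrix. It does not try to replicate how quantmod treats the very first observation, nor the date handling of xts.

```r
# Toy price matrix: 4 days (rows) x 2 assets (columns), invented numbers
prices <- matrix(
  c(100, 50,
    110, 45,
     99, 54,
     99, 54),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("A", "B"))
)

n <- nrow(prices)

# Arithmetic return: P_t / P_{t-1} - 1, computed column-wise.
# The result has one row less than the price matrix.
returns <- prices[-1, , drop = FALSE] / prices[-n, , drop = FALSE] - 1
```

Each row of `returns` is then one scenario in the sense used throughout this post.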
Now we are done! The full code contains one more (almost unnecessary) cleanup step, but either way we can now store our data for subsequent use.
scenario.set <- sp100_returns
save(scenario.set, sp100_ticker, sp100_company, file="oex-returns-1908.rda")
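When the stored file is used later, it can be worth loading it into a separate environment and running a few sanity checks first. The sketch below illustrates the round trip with toy objects and a temporary file; the object names `toy_set` and `toy_ticker` are made up for this example and stand in for the real scenario set and ticker vector.

```r
# Toy stand-ins for the scenario set and the ticker vector
toy_set <- matrix(c(0.01, -0.02, 0.00, 0.03), nrow = 2,
                  dimnames = list(NULL, c("A", "B")))
toy_ticker <- c("A", "B")

# Write to a temporary file so nothing in the working directory is touched
path <- tempfile(fileext = ".rda")
save(toy_set, toy_ticker, file = path)

# Load into a separate environment so nothing in the workspace is overwritten
loaded <- new.env()
load(path, envir = loaded)

# Basic sanity checks before using the data in an optimization model
stopifnot(
  ncol(loaded$toy_set) == length(loaded$toy_ticker),
  !anyNA(loaded$toy_set)
)
```

The check for missing values is an assumption about what a downstream model needs; whether NAs are acceptable depends on the optimization method.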
That’s it! Now you have a perfect scenario set to (back)test any portfolio optimization methods based on scenario input. Enjoy!