Monte Carlo Part Two
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In a previous post, we reviewed how to set up and run a Monte Carlo (MC) simulation of future portfolio returns and growth of a dollar. Today, we will run that simulation many, many, times and then visualize the results.
Our ultimate goal is to build a Shiny app that allows an end user to build a custom portfolio, simulate returns, and visualize the results. If you just can’t wait, a link to the final Shiny app is available here.
This post builds off the work we did previously. I won’t go through the logic again, but the code for building a portfolio, calculating returns, mean and standard deviation of returns, and using them for a simulation is here:
library(tidyquant) library(tidyverse) library(timetk) library(broom) library(highcharter) symbols <- c("SPY","EFA", "IJS", "EEM","AGG") prices <- getSymbols(symbols, src = 'yahoo', from = "2012-12-31", to = "2017-12-31", auto.assign = TRUE, warnings = FALSE) %>% map(~Ad(get(.))) %>% reduce(merge) %>% `colnames<-`(symbols) w <- c(0.25, 0.25, 0.20, 0.20, 0.10) asset_returns_long <- prices %>% to.monthly(indexAt = "lastof", OHLC = FALSE) %>% tk_tbl(preserve_index = TRUE, rename_index = "date") %>% gather(asset, returns, -date) %>% group_by(asset) %>% mutate(returns = (log(returns) - log(lag(returns)))) %>% na.omit() portfolio_returns_tq_rebalanced_monthly <- asset_returns_long %>% tq_portfolio(assets_col = asset, returns_col = returns, weights = w, col_rename = "returns", rebalance_on = "months") mean_port_return <- mean(portfolio_returns_tq_rebalanced_monthly$returns) stddev_port_return <- sd(portfolio_returns_tq_rebalanced_monthly$returns) simulation_accum_1 <- function(init_value, N, mean, stdev) { tibble(c(init_value, 1 + rnorm(N, mean, stdev))) %>% `colnames<-`("returns") %>% mutate(growth = accumulate(returns, function(x, y) x * y)) %>% select(growth) }
That code allows us to run one simulation of the growth of a dollar over the next 10 years, with the simulation_accum_1()
that we built for that purpose. Today, we will review how to run 51 simulations, though we could choose any number (and our Shiny application allows an end user to do just that).
First, we need an empty matrix with 51 columns, an initial value of $1, and intuitive column names.
We will use the rep()
function to create 51 columns with a 1 as the value, and set_names()
to name each column with the appropriate simulation number.
sims <- 51 starts <- rep(1, sims) %>% set_names(paste("sim", 1:sims, sep = ""))
Take a peek at starts
to see what we just created, and how it can house our simulations.
head(starts) sim1 sim2 sim3 sim4 sim5 sim6 1 1 1 1 1 1 tail(starts) sim46 sim47 sim48 sim49 sim50 sim51 1 1 1 1 1 1
51 columns, with a value of 1 in one row. This is where we will store the results of the 51 simulations.
Now, we want to apply simulation_accum_1
to each of the 51 columns of the starts
matrix, and we will do that using the map_dfc()
function from the purrr
package.
map_dfc()
takes a vector – in this case, the columns of starts
– and applies a function to it. By appending dfc()
to the map_
function, we are asking the function to store each of its results as the column of a data frame (map_df()
does the same thing, but stores results in the rows of a data frame). After running the code below, we will have a data frame with 51 columns, one for each of our simulations.
We also need to choose how many months to simulate (the N argument to our simulation function) and supply the distribution parameters as we did before. We do not supply the init_value
argument because the init_value
is 1, that same 1 that is already in the 51 columns.
monte_carlo_sim_51 <- map_dfc(starts, simulation_accum_1, N = 120, mean = mean_port_return, stdev = stddev_port_return) tail(monte_carlo_sim_51 %>% select(growth1, growth2, growth49, growth50), 3) # A tibble: 3 x 4 growth1 growth2 growth49 growth50 <dbl> <dbl> <dbl> <dbl> 1 1.81 1.38 2.32 1.80 2 1.84 1.37 2.38 1.84 3 1.82 1.33 2.39 1.82
Have a look at the results. We now have 51 simulations of the growth of a dollar, and we simulated that growth over 120 months; however, the results are missing a piece that we need for visualization, namely a month
column.
Let’s add that month
column with mutate()
and give it the same number of rows as our data frame. These are months out into the future. We will use mutate(month = seq(1:nrow(.)))
, and then clean up the column names. nrow()
is equal to the number of rows in our object. If we were to change to 130 simulations, that would generate 130 rows, and nrow()
would be equal to 130.
monte_carlo_sim_51 <- monte_carlo_sim_51 %>% mutate(month = seq(1:nrow(.))) %>% select(month, everything()) %>% `colnames<-`(c("month", names(starts))) %>% mutate_all(funs(round(., 2))) tail(monte_carlo_sim_51 %>% select(month, sim1, sim2, sim49, sim50), 3) # A tibble: 3 x 5 month sim1 sim2 sim49 sim50 <dbl> <dbl> <dbl> <dbl> <dbl> 1 119 2.16 1.81 1.46 2.32 2 120 2.28 1.84 1.46 2.38 3 121 2.26 1.82 1.46 2.39
We have accomplished our goal of running 51 simulations, and could head to data visualization now, but let’s explore an alternative method using the the rerun()
function from purrr
. As its name implies, this function will “rerun” another function, and we stipulate how many times to do that by setting .n = number of times to rerun
. For example to run the simulation_accum_1
function 5 times, we would set the following:
monte_carlo_rerun_5 <- rerun(.n = 5, simulation_accum_1(1, 120, mean_port_return, stddev_port_return))
That returned a list of 5 data frames, or 5 simulations. We can look at the first few rows of each data frame by using map(..., head)
.
map(monte_carlo_rerun_5, head) [[1]] # A tibble: 6 x 1 growth <dbl> 1 1 2 0.983 3 0.965 4 0.946 5 0.967 6 0.962 [[2]] # A tibble: 6 x 1 growth <dbl> 1 1 2 0.980 3 0.975 4 0.964 5 0.969 6 0.914 [[3]] # A tibble: 6 x 1 growth <dbl> 1 1 2 1.03 3 0.997 4 0.979 5 1.04 6 1.04 [[4]] # A tibble: 6 x 1 growth <dbl> 1 1 2 0.974 3 0.962 4 0.943 5 0.942 6 0.963 [[5]] # A tibble: 6 x 1 growth <dbl> 1 1 2 0.990 3 1.02 4 1.06 5 1.12 6 1.13
Let’s consolidate that list of data frames to one tibble
. We start by collapsing to vectors with simplify_all()
, then add nicer names with the names()
function and finally coerce to tibble with as_tibble()
. Let’s run it 51 times to match our previous results.
reruns <- 51 monte_carlo_rerun_51 <- rerun(.n = reruns, simulation_accum_1(1, 120, mean_port_return, stddev_port_return)) %>% simplify_all() %>% `names<-`(paste("sim", 1:reruns, sep = " ")) %>% as_tibble() %>% mutate(month = seq(1:nrow(.))) %>% select(month, everything()) tail(monte_carlo_rerun_51 %>% select(`sim 1`, `sim 2`, `sim 49`, `sim 50`), 3) # A tibble: 3 x 4 `sim 1` `sim 2` `sim 49` `sim 50` <dbl> <dbl> <dbl> <dbl> 1 1.99 1.97 3.66 2.20 2 2.00 1.86 3.68 2.25 3 2.02 1.95 3.78 2.18
Now we have two objects holding the results of 51 simulations, monte_carlo_rerun_51
and monte_carlo_sim_51
.
Each has 51 columns of simulations and 1 column of months. Note that we have 121 rows because we started with an initial value of 1, and then simulated returns over 120 months.
Now let’s get to visualization with ggplot()
and visualize the results in monte_carlo_sim_51
. The same code flows for visualization would also apply to monte_carlo_rerun_51
, but we will run them for only monte_carlo_sim_51
here.
We start with a chart of all 51 simulations, and assign a different color to each one by setting ggplot(aes(x = month, y = growth, color = sim))
. ggplot()
will automatically generate a legend for all 51 time series, but that gets quite crowded. We will suppress the legend with theme(legend.position = "none")
.
monte_carlo_sim_51 %>% gather(sim, growth, -month) %>% group_by(sim) %>% ggplot(aes(x = month, y = growth, color = sim)) + geom_line() + theme(legend.position="none")
We can check the minimum, maximum and median simulation with the summarise()
function here.
sim_summary <- monte_carlo_sim_51 %>% gather(sim, growth, -month) %>% group_by(sim) %>% summarise(final = last(growth)) %>% summarise( max = max(final), min = min(final), median = median(final)) sim_summary # A tibble: 1 x 3 max min median <dbl> <dbl> <dbl> 1 4.81 1.31 2.3
We can clean up our original visualization by including only the max, min and median that were just calculated.
monte_carlo_sim_51 %>% gather(sim, growth, -month) %>% group_by(sim) %>% filter( last(growth) == sim_summary$max || last(growth) == sim_summary$median || last(growth) == sim_summary$min) %>% ggplot(aes(x = month, y = growth)) + geom_line(aes(color = sim))
Now let’s port our results over to highcharter
, but in a major departure from our usual code flow, we will pass a tidy tibble
instead of an xts
object.
Our first step is to convert the data from wide to long tidy format with the gather()
function.
mc_gathered <- monte_carlo_sim_51 %>% gather(sim, growth, -month) %>% group_by(sim) head(mc_gathered) # A tibble: 6 x 3 # Groups: sim [1] month sim growth <dbl> <chr> <dbl> 1 1 sim1 1 2 2 sim1 0.99 3 3 sim1 1.01 4 4 sim1 1.06 5 5 sim1 1.08 6 6 sim1 1.1
We can now pass this tibble
directly to the hchart()
function, specify the type of chart as line
, and then work with a similar grammar to ggplot()
. The difference is we use hcaes
, which stands for highcharter aesthetic
, instead of aes
.
# This takes a few seconds to run hchart(mc_gathered, type = 'line', hcaes(y = growth, x = month, group = sim)) %>% hc_title(text = "51 Simulations") %>% hc_xAxis(title = list(text = "months")) %>% hc_yAxis(title = list(text = "dollar growth"), labels = list(format = "${value}")) %>% hc_add_theme(hc_theme_flat()) %>% hc_exporting(enabled = TRUE) %>% hc_legend(enabled = FALSE)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.