Summary
Bitcoin is a hot topic at the moment, and understandably so. Few other assets have the potential to move 10% to 20% in a day. It is that volatility which has attracted so many investors (and that capital inflow has likely fueled subsequent gains).
But that level of volatility also defies our intuition about the return behavior of financial assets. In this analysis, I want to find the return distribution that best describes Bitcoin. A better understanding of the return distribution can help us update our intuitions about the asset’s movements and the role it could play in a portfolio.
In this post, we are going to:
- Gather price data on BTC
- Estimate the parameters of a return distribution for BTC, for Gaussian, logistic, and alpha-stable distributions
- Use a Bayesian method to estimate the best distribution to describe the returns
- Discuss implications for portfolio management with Bitcoin
We will need to load the following libraries:
library(quantmod)
library(libstableR)
library(tidyverse)
library(EnvStats)
Analysis – The Data
As usual, we need to start by getting some data. I am going to use the quantmod library in R to gather BTC price data from Yahoo Finance:
library(quantmod)  # Load the library
getSymbols('BTC-USD')  # Get daily price data from Yahoo Finance
btc <- `BTC-USD`  # Change the name of the data
btc <- btc[,4] %>% Delt(k=1) %>% na.omit()  # Pull the close (column 4), convert to returns, and remove NA data
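For anyone unfamiliar with Delt(): with k=1 and its default settings it computes the simple one-period return, p_t / p_{t-1} - 1. A minimal base-R sketch of the same calculation, using a made-up price vector that is purely illustrative:

```r
# Hypothetical closing prices, for illustration only (not BTC data)
prices <- c(100, 110, 99, 99)

# Simple one-period returns: p_t / p_{t-1} - 1
simple_returns <- prices[-1] / prices[-length(prices)] - 1
print(simple_returns)
```

The first observation has no prior price to compare against, which is why the pipeline above ends with na.omit().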
Next, we need to estimate the parameters of each distribution:
pars_init <- btc %>% coredata() %>% stable_fit_init()
pars_as <- btc %>% coredata() %>% stable_fit_koutrouvelis(pars_init = pars_init)
pars_gauss <- c(mean(coredata(btc)), sd(coredata(btc)))
pars_logistic <- btc %>% coredata() %>% as.numeric() %>% elogis()
Here we start with the parameter estimation for the alpha-stable distribution. Using the libstableR package, we have to build an initial estimate, then feed that into a final estimate (using the Koutrouvelis method, for which a whole literature exists; I’ll let you search Google Scholar for that if you feel like it).
Next, we estimate the familiar mean and standard deviation for the Gaussian distribution. Finally, using the EnvStats package, we estimate the logistic parameters.
Personally, when building density plots in ggplot, I find it much easier to work with random numbers rather than trying to plot the actual formula. So, this next block of code generates 100,000 random numbers based on the parameters of each distribution.
rnd_as <- stable_rnd(100000, pars = pars_as)
rnd_gauss <- rnorm(100000, mean = pars_gauss[1], sd = pars_gauss[2])
rnd_logistic <- rlogis(100000, location = pars_logistic$parameters[1], scale = pars_logistic$parameters[2])
From here, we can plot the distributions of each in comparison to bitcoin’s actual distribution. First up, Gaussian.
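The comparison is easy to reproduce in outline. Here is a minimal sketch of the random-number approach, under some loud assumptions: the vector `actual` is simulated from a scaled Student-t distribution as a stand-in for real returns (it is not Bitcoin data), and base graphics are used instead of ggplot only to keep the example free of dependencies.

```r
set.seed(42)

# Stand-in for the actual returns: heavy-tailed draws (Student t, 3 df), scaled
# to the rough size of daily returns. ASSUMPTION: this is NOT Bitcoin data.
actual <- 0.03 * rt(2000, df = 3)

# Gaussian random numbers generated from the fitted mean and standard deviation
rnd_gauss <- rnorm(100000, mean = mean(actual), sd = sd(actual))

# Kernel density estimates evaluated over the same range
rng <- range(actual)
d_actual <- density(actual, from = rng[1], to = rng[2])
d_gauss <- density(rnd_gauss, from = rng[1], to = rng[2])

# Overlay the two densities
plot(d_actual, main = "Actual vs. Gaussian", xlab = "Return")
lines(d_gauss, lty = 2)
legend("topright", legend = c("Actual", "Gaussian"), lty = c(1, 2))
```

With heavy-tailed data, the solid curve should come out taller and narrower around zero than the dashed Gaussian curve, which is the pattern discussed next.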
No big surprise here. Bitcoin’s daily price movements do not follow a Gaussian distribution. Both small movements and extreme tail events happen much more often than a Gaussian distribution would predict. Next, let’s look at the logistic distribution.
Logistic distributions have more persistence in the tails, and more clustering around the middle, than Gaussian distributions (so, smaller shoulders). However, the logistic distribution still does not capture the skinniness and tail-persistence of BTC. Let’s look at the alpha-stable distribution next.
Voilà! The alpha-stable distribution has captured the essence of BTC’s returns quite nicely. We get the tall-and-skinny distribution and the tail persistence (extreme movements) for which Bitcoin is known! In fact, if we zoom in on just the left tail as an example, we can see that the actual distribution pretty closely matches the decay pattern of the alpha-stable distribution (though there are some holes along the way, as there often are in empirical data).
All of this, however, is for daily returns. While traders may be interested in such things, longer-term investors are more likely concerned with monthly or yearly returns. Let’s update our analysis using monthly returns. The only difference in the code is the to.monthly() function in the first line.
btc <- `BTC-USD` %>% to.monthly()
btc <- btc[,4] %>% Delt(k=1) %>% na.omit()
pars_init <- btc %>% coredata() %>% stable_fit_init()
pars_as <- btc %>% coredata() %>% stable_fit_koutrouvelis(pars_init = pars_init)
pars_gauss <- c(mean(coredata(btc)), sd(coredata(btc)))
pars_logistic <- btc %>% coredata() %>% as.numeric() %>% elogis()
rnd_as <- stable_rnd(100000, pars = pars_as)
rnd_gauss <- rnorm(100000, mean = pars_gauss[1], sd = pars_gauss[2])
rnd_logistic <- rlogis(100000, location = pars_logistic$parameters[1], scale = pars_logistic$parameters[2])
What we find in the plot is that the actual monthly distribution of bitcoin is… weird. It really does not seem to match any distribution very cleanly. A look in the tails shows a similar result. More interestingly, the distribution in the tails decays more like the Gaussian distribution for monthly returns, a fact quickly confirmed by a look at the parameters of the alpha-stable distribution. A Gaussian distribution is an alpha-stable special case. When the alpha parameter is equal to 2, we have a Gaussian distribution (note how the first parameter in the vector below is 1.976, very close to the Gaussian 2). The tail simply does not persist as much as a logistic or alpha-stable distribution may expect. Or, rather, the tail has not persisted as much so far…
> pars_as
[1] 1.97635839 1.00000000 0.15907214 0.08137334
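To see why an alpha of 2 recovers the Gaussian, it helps to write out the characteristic function of a stable law. The form below is one common parametrization for alpha not equal to 1; other parametrizations shift the location term differently but collapse to the same expression at alpha = 2. The symbols (beta for skewness, gamma for scale, delta for location) are my notation, not necessarily libstableR's.

```latex
\varphi(t) = \exp\!\left( i\delta t - \gamma^{\alpha} |t|^{\alpha}
  \left[ 1 - i\beta \,\operatorname{sign}(t) \tan\frac{\pi\alpha}{2} \right] \right)

% At \alpha = 2 the tangent term vanishes, since \tan(\pi) = 0:
\varphi(t) = \exp\!\left( i\delta t - \gamma^{2} t^{2} \right)
% which is the characteristic function of N(\delta,\, 2\gamma^{2}).
```

Two things follow. First, the skewness parameter drops out entirely at alpha = 2, so the fitted value of 1.0 in the second slot matters very little when alpha is 1.976. Second, if we read the vector above as (alpha, skewness, scale, location), which is an assumption about the output order, the scale of 0.159 corresponds to a Gaussian standard deviation of roughly 0.159 × √2 ≈ 0.225 per month.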
Analysis – Best Fits
I’m a practitioner, which means that, though there are very technical methods for determining which distribution best fits the data, I am more interested in a back-of-the-envelope method that gets me most of the way there. For that, then, let’s rely on a Bayesian technique from Taleb’s Statistical Consequences of Fat Tails (p. 54). We replace “Gaussian” with “Distribution” in Taleb’s formula:
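Written out so that it matches the R function that follows (the notation is mine, with D for the candidate distribution and E for the observed event):

```latex
P(D \mid E) =
  \frac{P(D)\, P(E \mid D)}
       {\bigl(1 - P(D)\bigr)\, P(E \mid \neg D) + P(D)\, P(E \mid D)}
```

Here P(D) is the prior belief that the candidate distribution applies, P(E | D) is the probability of the event under that distribution, and P(E | ¬D) is the probability of the event if it does not apply.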
From here we can build a simple R function to capture this relationship.
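bayes.f <- function(prior, prob_event_given_dist, prob_event_given_non_distribution){
  (prior * prob_event_given_dist) /
    ( (1-prior) * prob_event_given_non_distribution + prior * prob_event_given_dist )
}
We need to pick an event and find its probability under each candidate distribution. Here the event is a loss of 20% or worse, and the empirical distribution of the actual returns supplies the probability of the event if the candidate distribution does not apply.
ecdf.f <- ecdf( as.numeric(coredata(btc)) )  # Empirical CDF of the actual returns
prior_gauss <- 0.5  # Illustrative prior only (an assumption; a range of priors is swept next)
# Then the completed function looks like this
bayes.f( prior_gauss, pnorm(-0.20, mean = pars_gauss[1], sd = pars_gauss[2]), ecdf.f(-0.20) )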
And, now we can apply the function to a range of possible priors to find which distribution sticks out as the best. First, we look at daily bitcoin returns.
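As a rough idea of what such a sweep might look like, here is a self-contained sketch for the Gaussian case. The returns are simulated heavy-tailed draws standing in for real data (an assumption, not Bitcoin returns), and the grid of priors is an arbitrary choice of mine.

```r
# Same function as in the post, repeated so the sketch runs on its own
bayes.f <- function(prior, prob_event_given_dist, prob_event_given_non_distribution) {
  (prior * prob_event_given_dist) /
    ((1 - prior) * prob_event_given_non_distribution + prior * prob_event_given_dist)
}

set.seed(1)
# Stand-in returns (Student t, 3 df, scaled). ASSUMPTION: NOT Bitcoin data.
returns <- 0.03 * rt(3000, df = 3)
ecdf.f <- ecdf(returns)

event <- -0.20                        # Same -20% threshold as above
priors <- seq(0.01, 0.99, by = 0.01)  # Range of prior beliefs to sweep

# Probability of the event under the fitted Gaussian and under the empirical CDF
p_gauss <- pnorm(event, mean = mean(returns), sd = sd(returns))
p_emp <- ecdf.f(event)

# Posterior probability that the Gaussian applies, one value per prior
posterior_gauss <- bayes.f(priors, p_gauss, p_emp)

plot(priors, posterior_gauss, type = "l",
     xlab = "Prior", ylab = "Posterior given the event")
```

Repeating the last few steps with the logistic and alpha-stable probabilities of the same event gives one curve per candidate, and the curves can then be compared on a single chart.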
Very clearly, the alpha-stable distribution stands out as most likely. Indeed, even when you believe that the logistic distribution applies with 99% certainty, you must accept the alpha-stable distribution as the more likely so long as you expect it to apply with 25% certainty before you witnessed the event. And, as we can see, the Gaussian distribution is a terrible fit, no matter how much you believe it applies ahead of time.
Things change, however, when we apply the same approach to monthly returns. For monthly returns, the logistic distribution appears to be the best fit for observing extreme events—even more fitting than the alpha-stable distribution. Unlike daily returns, Gaussian is not a bad model for monthly returns, all told.
What this Means
Interestingly, for both traders and investors this is good news. Bitcoin is extremely volatile day-to-day, and traders who use an alpha-stable distribution to price BTC derivatives, for example, are likely to have an edge over those who use a more off-the-shelf Gaussian model (Black-Scholes assumes Gaussian log returns). Or, for swing and momentum traders, understanding the dynamics and expectations for gains/losses is important for bet sizing.
However, for longer-term investors, BTC tends to average out to an approximately-Gaussian distribution at the monthly level. This means that long-term investors, through rebalancing, may be able to volatility-harvest BTC in their portfolios (that is, assuming rebalancing can occur with relative ease and low costs).
That said, I am suspicious. The evidence for Bitcoin as an alpha-stable distribution at the daily level is overwhelming. One thing we know about these distributions is that they are not well behaved. I would be reluctant to conclude that a Gaussian distribution is best to model monthly returns given the daily return data. It may well be that, so far, the true nature of the distribution has not yet been revealed. In fact, given the parameters for BTC’s daily price movements, we should expect:
A 1-Day Loss of… | Frequency… |
10% | 7 Days in Every Year |
20% | 3 Days in Every Year |
30% | 1 Day Every Year |
50% | 1 Day Every Other Year |
Given that the worst one-day return Bitcoin has suffered since 2014 is -37.2%, it seems that the full nature of the distribution has yet to be revealed. That -50% one-day return has yet to be realized, but it may well happen.
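Frequencies like these can be approximated by multiplying the probability of a loss at least that large by the number of trading days in a year. The sketch below shows the arithmetic only. The helper name `expected_loss_days`, the 365-day year, and the Gaussian parameters are all my assumptions for illustration; to approximate the table, one would pass in a CDF for the fitted alpha-stable distribution instead.

```r
# Expected number of days per year with a one-day loss at least as bad as `loss`,
# given any CDF for daily returns.
# ASSUMPTION: days_per_year = 365, since BTC trades every day.
expected_loss_days <- function(cdf, loss, days_per_year = 365) {
  days_per_year * cdf(-loss)
}

# Illustration with a Gaussian CDF and made-up parameters (NOT fitted values)
gauss_cdf <- function(x) pnorm(x, mean = 0, sd = 0.04)
print(expected_loss_days(gauss_cdf, loss = c(0.10, 0.20, 0.30, 0.50)))
```

Under a Gaussian, the expected counts for the larger losses fall to practically zero, which is exactly the gap between the two models that the table highlights.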
Or, this could mean that increased trading and the entry of institutional traders have pushed the return distribution of BTC closer to a Gaussian one. We can test that hypothesis by estimating alpha for each year. As it turns out, it isn’t true, as the resultant plot illustrates.
# Test alpha per year to see if it has grown closer to 2.0
# Note: btc was overwritten with monthly returns above, so rebuild the daily returns first
# (assumption: the yearly fits are meant to use daily data)
btc_daily <- `BTC-USD`[,4] %>% Delt(k=1) %>% na.omit()
year <- c('2014', '2015', '2016', '2017', '2018', '2019', '2020')
alpha <- numeric(length(year))
for(i in seq_along(year)){
  x <- coredata(btc_daily[ year[i] ])
  alpha[i] <- stable_fit_koutrouvelis(x, pars_init = stable_fit_init(x))[1]
}
In the end, this is the difficulty of portfolio management. First, we have to make an assumption about which distribution to use to model returns. Second, we have to make a forecast about the parameters of that distribution. Both are hard to do. It is even harder with a new asset class like bitcoin where the dynamics are not entirely worked out, and the market itself is still developing.
At any rate, getting the distribution at least close to correct is worth the time. Clearly, traders and investors operating with an assumption of Gaussian returns (which is most portfolio optimization and derivative pricing software!) are operating at a distinct disadvantage to those with the better model.
This also informs risk control. When you can expect to lose 50% in a day, once every other year, risk control becomes critically important to protecting hard-won profits.