Mapping the stock market using self-organizing maps

Data-based investor

4 years ago

[This article was first published on Data based investing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Self-organizing maps are an unsupervised learning approach for visualizing multi-dimensional data in a two-dimensional plane. They are great for clustering and finding out correlations in the data. In this post we apply self-organizing maps on historical US stock market data to find out interesting correlations and clusters. We’ll use data from Shiller, Goyal and BLS to calculate the historical valuations levels, interest rates, inflation rates, unemployment rates and future ten-year total real returns from years 1948 to 2008.

Click to enlarge images

You can see a clear correlation between the different valuation measures, and that low valuations have led to high returns. There’s a slight negative correlation between the valuation measures and unemployment, i.e. valuations have been higher when unemployment has been lower. Charlie Bilello has a great article on the subject. There’s also a positive correlation between unemployment and rates, which means that rates have typically been higher when unemployment has been higher.

Next, let’s look at clusters formed using hierarchical clustering. We’ll form four clusters on the same plane as used in the above analysis. Let’s look at the results:

The balls inside each hexagon correspond to each month. We are currently in the green cluster, which has typically lead to low returns. Why has low unemployment, low rates and low inflation led to low returns, aren’t these things good for the stock market? I see two possible causes: these conditions tend to revert back to their mean (which means worsening macroeconomical conditions), and investors tend to extrapolate past returns into the future (a great tweet on the subject by Michael Batnick). The second part causes high valuations, which is present in the green cluster.

Which cluster is the best place to be in? I’d say the gray one, but the data seems to support the blue one as well. The good thing is that there are other countries that are in both of these clusters. Even though I recommend looking at valuations alone rather than macroeconomic indicators, a good place worth checking for all that macro stuff is tradingeconomics.com.

The R code used in the analysis is available here.

To leave a comment for the author, please follow the link and comment on their blog: Data based investing.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.