The above graph borders on chartjunk (and is nothing like Paul Butler’s amazing Facebook map). We can see some variation in color, but mostly it is a set of lines between 152 country capitals with no means to determine which lines are important! However, the creation of the graph and the data behind it are interesting. Visualizing correlation between multiple series is often difficult without some additional structure or information. We can plot blocks of scatterplots in a data frame, but readability suffers for more than ten or so variables. I also think ccf() refuses to render for multiple time series of more than ten variables. If we have information strictly ordering the variables (and hopefully related to correlation) we can make a heatmap or a 3D plot. With countries we can order by GDP per capita or some other variable, but such an ordering becomes unwieldy. The alternative isn’t pretty either, as you see above.
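To make the ordered-heatmap idea concrete, here is a minimal sketch. It assumes simulated growth series and a made-up GDP per capita vector (the names growth, gdp_pc and the country labels are hypothetical, not the World Bank data used in this post), and it turns off clustering so the heatmap keeps the ordering we impose.

```r
# Sketch only: simulated data standing in for real country series.
set.seed(3)
n_years <- 19
n_countries <- 12
growth <- matrix(rnorm(n_years * n_countries), nrow = n_years,
                 dimnames = list(NULL, paste0("C", seq_len(n_countries))))

# Hypothetical ordering variable (e.g. GDP per capita in some base year).
gdp_pc <- setNames(runif(n_countries, 500, 40000), colnames(growth))

# Reorder rows and columns of the correlation matrix from poorest to richest.
ord <- order(gdp_pc)
cor_ordered <- cor(growth)[ord, ord]

# Rowv = NA and Colv = NA suppress the dendrogram reordering;
# scale = "none" keeps the raw correlations.
heatmap(cor_ordered, Rowv = NA, Colv = NA, scale = "none", symm = TRUE)
```

With a dozen series the picture is readable; with well over a hundred countries the cells become too small to label, which is the unwieldiness mentioned above.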
This chart started out as an exploration into correlation of GDP growth between countries. An abundance of evidence supports the claim that correlation between asset classes moves to one in a downturn, but this may not be the case for countries (at least for an unweighted sample). There is also some evidence suggesting that movements in GDP across countries have become more synchronized, which accords with our basic story of increasing financialization and decreasing transaction costs. Testing these hypotheses rigorously is beyond the scope of this blog, but I want to poke at a few interesting avenues for research.
- Part One: Can we reliably measure correlation in GDP growth? Without a rigorous panel framework or a functional form for the process generating per capita GDP, can we naively compute correlation (and rely on that naive computation)? How stationary are GDP growth series?
- Part Two: How informative are the classification schemes for countries? First in a back-of-the-envelope test and later in a Bayesian framework, we want to test the informational content of the region, income or cultural classifications available to us. In relation to economic outcomes, high-level classification schemes are often arbitrary; the easiest example is the need to split Northern Africa and Sub-Saharan Africa for economic analysis. But sorting countries by income may be just as uninformative when it comes to predicting when countries move together. Membership in the OECD represents (roughly) a statement about a level of GDP, not a rate.
- Part Three: What is the appropriate timescale for measuring correlation between countries? If we were to optimally group countries into maximum (or minimum) covariance bundles, how much does the era for backtesting matter? I leave this to the end because working with the full range of dates involves sensibly dealing with missing values in the GDP series and may take some time.
To measure GDP correlation we downloaded GDP per capita for 207 countries from the World Bank using the WDI package available on CRAN and maintained by Vincent Arel-Bundock at the University of Michigan. With it we can get country-year data from 1991-2009 (a range chosen simply to minimize missing values due to name changes, revolutions, etc.) on GDP and a number of other indicators. If you haven’t tried out the package please do so, it is a wonderful tool.
Once the data are imported into a data frame we can convert them to ts (or xts) and check for stationarity. Eyeballing the GDP series gives us a hint that they are non-stationary, but this is far from clear for GDP growth. Though ccf() may sometimes detrend series before showing a result, a non-stationary series can lead us to find correlation (either within a single series or among multiple series) where none may exist had we checked the detrended series. More worrisome is the shaky foundation a non-stationary series provides for further modeling.
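As a rough illustration of the conversion step, the sketch below builds a small long-format country-year data frame and reshapes it into a multivariate ts. The data are simulated, and the column names (year, iso, gdp_pc) and country codes are hypothetical; the layout of the real download may differ.

```r
# Sketch only: a simulated long-format panel, one row per country-year.
set.seed(1)
countries <- c("AAA", "BBB", "CCC")          # hypothetical country codes
years <- 1991:2009
long <- expand.grid(year = years, iso = countries, stringsAsFactors = FALSE)

# Simulated GDP per capita: each country follows its own random walk in logs.
shocks <- rnorm(nrow(long), mean = 0.02, sd = 0.03)
long$gdp_pc <- ave(shocks, long$iso, FUN = function(g) 1000 * exp(cumsum(g)))

# Reshape to a year-by-country matrix, then attach the annual time index.
wide <- tapply(long$gdp_pc, list(long$year, long$iso), function(v) v[1])
gdp_ts <- ts(wide, start = 1991, frequency = 1)

# Log differences approximate annual growth rates; one observation is lost.
growth <- diff(log(gdp_ts))
```

Here gdp_ts holds one column per country for 1991 to 2009, and growth starts in 1992.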
Using Cheung and Lai (1995) as a rough guide, we cannot reject non-stationarity for most of the GDP series; however, we can safely reject non-stationarity for the bulk (though not all) of the differenced GDP series. I suspect these numbers would improve if we converted differenced GDP to a growth rate (and not per capita change), but that will wait for part three.
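For readers who want to try a similar check, here is a hedged sketch. The post does not say which unit-root test was run, so this swaps in the Phillips-Perron test from base R (PP.test in the stats package) rather than reproducing the Cheung and Lai critical values, and it uses simulated random walks instead of the World Bank series. With fewer than 20 annual observations any such test has little power, so treat the p-values as a rough screen.

```r
# Sketch only: simulated log GDP series, non-stationary by construction.
set.seed(42)
n_years <- 19
n_countries <- 5
shocks <- matrix(rnorm(n_years * n_countries, mean = 0.02, sd = 0.03),
                 nrow = n_years)
walks <- apply(shocks, 2, cumsum)             # random walks with drift
colnames(walks) <- paste0("C", seq_len(n_countries))
log_gdp <- ts(walks, start = 1991)
growth <- diff(log_gdp)                       # first differences

# Phillips-Perron test: the null hypothesis is a unit root (non-stationarity),
# so a small p-value is evidence against non-stationarity.
pp_pvalue <- function(x) PP.test(as.numeric(x))$p.value
p_levels <- apply(log_gdp, 2, pp_pvalue)
p_growth <- apply(growth, 2, pp_pvalue)

# Share of series where non-stationarity is rejected at the 5% level.
reject_levels <- mean(p_levels < 0.05)
reject_growth <- mean(p_growth < 0.05)
```

Comparing reject_levels with reject_growth gives the same kind of tally described above: how many series reject non-stationarity in levels versus in differences.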
Now we can convert our time series (171 countries after removing those with missing years, over roughly 20 years) into observations of country-pair correlation: roughly 14,000 observations, because 171 countries form 171 × 170 / 2 = 14,535 distinct pairs.
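The pairwise step can be sketched as follows. The growth matrix here is simulated (a shared shock plus country-specific noise, both hypothetical), with one column per country; only the bookkeeping is meant to carry over.

```r
# Sketch only: simulated growth rates for 171 countries over 19 years.
set.seed(7)
n_years <- 19
n_countries <- 171
common <- rnorm(n_years, sd = 0.02)           # hypothetical shared shock
growth <- sapply(seq_len(n_countries),
                 function(i) common + rnorm(n_years, sd = 0.03))
colnames(growth) <- sprintf("C%03d", seq_len(n_countries))

# Full correlation matrix, one row and column per country.
cor_mat <- cor(growth)

# Each unordered pair appears once above the diagonal.
pair_cor <- cor_mat[upper.tri(cor_mat)]
n_pairs <- length(pair_cor)                   # equals choose(171, 2)

# Location of the distribution of pairwise correlations.
mean_cor <- mean(pair_cor)
```

Calling hist(pair_cor) on the result shows whether the distribution sits around zero or is shifted, which is the question taken up next.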
The distribution is not centered around zero. Unfortunately due to our ambiguous stationarity tests we cannot conclusively say whether the positive correlation is due to an underlying trend, massive movement together during the recent recession or a third and potentially more interesting explanation. At least it explains why most of the lines in the map are purple!
As we explore this issue over the next few weeks look for some more respectable econometrics and more informative graphs. Code to reproduce the above graphs or make your own is below.