What value is cross country GDP correlation? [Part One]

[This article was first published on Back Side Smack » R Stuff, and kindly contributed to R-bloggers.]

Cross-country GDP correlation from 1991-2009

The above graph borders on chartjunk (and is nothing like Paul Butler’s amazing Facebook map). We can see some variation in color, but mostly it is a set of lines between 152 country capitals with no means to determine which lines are important! However, the creation of the graph and the data behind it are interesting. Visualizing correlation between multiple series is often difficult without some additional structure or information. We can plot blocks of scatterplots from a data frame, but readability suffers for more than ten or so variables. I also think ccf() refuses to render for multiple time series of more than ten variables. If we have information strictly ordering the variables (and hopefully related to correlation) we can make a heatmap or a 3D plot. With countries we can order by GDP per capita or some other variable, but such an ordering becomes unwieldy. The alternative isn’t pretty, as you see above.

This chart started out as an exploration into correlation of GDP growth between countries. An abundance of evidence supports the claim that correlation between asset classes moves to one in a downturn, but this may not be the case for countries (at least for an unweighted sample). There is also some evidence suggesting that movements in GDP across countries have become more synchronized, which accords with our basic story of increasing financialization and decreasing transaction costs. Testing these hypotheses rigorously is beyond the scope of this blog, but I want to poke at a few interesting avenues for research.

To measure GDP correlation we downloaded GDP per capita for 207 countries from the World Bank using the WDI package, available on CRAN and maintained by Vincent Arel-Bundock at the University of Michigan. With it we can get country-year data from 1991-2009 (a range chosen simply to minimize missing values due to name changes, revolutions, etc.) on GDP and a number of other indicators. If you haven’t tried out the package, please do so; it is a wonderful tool.
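A minimal sketch of that download-and-reshape step might look like the following. The indicator code "NY.GDP.PCAP.KD" is an assumption (the post does not say which GDP per capita series was used), and the helper names to_country_ts() and fetch_gdp() are hypothetical. Note that country = "all" also returns World Bank aggregates such as regions, which would need filtering before any country-level analysis.

```r
# Reshape a long country-year data frame into a year-by-country ts object,
# dropping every country that has a missing value in any year.
# Assumes columns named `country` and `year`, with consecutive years.
to_country_ts <- function(long, value_col) {
  wide <- tapply(long[[value_col]], list(long$year, long$country), mean)
  complete <- colSums(is.na(wide)) == 0
  ts(wide[, complete, drop = FALSE], start = min(long$year))
}

# Download step: needs the WDI package and a network connection.
# The default indicator code is an assumption, not taken from the post.
fetch_gdp <- function(indicator = "NY.GDP.PCAP.KD") {
  long <- WDI::WDI(country = "all", indicator = indicator,
                   start = 1991, end = 2009)
  to_country_ts(long, indicator)
}

# gdp <- fetch_gdp()   # uncomment to run against the World Bank API
```

Keeping the reshape separate from the download makes it possible to check the missing-year filter on a small hand-built data frame before touching the real data.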

Once the data are imported into a data frame we can convert them to ts (or xts) and check for stationarity. Eyeballing GDP series gives us a hint that they are non-stationary, but this is far from clear for GDP growth. Though ccf() may sometimes detrend series before showing a result, a non-stationary series can result in us finding correlation (either within a single series or among multiple series) where none may exist had we checked the detrended series. More worrisome is the shaky foundation provided by a non-stationary series for further modeling.

Using Cheung and Lai (1995) as a rough guide, we cannot reject non-stationarity for most of the GDP series. However, we can safely reject non-stationarity for the bulk (though not all) of the differenced GDP series. I suspect these numbers would improve if we converted differenced GDP to growth rate (and not per capita change), but that will wait for part three.
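To make the idea concrete, here is a rough base-R sketch of the simplest Dickey-Fuller regression (a constant, no lagged differences) applied to simulated data. It is not necessarily the test specification used for the results above; the function name df_stat() is hypothetical, and the resulting t-statistic has to be compared against appropriate finite-sample unit-root critical values, not the usual t table.

```r
# Dickey-Fuller t-statistic with a constant and no lagged differences:
# regress diff(y)[t] on y[t-1] and return the t-statistic on y[t-1].
# Large negative values are evidence against a unit root.
df_stat <- function(y) {
  y <- as.numeric(y)
  dy <- diff(y)
  y_lag <- y[-length(y)]
  fit <- lm(dy ~ y_lag)
  coef(summary(fit))["y_lag", "t value"]
}

set.seed(1)
walk <- cumsum(rnorm(200))   # simulated random walk, non-stationary by construction

stat_level <- df_stat(walk)        # levels: typically close to zero
stat_diff  <- df_stat(diff(walk))  # first differences: strongly negative
c(level = stat_level, differenced = stat_diff)
```

With only about 19 annual observations per country, a test like this has little power, which is one reason the results for the real series are ambiguous.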

Now we can convert our time series (171 countries after removing those with missing years, over roughly 20 years) into observations of country-pair correlation: roughly 14,000 observations, because 171 countries form 171 × 170 / 2 = 14,535 distinct pairs. Some statistical violence is performed in expanding 3000 country-year observations to 14,000 observations, but I’m confident we haven’t violated the Prime Directive. With only GDP growth as a guide, we have no idea which of these observations are economically important, and in part one at least no attempt is made to determine statistical significance. Further, many of these correlations are simply accounting results. Even if there is some positive correlation between Albania and Vietnam’s growth, we have no reason to believe such a result arises out of an economic relationship between the two or even a coherent shared causal factor. But what do we find?
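The expansion from a year-by-country matrix to one row per country pair can be sketched as follows. The data here are simulated noise standing in for the real growth series, and pair_correlations() is a hypothetical helper name; only the shape (19 years, 171 countries) mirrors the post.

```r
# Turn a year-by-country matrix into one row per distinct country pair,
# using the upper triangle of the correlation matrix.
pair_correlations <- function(growth) {
  cm <- cor(growth)
  idx <- which(upper.tri(cm), arr.ind = TRUE)
  data.frame(a = colnames(cm)[idx[, "row"]],
             b = colnames(cm)[idx[, "col"]],
             r = cm[idx],
             stringsAsFactors = FALSE)
}

set.seed(42)
# Placeholder data: 19 years by 171 "countries" of independent noise.
growth <- matrix(rnorm(19 * 171), nrow = 19,
                 dimnames = list(NULL, paste0("C", 1:171)))

pair_df <- pair_correlations(growth)
nrow(pair_df)        # 14535, the same as choose(171, 2)
```

Each correlation is estimated from only 19 points, so individual values are noisy even before asking whether they mean anything economically.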

The distribution is not centered around zero. Unfortunately, due to our ambiguous stationarity tests, we cannot conclusively say whether the positive correlation is due to an underlying trend, massive movement together during the recent recession, or a third and potentially more interesting explanation. At least it explains why most of the lines in the map are purple!
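One way to see how a shared movement alone can shift the whole distribution is a small simulation. Everything below is made up for illustration (50 simulated countries, a single common shock per year); it says nothing about which explanation holds for the real data.

```r
set.seed(7)
n_years <- 19
n_countries <- 50

common <- rnorm(n_years)                              # one shared shock per year
idio <- matrix(rnorm(n_years * n_countries), n_years) # country-specific noise

independent <- idio
with_shock <- idio + common   # recycling adds common[t] to every country in year t

# Distinct pairwise correlations (upper triangle of the correlation matrix).
upper_cor <- function(m) {
  cm <- cor(m)
  cm[upper.tri(cm)]
}

mean_independent <- mean(upper_cor(independent))  # near zero
mean_with_shock  <- mean(upper_cor(with_shock))   # shifted positive

hist(upper_cor(with_shock), breaks = 40,
     main = "Pairwise correlations with a common shock (simulated)",
     xlab = "correlation")
```

A common trend left in the series would have a similar effect, which is why the histogram alone cannot separate the candidate explanations.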

As we explore this issue over the next few weeks look for some more respectable econometrics and more informative graphs. Code to reproduce the above graphs or make your own is below.
