Inter-country inequality and the World Development Indicators by @ellis2013nz
Peter's stats stuff - R
[This article was first published on Peter's stats stuff - R, and kindly contributed to R-bloggers.]
I recently read the high-quality book Global Inequality by Branko Milanovic. When reading this sort of thing, I often find I can increase my engagement with a topic by playing around with the data myself. In this day and age, this is much more likely to be possible than a couple of decades back! I remember, when I first studied development economics, typing into Lotus 1-2-3 the data from tables in the back of a World Development Report. These days, to get the same data we just fire up R and use the WDI package, which speaks directly to the World Bank’s World Development Indicators API.
Re-creating three charts
Chapter 4 of Global Inequality is entitled “Global Inequality in This Century and the Next”. It explores a range of issues to do with inter-country inequality and global inequality. The difference between the two is that “global” compares the income (or wealth, although there usually isn’t adequate data to do this) of all global citizens on an equal basis, whereas “inter-country” uses the average income in each country and helps explore the idea of “citizenship rent”, i.e. the bonus one gets from having been born in a wealthier place. I won’t try to summarise the discussion in this excellent chapter; I just recommend reading the original.
I set out to reproduce the first three figures in the chapter and had an interestingly partial success. That is, a partial success that was partial for interesting reasons.
Global income inequality among countries
First, here is my version of Milanovic’s Figure 4.1. His original figure looks close to an extended version of what I have in just the left panel. This is based on purchasing-power parity (PPP) GDP per capita – that is, adjusted not just for exchange rates but for differing price levels in different countries. However, Milanovic used data back to 1960, and for more countries than I have available. Like mine, his data were drawn from the World Development Indicators. But in 2014, the World Development Indicators were updated with new PPP references. Because of concerns about inaccuracy the data were limited to 1990 onwards. It seems also at some time in the past few years the WDI were restricted to data about currently existing countries (e.g. excluding data from the USSR or Yugoslavia).
For these reasons, the left pane of my chart below differs from the original in Global Inequality. And to get the picture over a longer period, in the right pane I had to use a constant US dollar GDP series instead of a PPP one. I think this not only overstates inequality (because a dollar buys more in poor countries than rich ones), it hides some of the patterns; but it’s better than nothing.
If this were important, I’d definitely want to do something about the slowly-changing-dimension problem; that is, the people who have been excluded from counting because their countries have not been in continuous existence over the whole period in question. What to do about this could be a blog post (or a book) in itself…
However, the basic story is still supported even though absolute levels differ from my version to the one based on previous data. Reflecting income growth in China and India, there is a steep decline in the population-weighted Gini coefficient from 1990 or 2000 in my chart; in Milanovic’s original there was a gradual decline from 1980, accelerating from 2000. My unweighted Gini coefficients show similar patterns to his, with inequality declining from around 2000. Average incomes in Latin America, Eastern Europe and Africa failed to catch up with those in wealthier countries in the 1980-2000 period, but have been converging since 2000.
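To make the distinction between the unweighted and population-weighted Gini coefficients concrete, here is a minimal base R sketch using the mean-absolute-difference definition of the Gini. The `gini` function and the four made-up countries are assumptions for illustration only; this is not necessarily how the figures in the charts were calculated.

```r
# Gini coefficient of a set of values, optionally weighted (e.g. by population).
# Mean-absolute-difference definition, with weights normalised to sum to 1:
#   G = sum_i sum_j w_i * w_j * |x_i - x_j| / (2 * weighted mean of x)
gini <- function(x, w = rep(1, length(x))) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep] / sum(w[keep])
  mu <- sum(w * x)
  sum(outer(w, w) * abs(outer(x, x, "-"))) / (2 * mu)
}

# Hypothetical GDP per capita and population (millions) for four made-up countries
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  gdp_pc  = c(1000, 2000, 20000, 40000),
  pop     = c(1000, 50, 20, 10)
)

# Each country counts once
unweighted <- gini(toy$gdp_pc)

# Each country counts in proportion to its population
weighted <- gini(toy$gdp_pc, toy$pop)
```

In the unweighted version a country of 10 million people counts as much as one of a billion; in the weighted version, income growth in a very populous poor country moves the coefficient far more, which is why China and India dominate the weighted series.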
Difference in the combined (population-weighted) growth rates
The second chart I tried to reproduce was Figure 4.2, which shows differing growth rates between the advanced economies (treated as a single bloc) and the “principal emerging economies (excluding China)”. Milanovic defined this latter group as India, Brazil, Indonesia, South Africa and Vietnam; I’ve chosen a larger group. When the bar is above 0, the emerging economies have grown faster than the advanced economies.
The data for Vietnam available to me only started in the 1980s, and this alone leads to quite a difference in conclusion and interpretation between my chart above and Milanovic’s original. Vietnam was ravaged by war for decades until 1975. It’s a populous country and if its data were included in the chart above it would definitely drag down the “emerging” economies’ pre-1980 combined growth rate, and up their rate since 1980. Without Vietnam in the dataset, I see considerably more growth in the historical “emerging” economies than is present in the figure in the original book. Playing around with different combinations of the “principal” emerging economies didn’t make much difference. I chose the countries I did on the basis of population size and data availability.
As in the original chart, I still have a generally stronger relative performance by the (non-China) emerging economies from 2000 onwards than previously; but the trend over time isn’t as dramatic when Vietnam is excluded, more emerging economies are added in, and the updated data source is used.
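One way to compute a growth-rate gap of this kind is to treat each bloc as a single economy (total GDP divided by total population) and compare year-on-year growth. The sketch below does this in base R on made-up numbers; the `panel` data frame and the choice to pool GDP in this way, as opposed to averaging country growth rates, are assumptions for illustration.

```r
# Hypothetical toy panel: two countries per bloc, three years
panel <- data.frame(
  bloc    = rep(c("advanced", "advanced", "emerging", "emerging"), times = 3),
  country = rep(c("A", "B", "C", "D"), times = 3),
  year    = rep(2000:2002, each = 4),
  gdp_pc  = c(30000, 40000, 1000, 3000,
              30600, 40400, 1050, 3060,
              31200, 41200, 1120, 3100),
  pop     = c(50, 100, 1000, 200,
              50, 101, 1015, 203,
              51, 102, 1030, 206)
)

# Total GDP and total population by year (rows) and bloc (columns)
panel$gdp <- panel$gdp_pc * panel$pop
gdp_tot <- tapply(panel$gdp, list(panel$year, panel$bloc), sum)
pop_tot <- tapply(panel$pop, list(panel$year, panel$bloc), sum)

# Combined GDP per capita of each bloc, treated as one economy
gdp_pc_bloc <- gdp_tot / pop_tot

# Year-on-year growth: drop the first row for "this year", the last for "last year"
n <- nrow(gdp_pc_bloc)
growth <- gdp_pc_bloc[-1, , drop = FALSE] / gdp_pc_bloc[-n, , drop = FALSE] - 1

# Positive values mean the emerging bloc grew faster than the advanced bloc
growth_gap <- growth[, "emerging"] - growth[, "advanced"]
```

Because the bloc totals are dominated by the most populous members, adding or dropping a single large country (such as Vietnam) can shift the whole series.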
Level of GDP per capita in 1970 and average subsequent growth
Finally, I had good success in creating a substantively similar chart to Milanovic’s Figure 4.3, shown below:
The descriptive/historical interpretation of my chart is identical to Milanovic’s. For Asian and Western countries, the countries with lower incomes in 1970 have successfully played catch-up, with higher average annual economic growth for poorer countries. For Eastern European, Latin American and African countries, this hasn’t been the case, with no relationship between income in 1970 and subsequent growth.
I have a little more information in my charts than was in the original. Point sizes now show population, which is important for appreciating the significance of India and China in the right hand panel in particular. I also include both weighted and unweighted regression lines; Milanovic discusses the importance of weighting by population in this sort of analysis but I think in the chart he has presented only the unweighted regression line (the substance of the conclusion stays the same).
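The difference between the two regression lines comes down to the `weights` argument of `lm`. Here is a minimal sketch on made-up data; the eight hypothetical countries, their populations and the built-in convergence slope are all assumptions chosen purely for illustration.

```r
# Hypothetical data: eight made-up countries, with convergence built in
toy <- data.frame(
  gdp_pc_1970 = c(300, 500, 1000, 2000, 5000, 10000, 20000, 30000),
  pop         = c(800, 50, 600, 20, 100, 5, 200, 60)   # millions
)

# Average annual growth falls with log initial income, plus fixed deviations
deviation  <- c(0.004, -0.003, 0.002, -0.002, 0.001, 0.003, -0.001, -0.004)
toy$growth <- 0.08 - 0.007 * log(toy$gdp_pc_1970) + deviation

# Unweighted: every country counts equally
unweighted_fit <- lm(growth ~ log(gdp_pc_1970), data = toy)

# Weighted: each country counts in proportion to its population
weighted_fit <- lm(growth ~ log(gdp_pc_1970), data = toy, weights = pop)

slopes <- c(unweighted = unname(coef(unweighted_fit)[2]),
            weighted   = unname(coef(weighted_fit)[2]))
```

A negative slope indicates catch-up: lower income in 1970 goes with faster subsequent growth. The weighted fit is driven mostly by the handful of very populous countries, so it answers a question about people rather than about countries.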
If you’re wondering who the tiny “Asian” country is that had negative average GDP per capita growth over this 46 year period (!) it is Kiribati (pronounced K EE R ih b ae s – unlike as spoken when it was mentioned once on the Gilmore Girls). Following Milanovic, I included Pacific countries in the “Asian” grouping. Of countries with data available, only Liberia, Democratic Republic of the Congo, Central African Republic, Madagascar and Niger had worse shrinkage in economic production per capita.
Delving
Here’s how I did this. First, here’s the code to load up R functionality for the whole session, set a colour palette I’m going to use a couple of times, explore the names of the various data collections available in the WDI interface, and download the datasets I need for GDP per capita (constant US dollars), GDP per capita (constant PPP US dollars) and population:
I wanted to get a feel for the data in its most basic presentation, so I drew myself a time series line chart to see if China’s economy grew as fast in the data as I expected it to:
Next I had to sort a problem of which “countries” to include. The WDI data includes a number of regional aggregations, and also countries that are missing quite a bit of data.
I had thought that we might find the number of countries with GDP data (not the PPP data, which I knew started in 1990) might increase to 1990 and then be flat, but we see in the chart below that it’s more complicated than that:
Here’s a chart of the countries that have at least some data, but not for every year from 1960 to 2016. As well as things I expected (Russian Federation and Germany don’t enter the dataset until relatively late, due to changing national boundaries), there are a few surprises; for example, New Zealand’s GDP per capita in this particular dataset doesn’t start until 1977:
Here’s the code for sorting through the country and regional issues:
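A minimal sketch of this kind of filtering, not the original code and using made-up data, might look like the following. It assumes a data frame with one row per country and year and a `region` column in which regional aggregates are labelled "Aggregates"; the column names and values are assumptions for illustration.

```r
# Hypothetical toy extract: one row per country-year, NA where data are missing
toy <- data.frame(
  country = rep(c("China", "New Zealand", "World", "Germany"), each = 4),
  region  = rep(c("East Asia & Pacific", "East Asia & Pacific",
                  "Aggregates", "Europe & Central Asia"), each = 4),
  year    = rep(1975:1978, times = 4),
  gdp_pc  = c(300, 310, 330, 360,
              NA, NA, 20000, 20100,
              5000, 5100, 5200, 5300,
              NA, 15000, 15300, 15700)
)

# 1. Drop regional aggregates such as "World"
countries_only <- toy[toy$region != "Aggregates", ]

# 2. Count the years with data for each remaining country
years_with_data <- tapply(!is.na(countries_only$gdp_pc),
                          countries_only$country, sum)

# 3. Keep only countries observed in every year of the window
n_years  <- length(unique(countries_only$year))
complete <- names(years_with_data)[years_with_data == n_years]
balanced <- countries_only[countries_only$country %in% complete, ]

# Countries with some, but not all, years of data
partial <- names(years_with_data)[years_with_data > 0 &
                                    years_with_data < n_years]
```

Restricting to a balanced panel keeps the Gini comparable from year to year, at the cost of excluding the people who live in countries with patchy series.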
Here’s the code that draws the three main charts. The tidyverse makes it pretty easy to do this calculation of summary statistics like Gini or weighted Gini for arbitrary groupings and turn them into nice charts.
Bonus – forecasting inter-country inequality
An excursus in Milanovic’s book refers to some work by others forecasting trends in global inequality if current trends continue. I’m only working with inter-country data today, nothing that captures individual-level global inequality, but I was sufficiently interested in the idea to knock together my own very crude forecasts. I basically projected current country-level trends in GDP per capita and population growth with straightforward time series methods (a hybrid of Hyndman’s auto.arima and ets methods, as conveniently pulled together in the forecastHybrid package by David Shaub with minor help from myself). The result shows the expected continued decline in weighted inequality as China and India continue to “catch up”:
Here’s the code for that forecasting exercise:
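As a rough illustration of the workflow (project each country's GDP per capita and population, then recompute the weighted Gini for each future year), here is a base R sketch. It is not the original code: it swaps the hybrid auto.arima/ets models for a plain log-linear trend, and the two made-up countries and the helper names `weighted_gini` and `project_loglinear` are assumptions.

```r
# Weighted Gini via the mean-absolute-difference definition
weighted_gini <- function(x, w) {
  w <- w / sum(w)
  sum(outer(w, w) * abs(outer(x, x, "-"))) / (2 * sum(w * x))
}

# Extrapolate a positive series with a log-linear trend (constant growth rate).
# This is a simple stand-in for the hybrid time series models, not a substitute.
project_loglinear <- function(years, values, future_years) {
  fit <- lm(log(values) ~ years)
  exp(predict(fit, newdata = data.frame(years = future_years)))
}

# Hypothetical history: a large, poor, fast-growing country and a
# small, rich, slow-growing one
years    <- 2000:2016
poor_gdp <- 1000  * 1.07  ^ (years - 2000)
rich_gdp <- 40000 * 1.015 ^ (years - 2000)
poor_pop <- 1200  * 1.010 ^ (years - 2000)
rich_pop <- 300   * 1.005 ^ (years - 2000)

future <- 2017:2030
gdp_fc <- cbind(poor = project_loglinear(years, poor_gdp, future),
                rich = project_loglinear(years, rich_gdp, future))
pop_fc <- cbind(poor = project_loglinear(years, poor_pop, future),
                rich = project_loglinear(years, rich_pop, future))

# Population-weighted Gini for each forecast year
gini_fc <- sapply(seq_along(future),
                  function(i) weighted_gini(gdp_fc[i, ], pop_fc[i, ]))
names(gini_fc) <- future
```

With the poor country growing faster than the rich one, the projected weighted Gini falls year after year, which is the same mechanism behind the decline described above.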