[This article was first published on free range statistics - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I set out to see if it was possible to reproduce the UN’s 2022 Revision of World Population Prospects for a given country by cohort component projection from the fertility, mortality and immigration rates and population starting point published as part of their projection. The motivation is to make small changes to some of those parameters – for example by substituting in a recent census result for the population and a given year – and “re-run” the projections to see the impact of changes, or to get a more up-to-date version with data that wasn’t available to the UN at the time of their projection.
It turns out this wasn’t too hard (one morning’s work for the modelling, then a few hours of write-up), particularly in cases where migration is small in the projection period. I was able to reproduce almost exactly population, birth and death totals to 2100 for Vanuatu, and demonstrate the impact of updating the 2020 year for their recent census population totals and fertility rates, and getting a slightly lower projection as a result.
Here’s my reproduction of Vanuatu’s population projection from 2020 to 2100, using just the 2020 population totals and the forecast fertility, mortality and migration rates. As you can see it’s basically identical to the UN totals:
And here’s the same method tweaked for the actual 2020 census total and with a rough adjustment made to fertility rates based on what was observed at the 2020 census:
Of course, this method delivers a full set of projections by age and sex, and we could construct life tables or any indicators we want from it. Here are population pyramids comparing the published UN projections for 2050 with my revised set. Not visually stunning in its comparison, but enough to prove that it’s possible:
Reproducing UN projections
So here’s how I went about that.
First, downloading all the data. The UN recommend bulk downloads of their CSV files. For my purposes I first need the fertility, mortality and population by sex and one year age groups. Of the population original data I am only going to use the 2020 year, and then project it forward myself based on fertility, mortality and migration; but I want the full set for comparison purposes. For migration, I couldn’t see in my hasty look at the UN site a dataset of migration projections by age and sex, so I just use the much simpler net migration rate (per thousand people) per year in the projection period. Here’s code to download all this UN data:
Next, I wrote my own cohort component population projection function. I wanted to do this from scratch rather than using an existing demography package to make sure I understood what was happening (I’m not a demographer) and could make tweaks as necessary to match the UN approach. I used this tutorial by Farid Flici as my starting point; abstracted his code into a function for easy use with multiple countries, and added a net migration component.
Because migrants’ ages tend to be dissimilar from the country they are migrating too - they are more likely to be in the prime of their working / family life I believe - I needed a way to set the ages of migrants. In the function below I defaulted to a normal distribution of ages, mean 28 and standard deviation 11, which worked well to get similar results to the UN in a few countries I tried. This was the hardest and most discretionary part of the exercise.
My observation is that demographers seem to think in terms of matrices of numbers rather than database-oriented tidy data, and I have kept that matrix approach in this function.
Next, I wrote a function to extract the necessary fertility, mortality, migration rates and 2020 starting population from the UN data, feed it into my pop_proj() function and return the result. Unlike pop_proj(), which is to some degree fully portable, this function is very much specific to this particular project and is really just a convenience function for grabbing the data and turning it into the right units and shapes (vectors and matrices) needed for pop_proj().
Note that this function returns, in addition to the results of the population projection, the various inputs in their correct units and shape. This will be useful later when we want to modify some of those inputs.
Now that we’ve got these functions, using them to do projections from 2020 and compare those projections to the published numbers is pretty straight forward. Here’s the code to do that for Vanuatu, which produces the first chart at the top of this blog post:
As we can see the results are pretty much identical. Let’s look at my projected births and deaths based on fertility and mortality rates, and compare them to the published projected numbers
I’m pretty happy with those. There’s definitely some discrepancies and a lag in the births which I suspect come down to how one treats populations of mothers - numbers on 1 January v 1 July, that sort of thing. But in the scheme of things these are very small.
Now, Vanuatu is relatively easy because the UN assumed net zero migration in the projection period. We can see this by comparing the net migration rates in their Indicators dataset for a few countries, with this code:
For countries that have good data on it, migration is a big deal in the projections; but forecasting is hard, particularly of the future.
My first few goes at reproducing the projections for Australia and India tended to be badly out because I had added in net migration evenly across the whole age distribution. My eventual solution, making net migration bell curved with an average age of 28, is a bit of a hack with the parameters chosen to make Australia’s projections come out right. Definitely a better method would be to have actual age-specific net migration forecasts. Whether such things are possible will very much depend on the country; it’s probably possible for Australia, but not for most of the countries I work with.
Here’s the final result comparing UN projections with mine for a few interesting countries:
Adjusting the starting point
Now, the whole point of this exercise was to see if we can plausibly adjust the starting point - say the population totals in 2020, or the forecast fertility rates - and say we are building on the UN’s projections to get our own. Here’s my rough demo of how we might do that, again using the case of Vanuatu. Vanuatu’s 2020 census wasn’t available at the time of the UN’s 2022 population projections, so the actual population and fertility numbers for 2020 differ somewhat (of course) from what was projected.
For the below, I am relying on the analytical report of the Vanuatu census. On a quick glance I didn’t see a table of the actual total population by single year age and sex, so I just adjusted the UN’s projected totals by a factor to make them add up to the correct 2020 total males and females. Of course if we were doing this for real we’d get the real numbers straight from the census; I actually can do this easily enough at work but obviously for this blog wanted to use only easily accessible public data.
And that’s what gets us these results:
I’m hoping this might be actually useful for pragmatic updates of the UN population projections when more current data is available, without having to revise everything from scratch.
OK, that’s all for today.
Related
To leave a comment for the author, please follow the link and comment on their blog: free range statistics - R.