Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
During the Covid-19 global health crisis, the organisation where I work, the Pacific Community (SPC), compiled and published weekly updates on Covid-19 incidence, mortality and vaccination rates. These updates stopped only recently, after the WHO determined in May that the global health emergency had ended, more than three years after it was declared.
We published weekly narrative updates as well as a “data flow” on the Pacific Data Hub “dot Stat” implementation at https://stats.pacificdata.org/.
For some reason I can't now recall, a few months back (it's taken me that long to write it up) I became interested in how quickly Covid vaccination was rolled out. Here's how vaccination rates look over time:
Some countries had a slow start, but most ended up with quite high rates of people receiving at least two Covid vaccination shots. The global proportion of people who are fully vaccinated is 67%, according to the New York Times, using data from Our World in Data. Only two Pacific Island countries and territories seem to be below that.
Here’s the code to download that from the Pacific Data Hub and draw the chart:
The striking thing here is that the slowest roll-out was in Papua New Guinea and Solomon Islands, the two poorest of the larger Pacific Island countries. This made me wonder: is there a general relationship between economic production (say GDP per capita) and vaccination rate in the Pacific? Are the poorer countries the least vaccinated?
That led to this chart:
Papua New Guinea and Solomon Islands do indeed stand out in the bottom left of the chart, but not all countries with low GDP per capita have low vaccination rates. Notably, Kiribati, the poorest of all the Pacific Island countries and territories in GDP per capita terms, has a high vaccination rate. (GDP per capita might be a bit misleading for Kiribati, as it doesn't include some fishing license revenue, but Kiribati is certainly poor by Pacific standards.)
In fact, having done this, I would say that vaccination rates in the Pacific are low in countries that are poor AND whose populations are dispersed not only over a wide area (few populations are spread over as many square kilometres as Kiribati's) but also across very many inland locations. But I'm not sure that really gets us any further than just saying “Papua New Guinea and Solomon Islands have the lowest vaccination rates”.
How do I conclude that the relationship isn't “statistically significant”? Well, in the first instance, we can see in the plot that a horizontal line could fit entirely within the shaded confidence band around the fitted curve; imagine one drawn at the 25% level. The model behind that curve is a generalized linear model with a quasibinomial family. The curvature comes not from a smoothing term but simply from the logit link function, which is the default for the quasibinomial family and keeps the fitted response between 0 and 1.
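To make the point about the curve concrete, here is a minimal sketch of that kind of model in base R. It assumes a data frame with hypothetical columns `rate` (proportion vaccinated), `gdp` (GDP per capita) and `pop` (population, used as weights), filled with simulated numbers rather than the real SPC data. The fitted line is just the inverse logit (`plogis()`) of a straight line in log(GDP), so it curves without any smoother. In ggplot2 a similar curve can be drawn on the fly with `geom_smooth(method = "glm", method.args = list(family = quasibinomial))`; I'm assuming, not asserting, that the chart in this post was made along those lines.

```r
# Hypothetical toy data: 22 simulated "countries", NOT the SPC figures
set.seed(123)
n <- 22
toy <- data.frame(
  gdp = exp(runif(n, log(1500), log(60000))),  # simulated GDP per capita
  pop = round(exp(rnorm(n, log(1e5), 1.5)))    # simulated population
)
toy$rate <- plogis(-2 + 0.35 * log(toy$gdp) + rnorm(n, sd = 0.6))  # simulated share vaccinated

# Quasibinomial GLM: proportion response, population as prior weights.
# The logit link is the default for this family.
mod <- glm(rate ~ log(gdp), family = quasibinomial, weights = pop, data = toy)

# Fitted curve on the proportion scale over a grid of GDP values
grid <- data.frame(gdp = exp(seq(log(1000), log(80000), length.out = 100)))
grid$fit <- predict(mod, newdata = grid, type = "response")

# The same curve by hand: inverse logit of intercept + slope * log(GDP)
by_hand <- plogis(coef(mod)[1] + coef(mod)[2] * log(grid$gdp))
all.equal(unname(by_hand), unname(grid$fit))
range(grid$fit)  # always strictly inside (0, 1)
```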
I wanted to check, so I also fit the model explicitly myself (rather than relying on ggplot to do it on the fly), which is done in the code below. The p-value for the conventional t statistic on the coefficient for log(GDP) is 0.07, which is narrowly “not significant” and matches the visual. However, the F test from an Analysis of Deviance table is significant, with a p-value of 0.02. Is it OK to use an F test for a generalized linear model with a quasibinomial family, when the response is proportions of a population? To be honest I'm not sure. Most sources seem to say it is, but perhaps I am missing something about the context here.
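The two p-values come from different tests, which is one plausible reason they disagree. The t statistic reported by `summary()` is a Wald test based on the estimated standard error, while `anova(..., test = "F")` compares the drop in deviance from adding log(GDP) against the estimated dispersion. The two agree in large samples but need not agree with only 22 observations. For what it's worth, R's own help page for `anova.glm` describes the F test as the more appropriate one for families whose dispersion is estimated, such as quasibinomial. The sketch below, again on simulated data with hypothetical column names, shows where each p-value comes from and checks that the sequential analysis of deviance is the same as an explicit comparison with an intercept-only model.

```r
# Hypothetical toy data (simulated, NOT the SPC figures)
set.seed(123)
n <- 22
toy <- data.frame(
  gdp = exp(runif(n, log(1500), log(60000))),
  pop = round(exp(rnorm(n, log(1e5), 1.5)))
)
toy$rate <- plogis(-2 + 0.35 * log(toy$gdp) + rnorm(n, sd = 0.6))

mod <- glm(rate ~ log(gdp), family = quasibinomial, weights = pop, data = toy)

# Wald t test on the slope, as reported by summary()
p_wald <- coef(summary(mod))["log(gdp)", "Pr(>|t|)"]

# Analysis of deviance: deviance drop for log(gdp), scaled by the estimated dispersion
aod <- anova(mod, test = "F")
p_dev <- aod["log(gdp)", "Pr(>F)"]

# The same F test written as an explicit comparison with an intercept-only model
mod0 <- update(mod, . ~ 1)
p_nested <- anova(mod0, mod, test = "F")[2, "Pr(>F)"]

c(wald_t = p_wald, deviance_F = p_dev, nested_F = p_nested)
```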
I'm using the wrong “population” figure for weights in this model: I've used the whole population of each country and territory, rather than the population eligible for the vaccine for reporting purposes (which varies by country). But I am almost certain that's not material, either for the size of the dots on the chart or for the model. Anyway, I can't be bothered to go back and fix that.
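One way to see why the choice of population figure may matter little: in a quasibinomial GLM, multiplying every weight by the same constant leaves the coefficients unchanged, and the estimated dispersion scales by that same constant, so the standard errors and t statistics are unchanged too. The sketch below checks this on simulated data, assuming (hypothetically) that the eligible population is a fixed 60% of the total everywhere. Because the eligible share really does vary by country, this only covers the constant-share case; differences in that share would shift the relative weights somewhat.

```r
# Hypothetical toy data (simulated, NOT the SPC figures)
set.seed(123)
n <- 22
toy <- data.frame(
  gdp = exp(runif(n, log(1500), log(60000))),
  pop = round(exp(rnorm(n, log(1e5), 1.5)))
)
toy$rate <- plogis(-2 + 0.35 * log(toy$gdp) + rnorm(n, sd = 0.6))

# Tight convergence so the two fits are comparable to many decimal places
tight <- glm.control(epsilon = 1e-12, maxit = 100)

# Weights = total population
mod_total <- glm(rate ~ log(gdp), family = quasibinomial,
                 weights = pop, data = toy, control = tight)
# Hypothetical: weights = eligible population, assumed to be 60% of the total everywhere
mod_elig <- glm(rate ~ log(gdp), family = quasibinomial,
                weights = 0.6 * pop, data = toy, control = tight)

# Coefficients and t values agree; only the estimated dispersion changes (by the factor 0.6)
all.equal(coef(mod_total), coef(mod_elig), tolerance = 1e-6)
all.equal(coef(summary(mod_total))[, "t value"],
          coef(summary(mod_elig))[, "t value"], tolerance = 1e-6)
summary(mod_elig)$dispersion / summary(mod_total)$dispersion
```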
I was worried about the small sample size and the possible influence of a few points. Knock out Kiribati and Pitcairn and perhaps the model would look significant; knock out PNG and it certainly wouldn't be. So I decided to use brute force, i.e. the bootstrap, to generate a confidence interval for the slope on log(GDP). This gives a better sense of how robust the evidence is. Some of the bootstrap resamples will in fact knock out PNG, Kiribati and Pitcairn, or include them multiple times. Let's see what that does to the estimates on average.
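How often does a resample actually drop a particular country? With n = 22 observations resampled with replacement, a given country is absent from a resample with probability (1 - 1/22)^22, close to the large-n limit e^(-1), about 0.368. The arithmetic below works this through and checks it by simulation; the only assumption is that PNG, Kiribati and Pitcairn are treated as three arbitrary labelled rows among the 22.

```r
n <- 22  # countries and territories in the data

# Probability a given country is absent from one bootstrap resample of size n
p_absent <- (1 - 1/n)^n
# Probability it appears exactly once, and two or more times
p_once <- n * (1/n) * (1 - 1/n)^(n - 1)
p_two_plus <- 1 - p_absent - p_once
# Probability that three specific countries are all absent at once
p_all3_absent <- (1 - 3/n)^n

round(c(absent = p_absent, once = p_once,
        two_plus = p_two_plus, all3_absent = p_all3_absent), 3)
# roughly 0.359, 0.376, 0.264 and 0.040

# Simulation check of the first figure: is row 1 missing from each resample?
set.seed(42)
sims <- replicate(10000, !(1 %in% sample(n, replace = TRUE)))
mean(sims)
```

So roughly a third of resamples leave out any given country, about a quarter include it two or more times, and around 4% leave out all three of PNG, Kiribati and Pitcairn together.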
So the confidence intervals for that slope from all four methods of calculating bootstrap confidence intervals (“Normal”, “Basic”, “Percentile” and “BCa”) included zero. Case closed: we can't say there's statistically significant evidence of a relationship between GDP per capita and Covid vaccination rates, based just on these Pacific countries and territories. Something more complicated might be happening instead; in other words, it might just be a PNG thing.
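A bootstrap of this kind can be sketched with the `boot` package, which ships with R as a recommended package. The sketch resamples rows (countries) with replacement, refits the quasibinomial GLM each time, and asks `boot.ci()` for the four interval types named above. The simulated data, the hypothetical column names and the number of resamples (`R = 1999`) are all assumptions for illustration; this is not the post's original code, and intervals from simulated data say nothing about the real Pacific figures.

```r
library(boot)  # recommended package distributed with R

# Hypothetical toy data (simulated, NOT the SPC figures)
set.seed(123)
n <- 22
toy <- data.frame(
  gdp = exp(runif(n, log(1500), log(60000))),
  pop = round(exp(rnorm(n, log(1e5), 1.5)))
)
toy$rate <- plogis(-2 + 0.35 * log(toy$gdp) + rnorm(n, sd = 0.6))

# Statistic: slope on log(gdp), refitted to the resampled rows i
slope_fn <- function(d, i) {
  fit <- glm(rate ~ log(gdp), family = quasibinomial, weights = pop, data = d[i, ])
  coef(fit)[["log(gdp)"]]
}

set.seed(2023)
b <- boot(toy, statistic = slope_fn, R = 1999)
ci <- boot.ci(b, type = c("norm", "basic", "perc", "bca"))

# Lower and upper limits from each method
limits <- rbind(
  normal     = ci$normal[2:3],
  basic      = ci$basic[4:5],
  percentile = ci$percent[4:5],
  bca        = ci$bca[4:5]
)
colnames(limits) <- c("lower", "upper")
limits

# Does each interval include zero?
limits[, "lower"] < 0 & limits[, "upper"] > 0
```

Resampling whole rows is the natural choice here because it lets individual countries drop in and out of the fit, which is exactly the influential-point worry; resampling residuals instead would keep every country in every refit.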
All this modelling is based on a thought experiment that the 22 countries and territories in the data are a random sample from a hypothetical meta-population. That’s the only way “statistically significant” can mean anything in this sort of situation, when you have observations on the entire real population.
Here’s the code for the modelling part:
That’s all for today. Take care out there.