In a series of previous posts, I’ve spent some time looking at the idea that the review and publication process in political science, and specifically the requirement that a result must be statistically significant in order to be scientifically notable or publishable, produces a very misleading scientific literature. In short, published estimates of some relationship will tend to be substantially exaggerated in magnitude. If we take the view that the “null hypothesis” of no relationship should not be a point at zero but rather a small region of substantively negligible effects around zero, the problem becomes even more severe.
This raises an important question: is this just a problem in theory, or is it actually influencing the course of political science research in detectable ways?
To answer this question, I am working with Ahra Wu (one of our very talented graduate students studying International Relations and political methodology at Rice) to develop a way to measure the average level of bias in a published literature and then apply this method to recently published results in the prominent general interest journals in political science.
We presented our initial results on this front at the 2013 Methods Meetings in Charlottesville, and I’m sad to report that they are not good. Our poster summarizing the results is here. This is an ongoing project, so some of our findings may change or be refined as we continue our work; however, I do think this is a good time to summarize where we are now and seek suggestions.
First, how do you measure the bias? Well, the idea is to get an estimate of how badly a published (that is, statistically significant) result can be expected to overstate the truth, based on something we can actually observe about it: its t-statistic. Using Monte Carlo simulation, we draw true effect sizes from an assumed prior density, simulate a study for each draw, discard the estimates that do not reach statistical significance, and record how far the surviving estimates depart from the true effects that generated them.

The relationship that you get between the observed t-statistic and the expected % absolute bias of a statistically significant estimate can then be plotted. When you plot this relationship, you see that as the observed t-statistic grows, the expected bias shrinks: estimates that just barely clear the significance threshold are, on average, the most exaggerated. With a plot like this in hand, determining the expected bias of any published result is straightforward: look up its t-statistic and read off the corresponding expected % absolute bias.
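To make this concrete, here is a minimal sketch in R of the kind of simulation involved. The sample size, error standard deviation, uniform prior over the true effect, and 5% significance threshold below are illustrative assumptions, not the settings used in our actual analysis.

```r
set.seed(1234)

n.sims    <- 20000                            # number of simulated studies
n.obs     <- 100                              # observations per study (assumed)
beta.true <- runif(n.sims, min = 0, max = 1)  # assumed uniform prior on the true effect

# simulate one bivariate regression and return the estimate and its t-statistic
sim.one <- function(beta) {
  x   <- rnorm(n.obs)
  y   <- beta * x + rnorm(n.obs, sd = 2)      # assumed error SD
  fit <- summary(lm(y ~ x))$coefficients["x", ]
  c(est = fit[["Estimate"]], t = fit[["t value"]])
}

res <- t(sapply(beta.true, sim.one))
sig <- abs(res[, "t"]) >= 1.96                # keep only "publishable" (significant) estimates

# % absolute bias of each significant estimate relative to its true effect
pct.bias <- 100 * (abs(res[sig, "est"]) - abs(beta.true[sig])) / abs(beta.true[sig])

# smoothed relationship between the observed |t| and the expected % absolute bias
bias.curve <- lowess(abs(res[sig, "t"]), pct.bias)
plot(bias.curve, type = "l",
     xlab = "observed |t| statistic", ylab = "expected % absolute bias")
```

The lowess curve at the end is what I mean by “a plot like this”: a mapping from an observed t-statistic to the bias we would expect, on average, under the assumed prior.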
Accordingly, we examined 177 quantitative articles published in the APSR (80 articles in volumes 102-107, from 2008-2013) and the AJPS (97 articles in volumes 54-57, from 2010-2013). Only articles with continuous and unbounded dependent variables are included in our data set. Each observation of the data set represents one article and contains the article’s main finding (viz., an estimated marginal effect); details of how we identified an article’s “main finding” are in the poster, but in short it was the result we thought the author intended to be the centerpiece of his/her results.
Using this data set, we applied the technique described above to estimate the average % absolute bias in the published findings. Because the calculation depends on the prior density assumed for the true effect sizes, we repeated it under several different priors.

As shown in the table below, our preliminary bias estimates for all of these prior densities hover in the 40-55% range, meaning that on average we estimate that published effect estimates are roughly 40-55% larger in magnitude than the true effects they measure.
| prior density  | avg. % absolute bias |
|----------------|----------------------|
| normal         | 41.77%               |
| uniform        | 40%                  |
| spike-and-slab | 55.44%               |

*note: results are preliminary.*
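As an illustration of how such a curve gets applied, here is a short sketch that continues from the simulation above; the `articles` data frame and its `t.stat` column are hypothetical stand-ins for our collected data, in which each row is one article’s main finding.

```r
# bias.curve (from the sketch above) maps an observed |t| onto an
# expected % absolute bias under the assumed prior density
articles <- data.frame(t.stat = c(2.1, 2.6, 3.4, 5.0))  # hypothetical main findings

# read each article's expected bias off the curve, then average
expected.bias <- approx(x = bias.curve$x, y = bias.curve$y,
                        xout = abs(articles$t.stat), rule = 2)$y
mean(expected.bias)  # average % absolute bias for this set of findings
```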
I think it is likely that these estimates will change before our final analysis is published; in particular, we did not adjust the range of the independent variable or the variance of the error term in our simulations to match the conditions of each published study, and doing so may change the bias estimates.
We have much more to do in this research, including examining other evidence of the existence and prevalence of publication bias in political science and investigating possible solutions or corrective measures. We will have quite a bit to say in the latter regard; at the moment, using Bayesian shrinkage priors seems very promising, while requiring a result to be large (“substantively significant”) as well as statistically significant seems not at all promising. I hope to post about these results in the future.
As a parting word on the former front, I can share one other bit of evidence for publication bias that casts a different light on some already-published results. Gerber and Malhotra have published a study arguing that an excess of (two-tailed) p-values just under the 0.05 and 0.10 cutoffs is evidence that researchers make opportunistic choices of model specification and measurement in order to clear the statistical significance bar for publication. But the same pattern appears in a scenario where totally honest researchers study a world with many null results and in which statistical significance is required for publication.
Specifically, we simulated 10,000 studies (each of sample size n = 100) where the true DGP for each study j is a simple linear relationship between an independent variable and the outcome; the true effect is zero for a large share of the studies and modest in size for the rest, and only statistically significant estimates count as “published.” The distribution of t-ratios among these published studies shows a pronounced excess of values just past the conventional significance cutoffs.
This is a very close match for a diagram of t-ratios published in the Gerber-Malhotra paper, which shows the distribution of z-statistics (a.k.a. large-sample t-scores) from their examination of published articles in AJPS and APSR.
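Below is a minimal R sketch of that scenario. The share of true null effects, the distribution of the non-null effects, and the use of the 10% two-tailed threshold as the publication filter are assumptions chosen for illustration, not the exact settings of our simulation.

```r
set.seed(5678)
n.studies <- 10000
n.obs     <- 100

# most studies investigate a true null; the rest have a modest true effect (assumed)
beta <- ifelse(runif(n.studies) < 0.8, 0, runif(n.studies, 0, 0.5))

# t-statistic on x from one simulated bivariate regression per study
t.stats <- sapply(beta, function(b) {
  x <- rnorm(n.obs)
  y <- b * x + rnorm(n.obs)
  summary(lm(y ~ x))$coefficients["x", "t value"]
})

# honest researchers, but only significant results get "published" (assumed filter)
published <- abs(t.stats) >= 1.645

hist(abs(t.stats[published]), breaks = 50,
     main = "Published |t| ratios", xlab = "|t|")
abline(v = c(1.645, 1.96), lty = 2)  # 10% and 5% two-tailed cutoffs
```

The histogram piles up just past the significance thresholds even though no researcher in the simulation did anything opportunistic.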
So perhaps the fault, dear reader, is not in ourselves but in our stars—the stars that we use in published tables to identify statistically significant results as being scientifically important.