Finding Economic Articles with Data (2nd Update)
Almost a year has passed since my last update about my Shiny-powered search app. It lets you search among currently more than 5000 economics articles that have an accessible data and code supplement:
https://ejd.econ.mathematik.uni-ulm.de
The main data for my app can be downloaded as a zipped SQLite database from my server. Let us do some analysis.
    library(RSQLite)
    library(dbmisc)
    library(dplyr)

    db = dbConnect(RSQLite::SQLite(), "articles.sqlite") %>%
      set.db.schemas(schema.file = system.file("schema/articles.yaml", package = "EconJournalData"))

    articles = dbGet(db, "article")
    fs = dbGet(db, "files_summary")

Let us look, grouped by journal, at the share of articles whose code supplement contains R files:

    fs %>%
      left_join(select(articles, id, journ), by = "id") %>%
      group_by(journ) %>%
      mutate(num_art = n_distinct(id)) %>%
      filter(file_type == "r") %>%
      summarize(
        num_art = first(num_art),
        num_with_r = n(),
        share_with_r = round((num_with_r / first(num_art)) * 100, 2)
      ) %>%
      arrange(desc(share_with_r))

journ | num_art | num_with_r | share_with_r |
---|---|---|---|
ecta | 144 | 19 | 13.19 |
aeri | 28 | 3 | 10.71 |
jep | 127 | 12 | 9.45 |
restud | 312 | 22 | 7.05 |
jpe | 155 | 9 | 5.81 |
aejmic | 129 | 5 | 3.88 |
aejpol | 426 | 15 | 3.52 |
aer | 1540 | 53 | 3.44 |
jeea | 154 | 5 | 3.25 |
aejapp | 430 | 13 | 3.02 |
aejmac | 314 | 8 | 2.55 |
restat | 813 | 6 | 0.74 |
We see considerable variation in the share of articles with R code, ranging from 13.2% in Econometrica (ecta) to only 0.74% in the Review of Economics and Statistics (restat). (The statistics exclude all articles that don’t have a code supplement, as well as those whose supplement I did not analyse for file types, e.g. because it is too large or its ZIP files are nested too deeply.)
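To make the share computation concrete, here is a minimal sketch on made-up data. It assumes, as the code above suggests, that the file summary has one row per article and file type; the tables `articles_toy` and `fs_toy` and all their values are hypothetical stand-ins, not taken from the real database.

```r
library(dplyr)

# Hypothetical stand-ins for the real tables (all values are made up)
articles_toy = data.frame(
  id    = c(1, 2, 3, 4),
  journ = c("aer", "aer", "ecta", "ecta")
)
# Assumed layout: one row per article and file type
fs_toy = data.frame(
  id        = c(1, 1, 2, 3, 4, 4),
  file_type = c("do", "r", "do", "r", "m", "r")
)

shares = fs_toy %>%
  left_join(articles_toy, by = "id") %>%
  group_by(journ) %>%
  summarize(
    # all articles of the journal that appear in the file summary
    num_art      = n_distinct(id),
    # articles with at least one row of file type "r"
    num_with_r   = n_distinct(id[file_type == "r"]),
    share_with_r = round(num_with_r / num_art * 100, 2)
  ) %>%
  arrange(desc(share_with_r))

print(shares)
```

Unlike the filter-based version, this variant would also keep journals without any R files in the result (with a share of 0), which may or may not be what you want.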
Overall, we still have a clear dominance of Stata in economics:
    # Number of articles with an analysed data & code supplement
    n_art = n_distinct(fs$id)

    # Count articles by file type and compute shares
    fs %>%
      group_by(file_type) %>%
      summarize(count = n(), share = round((count / n_art) * 100, 2)) %>%
      # note that all file extensions are stored in lower case
      filter(file_type %in% c("do", "r", "py", "jl", "m")) %>%
      arrange(desc(share))

file_type | count | share |
---|---|---|
do | 3338 | 70.44 |
m | 1195 | 25.22 |
r | 170 | 3.59 |
py | 68 | 1.43 |
jl | 8 | 0.17 |
Roughly 70% of the articles have Stata do files, a quarter have Matlab m files, and only 3.6% have R files.
While R, Python and Julia have increased their share over recent years, this does not yet look like a very strong trend.
    sum_dat = fs %>%
      left_join(select(articles, year, id), by = "id") %>%
      group_by(year) %>%
      # note: n() counts the rows of fs in each year, not distinct articles
      mutate(n_art_year = n()) %>%
      group_by(year, file_type) %>%
      summarize(
        count = n(),
        share = round((count / first(n_art_year)) * 100, 2)
      ) %>%
      filter(file_type %in% c("do", "r", "py", "jl", "m")) %>%
      arrange(year, desc(share))

    library(ggplot2)
    ggplot(sum_dat, aes(x = year, y = share, color = file_type)) +
      geom_line(size = 1.5) +
      scale_y_log10() +
      theme_bw()

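One detail of the yearly computation: `n()` counts rows of the file summary, so the denominator is the number of article and file type combinations per year rather than the number of articles. If shares of articles are wanted, as in the overall table, `n_distinct(id)` could be used instead. The following is only a sketch of that variant on made-up data; `fs_toy` and `articles_toy` are hypothetical, and the one-row-per-article-and-file-type layout is an assumption.

```r
library(dplyr)

# Hypothetical toy data (all values are made up)
fs_toy = data.frame(
  id        = c(1, 1, 2, 3, 3, 4),
  file_type = c("do", "r", "do", "do", "m", "r")
)
articles_toy = data.frame(
  id   = c(1, 2, 3, 4),
  year = c(2019, 2019, 2020, 2020)
)

sum_toy = fs_toy %>%
  left_join(articles_toy, by = "id") %>%
  group_by(year) %>%
  # denominator: number of distinct articles in the year
  mutate(n_art_year = n_distinct(id)) %>%
  group_by(year, file_type) %>%
  summarize(
    count = n(),
    share = round(count / first(n_art_year) * 100, 2),
    .groups = "drop"  # requires dplyr >= 1.0.0
  ) %>%
  arrange(year, desc(share))

print(sum_toy)
```

With this denominator the shares within a year can add up to more than 100%, because one article may contain several file types.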
I also have a log file that anonymously records which articles have been clicked on. The code below shows the 20 most clicked articles so far:
    dat = read.csv("article_click.csv")
    dat %>%
      group_by(article) %>%
      summarize(count = n()) %>%
      na.omit %>%
      arrange(desc(count)) %>%
      print(n = 20)

    ## # A tibble: 2,707 x 2
    ##    article  count
    ##    <fct>  <int>
    ##  1 Consumer Spending during Unemployment: Positive and Normative Implicat~  50
    ##  2 Do Expert Reviews Affect the Demand for Wine?  44
    ##  3 Tax Evasion and Inequality  38
    ##  4 A Macroeconomic Model of Price Swings in the Housing Market  35
    ##  5 Is Your Lawyer a Lemon? Incentives and Selection in the Public Provisi~  33
    ##  6 The Welfare Effects of Social Media  31
    ##  7 The Rise of Market Power and the Macroeconomic Implications  29
    ##  8 Carbon Taxes and CO2 Emissions: Sweden as a Case Study  27
    ##  9 Public Debt and Low Interest Rates  27
    ## 10 The Sad Truth about Happiness Scales  25
    ## 11 Job Polarization and Jobless Recoveries  24
    ## 12 The New Tools of Monetary Policy  24
    ## 13 Alcohol and Self-Control: A Field Experiment in India  23
    ## 14 Disease and Gender Gaps in Human Capital Investment: Evidence from Nig~  23
    ## 15 Some Causal Effects of an Industrial Policy  23
    ## 16 Food Deserts and the Causes of Nutritional Inequality  22
    ## 17 Minimum Wage and Real Wage Inequality: Evidence from Pass-Through to R~  22
    ## 18 The Cost of Reducing Greenhouse Gas Emissions  22
    ## 19 Adaptation to Climate Change: Evidence from US Agriculture  21
    ## 20 Do Parents Value School Effectiveness?  21
    ## # ... with 2,687 more rows

So far there have been over 11000 clicks in total. Well, that is almost twice as many as the average number of Google searches in 100 milliseconds 😉