Update: Finding Economic Articles With Data

[This article was first published on Economics and R - R posts, and kindly contributed to R-bloggers.]

An earlier post from February describes a Shiny app for searching among currently more than 4000 economic articles that have an accessible data and code supplement. I have finally managed to configure an nginx reverse proxy server, so you can now also access the app under a proper https link here:

https://ejd.econ.mathematik.uni-ulm.de

(I was very positively surprised by how easy it was to change http to https using certbot.) Some colleagues told me that they could not access the app under the originally posted link:

http://econ.mathematik.uni-ulm.de:3200/ejd/

I am not sure about the exact reason, but perhaps some security settings block access to websites on a non-standard port like 3200. Hopefully the new link helps.

Since my initial post, the number of articles has grown, and I have added new journals such as Econometrica and AER Insights.

The main data for my app can be downloaded as a zipped SQLite database from my server. Let us do some analysis.

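library(RSQLite)
library(dbmisc)
library(dplyr)

# Open the SQLite database and attach the table schemas
# that ship with the EconJournalData package
db = dbConnect(RSQLite::SQLite(), "articles.sqlite") %>%
  set.db.schemas(
    schema.file = system.file("schema/articles.yaml", package = "EconJournalData")
  )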

articles = dbGet(db,"article")
fs = dbGet(db,"files_summary")
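If you prefer not to install dbmisc, the same tables can probably be read with plain DBI calls. The sketch below is only an illustration: it uses an in-memory database with a made-up toy table instead of the real articles.sqlite, and the toy columns are an assumption, not the actual schema.

```r
library(DBI)
library(RSQLite)

# In-memory database standing in for articles.sqlite
# (toy data, NOT the real schema or content)
con = dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "article", data.frame(
  id    = c("a1", "a2"),
  journ = c("aer", "ecta"),
  stringsAsFactors = FALSE
))

# List the available tables and read one completely
tables = dbListTables(con)
articles_toy = dbReadTable(con, "article")

# Or run an SQL query directly
n_aer = dbGetQuery(
  con, "SELECT COUNT(*) AS n FROM article WHERE journ = 'aer'"
)$n

dbDisconnect(con)
```

With the real file, one would replace ":memory:" by the path to the unzipped database and skip the dbWriteTable() call.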

Let us look, grouped by journal, at the share of articles whose code supplement contains R files:

fs %>% 
  left_join(select(articles, id, journ), by="id") %>%
  group_by(journ) %>%
  mutate(num_art = n_distinct(id)) %>%
  filter(file_type=="r") %>%
  summarize(
    num_art = first(num_art),
    num_with_r = n(),
    share_with_r=round((num_with_r / first(num_art))*100,2)
  ) %>%
  arrange(desc(share_with_r))
journ    num_art  num_with_r  share_with_r
ecta         109          17         15.6
jep          113          11          9.73
restud       216          12          5.56
aejmic       114           5          4.39
aer         1453          46          3.17
aejpol       378          11          2.91
aejapp       385          11          2.86
aejmac       282           8          2.84
jeea         115           2          1.74
restat       733           6          0.82

We see considerable variation in the share of articles with R code, ranging from 15.6% in Econometrica (ecta) to only 0.82% in the Review of Economics and Statistics (restat). (The statistics exclude all articles that have no code supplement or whose supplement's file types I did not analyse, e.g. because the supplement is too large or its ZIP files are nested too deeply.)
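To make the logic of the share computation easier to follow, here is a minimal sketch on made-up toy data. The data frames articles_toy and fs_toy are hypothetical stand-ins for the article and files_summary tables. The sketch is a slight variant of the code above, not the original method: it counts R articles inside summarize() instead of filtering first, so a journal without any R files would stay in the result with a share of 0.

```r
library(dplyr)

# Toy stand-ins for the "article" and "files_summary" tables (made-up data)
articles_toy = data.frame(
  id    = c("a1", "a2", "a3", "a4", "a5"),
  journ = c("ecta", "ecta", "aer", "aer", "aer"),
  stringsAsFactors = FALSE
)
fs_toy = data.frame(
  id        = c("a1", "a1", "a2", "a3", "a4", "a4", "a5"),
  file_type = c("r",  "do", "do", "do", "m",  "r",  "do"),
  stringsAsFactors = FALSE
)

share_r_toy = fs_toy %>%
  left_join(articles_toy, by = "id") %>%
  group_by(journ) %>%
  summarize(
    # all analysed articles of the journal
    num_art      = n_distinct(id),
    # articles that have at least one row with file type "r"
    num_with_r   = n_distinct(id[file_type == "r"]),
    share_with_r = round(100 * num_with_r / num_art, 2)
  ) %>%
  arrange(desc(share_with_r))

print(share_r_toy)
```

In the toy data, one of two ecta articles and one of three aer articles contain R files, so ecta is listed first.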

Overall, we still have a clear dominance of Stata in economics:

# Number of articles with an analysed data & code supplement
n_art = n_distinct(fs$id)

# Count articles by file types and compute shares
fs %>% group_by(file_type) %>%
  summarize(
    count = n(), 
    share=round((count / n_art)*100,2)
  ) %>%
  # note that all file extensions are stored in lower case
  filter(file_type %in% c("do","r","py","jl","m")) %>%
  arrange(desc(share))
file_type  count  share
do          2834  70.18
m            979  24.24
r            129   3.19
py            42   1.04
jl             2   0.05

Roughly 70% of the articles have Stata do files, almost a quarter have Matlab m files, and only slightly more than 3% have R files.

Meanwhile, I have also added a log file to the app that anonymously records which articles have been clicked on. The code below shows the 20 most clicked articles so far:

dat = read.csv("article_click.csv")

dat %>%
  group_by(article) %>%
  summarize(count=n()) %>%
  na.omit %>%
  arrange(desc(count)) %>%
  print(n=20)

## # A tibble: 699 x 2
##    article                                                           count
##    <fct>                                                             <int>
##  1 A Macroeconomic Model of Price Swings in the Housing Market          27
##  2 Job Polarization and Jobless Recoveries                              20
##  3 Tax Evasion and Inequality                                           19
##  4 Public Debt and Low Interest Rates                                   16
##  5 An Empirical Model of Tax Convexity and Self-Employment              13
##  6 Alcohol and Self-Control: A Field Experiment in India                11
##  7 Drug Innovations and Welfare Measures Computed from Market Deman~    11
##  8 Food Deserts and the Causes of Nutritional Inequality                11
##  9 Some Causal Effects of an Industrial Policy                          11
## 10 Costs  Demand  and Producer Price Changes                            10
## 11 Breaking Bad: Mechanisms of Social Influence and the Path to Cri~     9
## 12 Government Involvement in the Corporate Governance of Banks           8
## 13 Performance in Mixed-sex and Single-sex Tournaments: What We Can~     8
## 14 Disease and Gender Gaps in Human Capital Investment: Evidence fr~     7
## 15 Housing Constraints and Spatial Misallocation                         7
## 16 Inherited Control and Firm Performance                                7
## 17 Labor Supply and the Value of Non-work Time: Experimental Estima~     7
## 18 Pricing in the Market for Anticancer Drugs                            7
## 19 The Arrival of Fast Internet and Employment in Africa                 7
## 20 The Economic Benefits of Pharmaceutical Innovations: The Case of~     7
## # ... with 679 more rows
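As a side note, the group_by / summarize / arrange pattern above can probably be shortened with dplyr's count(). The sketch below runs on a made-up toy click log (clicks_toy is hypothetical, not the real article_click.csv) and drops missing article titles before counting.

```r
library(dplyr)

# Made-up click log standing in for article_click.csv
clicks_toy = data.frame(
  article = c("Paper A", "Paper B", "Paper A", NA,
              "Paper C", "Paper A", "Paper B"),
  stringsAsFactors = FALSE
)

top_clicks = clicks_toy %>%
  filter(!is.na(article)) %>%                  # drop clicks without a title
  count(article, sort = TRUE, name = "count")  # count and sort descending

print(top_clicks)
```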


For a nice thumbnail in R-bloggers let us finish with a screenshot of the app:

