Today, the 30th issue of my Demographic Digest was published.
Demographic Digest is my project that started in November 2015. Twice a month I select fresh demographic papers and write brief summaries of them in Russian, which are published in Demoscope Weekly, the most popular Russian journal/website in the social sciences. If you read Russian, you may want to browse the archive or visit the website of the project (which is still being filled).
The project is now in a transitional phase. Since 2016, Demographic Digest has welcomed contributions from external authors. In February 2017 I launched the first iteration of a project for the students of the National Research University Higher School of Economics.
To draw a line under the first phase of the project, I analysed which journals supplied papers to Demographic Digest most frequently. I also wanted to try visualizing data with treemaps, which I mentioned in the bonus part of the latest digest issue.
For that, I exported the bibliographic data of all the papers covered in Demographic Digest. I use Zotero as a reference manager; the paper records are exported as a single .bib file, which I then saved as a plain text (.txt) file. Then I read the data into R, cleaned it, and finally visualized it.
# load required packages
library(tidyverse)
library(stringr)
library(readxl)
library(extrafont)

my <- "Roboto Condensed"

df <- data.frame(lines = readLines("https://ikashnitsky.github.io/doc/misc/dd-stats/dd-bib.txt")) %>%
        mutate(lines = lines %>% as.character()) %>%
        # grab only the lines that contain journals' titles
        filter(lines %>% str_detect("journaltitle")) %>%
        # remove everything that is not the bare journal's title
        transmute(journals = lines %>% str_replace_all(
                pattern = "\tjournaltitle = |\\Q{\\E|\\Q}\\E,|\\Q}\\E",
                replacement = "")) %>%
        # calculate frequencies
        group_by(journals) %>%
        summarise(n = n())
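To make the cleaning step more concrete, here is a minimal check of the same regular expression on a couple of made-up .bib lines (the titles below are illustrative, not taken from the actual file):

# a quick check of the regex on two illustrative journaltitle lines
test_lines <- c("\tjournaltitle = {Population Studies},",
                "\tjournaltitle = {Demographic Research},")

test_lines %>%
        str_replace_all(pattern = "\tjournaltitle = |\\Q{\\E|\\Q}\\E,|\\Q}\\E",
                        replacement = "")
# [1] "Population Studies"   "Demographic Research"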
For one journal title, Ageing and Society, I failed to replace the "\&" using regular expressions; this one had to be fixed manually. I also corrected the title of the Lancet, removing the article "The". Finally, I corrected the frequencies for Population Studies and Population and Development Review by subtracting 6 from each, because for both journals I had provided lists of the most cited papers as a bonus. Following the same logic, I cleaned the data from the papers that appeared in the bonus parts.
# correct "Ageing and Society"
df[1, 1] <- "Ageing and Society"

# correct the title of Lancet
df <- df %>% mutate(journals = journals %>% str_replace("The Lancet", "Lancet"))

# correct "Population and Development Review" and "Population Studies" for 6 each
# reason - top cited papers bonus
df[df$journals %in% c("Population and Development Review", "Population Studies"), 2] <-
        df[df$journals %in% c("Population and Development Review", "Population Studies"), 2] - 6
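Before joining the bibliometric data, it does not hurt to run a quick sanity check that the manual fixes went through; for example (the printed counts depend on the actual .bib file, so I do not show them here):

# inspect the most frequent journals after the manual corrections
df %>% arrange(desc(n)) %>% head(10)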
To provide some additional metrics for the journals, I downloaded bibliometric data from the SCImago Journal & Country Rank project (http://www.scimagojr.com/aboutus.php). Demographic journals usually have a rather low SJR compared to medical journals; that's why I downloaded the data only for journals in Social Sciences (the .xlsx file). Then I read the data into R and joined it to my data frame.
# read SJR data for journals in Social Sciences
# read_excel needs a local file, so download the spreadsheet first
download.file("https://ikashnitsky.github.io/doc/misc/dd-stats/scimagojr.xlsx",
              destfile = "scimagojr.xlsx", mode = "wb")
sjr <- readxl::read_excel("scimagojr.xlsx", 1) %>%
        mutate(id = Title %>% tolower())

# join the data frames; note that I create an "id" variable in lower case
df_sjr <- left_join(df %>% mutate(id = journals %>% tolower), sjr, "id")
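Since the join is done on lower-cased titles, some journals may still fail to match, for instance because SCImago spells a title slightly differently. A small check, reusing the objects created above, could look like this:

# journals from the digest that found no match in the SJR table
df %>%
        mutate(id = journals %>% tolower) %>%
        anti_join(sjr, by = "id") %>%
        select(journals)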
Finally, it's time to visualize the data. I use the amazing treemap package.
# Treemap visualization
library(treemap)

treemap(dtf = df_sjr,
        index = "journals",
        vSize = "n",
        vColor = "SJR",
        type = "value",
        n = 5,
        palette = "BrBG",
        border.col = "grey10",
        title = "Journals' frequency in Demographic Digest",
        title.legend = "SJR (only social sciences)",
        fontfamily.title = my,
        fontfamily.labels = my,
        fontfamily.legend = my,
        drop.unused.levels = TRUE)
Here is how the output looks.
Note that the lion's share taken by Population Studies is mainly explained by the first issue of Demographic Digest, in which I covered all the papers from the brilliant special issue Population — The long view.
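In case you want to keep the figure as a file rather than just view it in the graphics window, one option is to wrap the treemap() call in a graphics device; the file name and dimensions below are arbitrary:

# save the treemap to a .png file (name and size are arbitrary)
png("dd-journals-treemap.png", width = 1000, height = 700, res = 100)
treemap(dtf = df_sjr,
        index = "journals",
        vSize = "n",
        vColor = "SJR",
        type = "value",
        palette = "BrBG",
        title = "Journals' frequency in Demographic Digest")
dev.off()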