Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Bibliometrics is not my area of expertise, but it still interests me as a researcher. I sometimes think about how Google has changed the way we title articles. Gone are the days of witty, snappy titles. Title selection is still an art form, but of a different kind. Generally, researchers try to construct titles from the most searchable keywords. While trying to title an article today, I came upon an Internet article entitled Heading for Success: Or How Not to Title Your Paper.
According to the article, to increase citation rates, a title:
- Should contain no ? or !
- May contain a :
- Should be between 31 and 40 characters long
- Should avoid humor and puns
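As a rough illustration, the mechanical rules above can be checked in a few lines of base R. This is only a sketch: `check_title()` is a hypothetical helper of my own naming, not something from the article, and it assumes the 31-40 range counts every character, spaces included. Humor has no simple test, so it is left out.

```r
## Hypothetical helper: checks only the mechanical title rules.
## Assumption: the 31-40 range counts all characters, including spaces.
check_title <- function(title) {
  n_chars <- nchar(title)
  c(
    ## TRUE when the title has neither a question mark nor an exclamation mark
    no_question_or_exclamation = !grepl("[?!]", title),
    ## a colon is allowed rather than required, so this is informational
    has_colon = grepl(":", title, fixed = TRUE),
    ## TRUE when the title length falls in the suggested range
    length_31_to_40 = n_chars >= 31 && n_chars <= 40
  )
}

check_title("The Earth is Round (p < .05)")
```

The result is a named logical vector, so a batch of candidate titles can be compared with `sapply(titles, check_title)`.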
In seeing:
…some authors are tempted to spice them up with a touch of humour, which may be a pun, a play on words, or an amusing metaphor. This, however, is a risky strategy.
my mind went to the classic Jacob Cohen (1994) paper entitled The Earth is Round (p < .05). In 1994 the world was different; Google didn't exist yet. I ask, “What if Cohen had to title his classic paper in 2014?” What would it look like?
Keywords: Mining “The Earth is Round (p < .05)”
I set to work by grabbing the paper's content and converting to plain text. Then I decided to tease out the most frequent terms after stemming and removing stopwords. Here's the script I used:
library(qdap); library(RCurl); library(wordcloud); library(ggplot2)

cohen_url <- "https://raw.githubusercontent.com/trinker/cohen_title/master/data/Cohen1994.txt"
cohen <- getURL(cohen_url, ssl.verifypeer = FALSE)

## remove reference section and title
cohen <- substring(strsplit(cohen, "REFERENCES")[[c(1, 1)]], 34)

## convert format so we can eliminate strange characters
cohen <- iconv(cohen, "", "ASCII", "byte")

## replacement parts
bads <- c("-", "<e2><80><9c>", "<e2><80><9d>", "<e2><80><98>", "<e2><80><99>",
    "<e2><80><9b>", "<ef><bc><87>", "<e2><80><a6>", "<e2><80><93>", "<e2><80><94>",
    "<c3><a1>", "<c3><a9>", "<c2><bd>", "<ef><ac><81>", "<c2><a7>", "<ef><ac><82>",
    "<ef><ac><81>", "<c2><a2>", "/j")
goods <- c(" ", " ", " ", "'", "'", "'", "'", "...", " ", " ", "a", "e", "half",
    "fi", " | ", "ff", "ff", " ", "ff")

## sub the bad for the good
cohen <- mgsub(bads, goods, clean(cohen))

## Stem it
cohen_stem <- stemmer(cohen)

## Find top words
(cohen_top_20 <- freq_terms(cohen_stem, top = 20, stopwords = Top200Words))
plot(cohen_top_20)

##    WORD         FREQ
## 1  test           21
## 2  signiffc       19
## 3  research       18
## 4  probabl        17
## 5  size           17
## 6  data           15
## 7  h              15
## 8  effect         14
## 9  p              14
## 10 statist        14
## 11 given          13
## 12 hypothesi      13
## 13 analysi        11
## 14 articl         11
## 15 nhst           11
## 16 null           11
## 17 psycholog      11
## 18 conffdenc      10
## 19 correl         10
## 20 psychologist   10
## 21 result         10
## 22 theori         10
library(wordcloud)
with(cohen_top_20, wordcloud(WORD, FREQ))
mtext("Content Cloud: The Earth is Round (p < .05)", col="blue")
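For readers without qdap installed, the cleaning and counting steps can be approximated in base R. The sketch below is not the script used for the results above: the input sentence and the short stopword vector are made up for illustration, and stemming is skipped entirely, so counts from it would not match the table. It mainly shows what the `iconv(..., "byte")` step does (non-ASCII bytes become literal `<xx>` escapes that can then be replaced) and how a frequency table falls out of `table()`.

```r
## Toy input (an assumption for illustration, not text from Cohen's paper)
txt <- "The test of significance \u201cgiven\u201d the null: test the data, test the effect."

## Non-ASCII bytes become <xx> escapes, as in the script's iconv() step
ascii <- iconv(txt, "UTF-8", "ASCII", sub = "byte")

## Swap the escaped curly quotes for spaces (fixed = TRUE matches literally)
for (bad in c("<e2><80><9c>", "<e2><80><9d>")) {
  ascii <- gsub(bad, " ", ascii, fixed = TRUE)
}

## Lowercase, split on anything that is not a letter, drop empty tokens
words <- strsplit(tolower(ascii), "[^a-z]+")[[1]]
words <- words[nzchar(words)]

## Tiny hypothetical stopword list; qdap's Top200Words is much larger
stops <- c("the", "of", "a", "an", "and")
words <- words[!words %in% stops]

## Count terms and sort, most frequent first
freq <- sort(table(words), decreasing = TRUE)
head(freq, 3)
```

In the original script, qdap's `mgsub()` handles the whole vector of replacements in one call and `stemmer()` collapses word variants before counting, which is why forms such as "probabl" and "hypothesi" appear in the table.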
What Would Cohen Have Titled “The Earth is Round (p < .05)”?
So what would Cohen have titled “The Earth is Round (p < .05)” in 2014? Looking at the results… I don't know. It's fun to speculate. Maybe someone could suggest a title in the comments, but as for me, I still like “The Earth is Round (p < .05)”.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997-1003. doi:10.1037/0003-066X.49.12.997