[This article was first published on Struggling Through Problems, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
On StackOverflow, do posters with more experience (higher reputation) ask their questions in fewer words?
No. Plotted against the asker's reputation, question length shows no visible trend:
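[Plot: characters of non-code text vs. reputation, both axes log-scaled]
[Plot: characters of code vs. reputation, both axes log-scaled]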
The data comes from the super-handy StackOverflow API: three pages of questions were downloaded with wget and then parsed with rjson and XML.
First read in and parse the JSON:
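so.R

```r
library(rjson)    # fromJSON()
library(XML)      # htmlParse(), xpathApply(), xmlValue()
library(ggplot2)  # plotting
library(plyr)     # ldply()

# Each saved API page is a JSON object whose `questions` field
# holds that page's questions
read.qs = function(path) {
  fromJSON(file = path)$questions
}

# Read all three pages and splice them into one flat list of questions
questions = do.call(c,
  lapply(c('page-1.json', 'page-2.json', 'page-3.json'),
    read.qs
  )
)
```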
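Why `do.call(c, ...)` rather than `unlist()`? The small base-R sketch below uses made-up toy lists standing in for what `read.qs` returns (each page a list of questions, each question a named list, mirroring the `q$owner$reputation` access used later). The names `page1`, `page2` and the field values are hypothetical.

```r
# Hypothetical stand-ins for two pages returned by read.qs:
# each page is a list of questions, each question a named list
page1 = list(list(title = "a", owner = list(reputation = 10)),
             list(title = "b", owner = list(reputation = 2000)))
page2 = list(list(title = "c", owner = list(reputation = 55)))
pages = list(page1, page2)

# c() splices the pages together one level deep:
# the result is a flat list of whole questions
questions = do.call(c, pages)
length(questions)      # 3
questions[[3]]$title   # "c"

# unlist() instead flattens all the way down to the atomic fields,
# mixing every question's title and reputation into one character vector
flat = unlist(pages)
length(flat)           # 6
```

Keeping each question intact is what lets the next step treat every element as one question with its own `body` and `owner`.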
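Then, for each question, parse the body's HTML and total up the text inside <p> tags (prose) and inside <pre> tags (code blocks). Inline <code> inside a paragraph therefore counts as prose: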
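so.R (cont)

```r
# One row per question: the asker's reputation plus the character
# counts of prose (<p>) and code (<pre>) in the question body
Table = ldply(questions, function(q) {
  # Wrap the body fragment in <body> and parse it as HTML text
  body.text = sprintf('<body>%s</body>', q$body)
  body = htmlParse(body.text, asText = TRUE)

  description = tot.length.of(body, '//p//text()')
  code = tot.length.of(body, '//pre//text()')

  rep = q$owner$reputation

  data.frame(
    rep, description, code
  )
})
```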
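Here tot.length.of concatenates every text node matched by an XPath query and counts the characters. Because R looks the function up when ldply calls it, this definition has to appear in so.R before the Table = ldply(...) call:

```r
# Total number of characters in all text nodes matching `query`
tot.length.of = function(doc, query) {
  parts = xpathApply(doc, query, xmlValue)  # one string per matching text node
  text = paste(parts, collapse='')
  nchar(text)
}
```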
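To see what the two XPath queries actually count, here is a made-up question body run through a copy of tot.length.of. It assumes the XML package is installed; the HTML strings are hypothetical examples, not StackOverflow data.

```r
library(XML)

# Same helper as in so.R: total characters of all text nodes matching `query`
tot.length.of = function(doc, query) {
  parts = xpathApply(doc, query, xmlValue)
  nchar(paste(parts, collapse = ''))
}

# Hypothetical body: a paragraph with inline code, then a <pre> block
q.body = '<p>Error:<code>f(x)</code></p><pre><code>f(1, 2)</code></pre>'
doc = htmlParse(sprintf('<body>%s</body>', q.body), asText = TRUE)

tot.length.of(doc, '//p//text()')    # 10: "Error:" plus the inline "f(x)"
tot.length.of(doc, '//pre//text()')  # 7: "f(1, 2)"

# A question with no <pre> block at all
no.code = htmlParse('<body><p>Thanks!</p></body>', asText = TRUE)
tot.length.of(no.code, '//pre//text()')  # 0
```

The zero case matters for the plots that follow: log10(0) is -Inf, so code-free questions cannot sit at a finite position on a log-scaled axis.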
Then make the plots:
so.R (cont)

```r
# Prose length against reputation, both axes log-scaled
png('description.png')
print(ggplot(data=Table)
  + geom_point(aes(rep, description))
  + scale_x_log10()
  + scale_y_log10()
  + xlab('Rep')
  + ylab('Verbosity')
)
dev.off()

# Code length against reputation, same scales
png('code.png')
print(ggplot(data=Table)
  + geom_point(aes(rep, code))
  + scale_x_log10()
  + scale_y_log10()
  + xlab('Rep')
  + ylab('Verbosity')
)
dev.off()
```

Run it with:

```
$ Rscript so.R >/dev/null 2>&1
```
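The conclusion above rests on eyeballing the plots. If you wanted a number to go with it, one option (not part of the original analysis) is Spearman's rank correlation between reputation and each length column. It is rank-based, so it is unaffected by the log10 axes. The helper name `rep.cor` and the `toy` rows below are hypothetical; the toy numbers only exercise the code and say nothing about StackOverflow.

```r
# Spearman's rho between reputation and one length column of Table.
# use = "complete.obs" skips rows with missing values (e.g. an NA rep).
rep.cor = function(Table, column) {
  cor(Table$rep, Table[[column]], method = "spearman",
      use = "complete.obs")
}

# Made-up rows standing in for the real Table built in so.R
toy = data.frame(rep         = c(1, 10, 100, 1000),
                 description = c(400, 100, 300, 200),
                 code        = c(0, 50, 20, 80))

rep.cor(toy, "description")  # -0.4
```

On the real data you would call rep.cor(Table, "description") and rep.cor(Table, "code"); values close to 0 would agree with the "no visible trend" reading of the plots.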
To leave a comment for the author, please follow the link and comment on their blog: Struggling Through Problems.