Do older SOers use fewer words?
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
On StackOverflow, do posters with more experience (measured here by reputation) ask their questions in fewer words?
No. Plotted against reputation on log-log axes, neither the amount of prose nor the amount of code shows a visible trend:
Characters of non-code text vs. reputation:
Characters of code vs. reputation:
The data comes from the super-handy StackOverflow API. I retrieved three pages of questions with wget and parsed them with rjson (for the JSON) and XML (for the HTML in each question body).
First read in and parse the JSON:
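so.R:

```r
library(rjson)    # JSON parsing
library(XML)      # HTML parsing and XPath queries
library(ggplot2)  # plotting
library(plyr)     # ldply

# Each saved API page is a JSON object whose "questions" field
# holds a list of question objects.
read.qs = function(path) {
  fromJSON(file = path)$questions
}

# Read all three pages and splice them into one flat list of questions.
questions = do.call(c,
  lapply(c('page-1.json', 'page-2.json', 'page-3.json'),
         read.qs
  )
)
```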
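The do.call(c, ...) step is what turns three pages into a single list of questions. A small sketch with hand-built lists shows the difference between the nested list that lapply returns and the flat list that do.call(c, ...) returns. The lists are made-up stand-ins for what fromJSON returns for each page; their field names simply mirror the ones so.R uses.

```r
# Hand-built stand-ins for fromJSON(file = ...)$... on each page (made-up values;
# field names mirror those used in so.R).
page1 = list(questions = list(
  list(body = '<p>First</p>',  owner = list(reputation = 15)),
  list(body = '<p>Second</p>', owner = list(reputation = 2300))
))
page2 = list(questions = list(
  list(body = '<p>Third</p>', owner = list(reputation = 1))
))
pages = list(page1, page2)

# lapply alone gives a list with one element per page (nested).
per.page = lapply(pages, function(p) p$questions)
length(per.page)                  # 2

# do.call(c, per.page) is c(per.page[[1]], per.page[[2]]), which splices the
# per-page lists into one flat list with one element per question.
questions = do.call(c, per.page)
length(questions)                 # 3
questions[[3]]$owner$reputation   # 1
```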
Then for each question, parse the HTML body and measure the text inside `<p>` and `<pre>` tags:
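so.R (cont):

```r
# One row per question: poster reputation, characters of prose, characters of code.
Table = ldply(questions, function(q) {
  # Wrap the question's HTML fragment so it parses as a single document.
  body.text = sprintf('<body>%s</body>', q$body)
  body = htmlParse(body.text, asText = TRUE)

  # Text inside <p> counts as description; text inside <pre> counts as code.
  description = tot.length.of(body, '//p//text()')
  code = tot.length.of(body, '//pre//text()')

  rep = q$owner$reputation

  data.frame(
    rep, description, code
  )
})
```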
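Here tot.length.of, which has to be defined before the ldply call above is run, is:

so.R (cont):

```r
# Concatenate the text of every node matching `query` and count its characters.
tot.length.of = function(doc, query) {
  parts = xpathApply(doc, query, xmlValue)
  text = paste(parts, collapse='')
  nchar(text)
}
```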
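To see what the two XPath queries count, here is a made-up toy body run through the same helper. One consequence is easy to miss. Inline `<code>` inside a paragraph is matched by `//p//text()`, so it counts toward the description, and only `<pre>` blocks count as code. This sketch passes asText = TRUE explicitly rather than relying on htmlParse to recognise the string as markup.

```r
library(XML)

# Same helper as in so.R: join the text of all matching nodes, count characters.
tot.length.of = function(doc, query) {
  parts = xpathApply(doc, query, xmlValue)
  nchar(paste(parts, collapse = ''))
}

# Toy question body (made up), wrapped in <body> the way so.R does it.
body = htmlParse('<body><p>Use <code>x</code> here</p><pre>x = 1</pre></body>',
                 asText = TRUE)

tot.length.of(body, '//p//text()')    # 10: "Use " + "x" + " here"
tot.length.of(body, '//pre//text()')  # 5:  "x = 1"
```

A query that matches nothing yields an empty string, so a question with no `<pre>` block gets a code length of 0 rather than an error.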
Then make the plots:
so.R (cont):

```r
# Prose length against reputation, both axes on log scales.
png('description.png')
print(ggplot(data=Table)
  + geom_point(aes(rep, description))
  + scale_x_log10()
  + scale_y_log10()
  + xlab('Rep')
  + ylab('Verbosity')
)
dev.off()

# The same plot for code length.
png('code.png')
print(ggplot(data=Table)
  + geom_point(aes(rep, code))
  + scale_x_log10()
  + scale_y_log10()
  + xlab('Rep')
  + ylab('Verbosity')
)
dev.off()
```

Then run the script:

```shell
$ Rscript so.R >/dev/null 2>&1
```
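The conclusion above is judged by eye from the plots. One way to put a number on "no visible difference" is a Spearman rank correlation between reputation and each verbosity column. This is not part of the original analysis. Because Spearman works on ranks, it is unaffected by the skew that called for log axes, and zero-length code is simply a tied low rank. The helper name verbosity.cor and the toy data are hypothetical. On the real data you would pass it Table.

```r
# Hypothetical helper: Spearman rank correlation between reputation and one
# verbosity column of a table shaped like Table (columns rep, description, code).
verbosity.cor = function(table, column) {
  x = table$rep
  y = table[[column]]
  ok = complete.cases(x, y)  # drop rows with a missing reputation or length
  cor(x[ok], y[ok], method = 'spearman')
}

# Toy data standing in for Table; these are NOT results from the post.
toy = data.frame(
  rep         = c(1, 10, 100, 1000, 10000),
  description = c(500, 400, 300, 200, 100),
  code        = c(0, 0, 50, 20, 80)
)

verbosity.cor(toy, 'description')  # -1: ranks exactly reversed
verbosity.cor(toy, 'code')
```

A value near 0 on the real Table would back up the visual impression. For a p-value, cor.test(x, y, method = 'spearman') returns one, although with tied values it warns that an exact p-value cannot be computed.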
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.