R-Bloggers’ Web-Presence

[This article was first published on theBioBucket*, and kindly contributed to R-bloggers].

We love them, we hate them: RANKINGS!

Rankings are an inevitable tool to keep the human rat race going. In this regard I’ll pick up my last two posts (HERE & HERE) and have some fun by using them to analyse R-Bloggers’ web presence. I will use the number of hits in Google Search as an indicator.

I searched for URLs like this: https://www.google.com/search?q="http://www.twotorials.com" (the double quotes mean that only the exact blog URL is searched for).
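
A minimal sketch of how such a query URL could be assembled in R. The helper name `build_query_url` is hypothetical and not part of the original script; it wraps the blog URL in double quotes and percent-encodes it with `utils::URLencode`, so that the quotes and the `://` are passed on as part of the query string.

```r
# Hypothetical helper (not from the original script): build a Google
# search URL that looks for an exact, quoted string.
build_query_url <- function(blog_url) {
  # wrap in double quotes so that only the exact string is matched
  quoted <- paste0("\"", blog_url, "\"")
  # percent-encode reserved characters (quotes, ':', '/') for the query part
  paste0("https://www.google.com/search?q=",
         utils::URLencode(quoted, reserved = TRUE))
}

build_query_url("http://www.twotorials.com")
```

Encoding the search string up front is a design choice of this sketch; the script further down pastes the raw string into the URL instead.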



Blogs NoHits
http://google-opensource.blogspot.com 82300
http://www.programmingr.com 73500
http://googleresearch.blogspot.com 58000
http://dirk.eddelbuettel.com 53000
http://borasky-research.net 33100
http://casoilresource.lawr.ucdavis.edu 32500
http://andrewgelman.com 30000
http://yihui.name 29600
http://xianblog.wordpress.com 27900
http://nsaunders.wordpress.com 27600
http://chem-bla-ics.blogspot.com 26600
http://plindenbaum.blogspot.com 24600
http://blog.ouseful.info 24300
http://www.vcasmo.com 24200
http://yz.mit.edu 23500
http://romainfrancois.blog.free.fr 22700
http://blog.revolutionanalytics.com 21000
http://robjhyndman.com 18400
http://freakonometrics.blog.free.fr 16100
http://perfdynamics.blogspot.com 15400
http://www.stubbornmule.net 14800
http://zoonek.free.fr 14800
http://jackman.stanford.edu 13900
http://www.bytemining.com 13700
http://learnr.wordpress.com 12600
http://tommy.chheng.com 12200
http://mazamascience.com 12000
http://www.investuotojas.eu 11500
http://www.r-statistics.com 11300
http://www.franklincenterhq.org 10800
http://gettinggeneticsdone.blogspot.com 10700
http://mpastell.com 9930
http://pineda-krch.com 9780
http://blog.saush.com 9220
http://www.premiersoccerstats.com 8950
http://developmentality.wordpress.com 7250
http://www.dataspora.com 7200
http://blog.hiremebecauseimsmart.com 7050
http://isomorphismes.tumblr.com 7040
http://www.mathfinance.cn 6930
http://blog.nguyenvq.com 6150
http://www.drewconway.com 5970
http://www.carlboettiger.info 5520
http://www.statisticsblog.com 5110
http://www.decisionsciencenews.com 4950
http://www.r-chart.com 4810
http://chartsgraphs.wordpress.com 4480
http://www.portfolioprobe.com 4410
http://procomun.wordpress.com 4330
http://jeromyanglim.blogspot.com 4080
http://spatialanalysis.co.uk 4080
http://www.theresearchkitchen.com 4080
http://www.forex-bloggers.com 4070
https://www.rmetrics.org 4050
http://princeofslides.blogspot.com 3900
http://www.cybaea.net 3740
http://www.cerebralmastication.com 3710
http://ygc.name 3670
http://ryouready.wordpress.com 3450
http://jeffreybreen.wordpress.com 3410
http://systematicinvestor.wordpress.com 3400
http://sgsong.blogspot.com 3310
http://industrialengineertools.blogspot.com 3290
http://www.r-tutor.com 3270
http://fishlab.ucdavis.edu 3270
http://ggorjan.blogspot.com 3250
http://blog.ynada.com 3220
http://farmacokratia.blogspot.com 3170
http://4dpiecharts.com 3130
http://heuristically.wordpress.com 3040
http://blog.rtwilson.com 2910
http://www.wekaleamstudios.co.uk 2890
http://www.dataists.com 2840
http://ikanb.wordpress.com 2750
http://shape-of-code.coding-guidelines.com 2730
http://onertipaday.blogspot.com 2710
http://blog.fosstrading.com 2700
http://blog.echen.me 2690
http://www.theusrus.de 2670
http://cloudnumbers.com 2630
http://paulbutler.org 2620
http://biostatmatt.com 2460
http://www.johnmyleswhite.com 2430
http://dataninja.wordpress.com 2360
http://realizationsinbiostatistics.blogspot.com 2340
http://statisfaction.wordpress.com 2300
http://uxblog.idvsolutions.com 2250
http://timelyportfolio.blogspot.com 2210
http://radfordneal.wordpress.com 2200
http://sas-and-r.blogspot.com 2200
http://pairach.com 2110
http://yusung.blogspot.com 2050
http://blog.flacso.edu.mx 2010
http://www.rensenieuwenhuis.nl 2000
http://michaeldhealy.com 1990
http://freigeist.devmag.net 1950
http://www.fernandohrosa.com.br 1920
http://statbandit.wordpress.com 1870
http://www.win-vector.com 1840
http://lukemiller.org 1830
http://ropensci.org 1720
http://www.eggwall.com 1650
http://benmazzotta.wordpress.com 1620
http://bms.zeugner.eu 1610
http://cartesianfaith.wordpress.com 1580
http://linkedscience.org 1570
http://stevemosher.wordpress.com 1550
http://intelligenttradingtech.blogspot.com 1520
http://www.imachordata.com 1480
http://blog.diegovalle.net 1470
http://jermdemo.blogspot.com 1430
http://nortalktoowise.com 1420
http://ekonometrics.blogspot.com 1340
http://digitheadslabnotebook.blogspot.com 1320
http://flyordie.sin.khk.be 1310
http://schamberlain.github.com 1230
http://gribblelab.org 1180
http://www.quantf.com 1130
http://offensivepolitics.net 1020
http://www.markmfredrickson.com 981
http://blog.mckuhn.de 948
http://erehweb.wordpress.com 889
http://confounding.net 886
http://simplystatistics.tumblr.com 875
http://www.babelgraph.org 859
http://csgillespie.wordpress.com 857
http://joewheatley.net 844
http://helmingstay.blogspot.com 843
http://theaverageinvestor.wordpress.com 825
http://quantitative-ecology.blogspot.com 785
http://zvfak.blogspot.com 776
http://ucfagls.wordpress.com 766
http://opendatagroup.com 760
http://cameron.bracken.bz 740
http://rtutorialseries.blogspot.com 738
http://opencpu.org 708
http://novicemetrics.blogspot.com 700
http://lamages.blogspot.com 680
http://nir-quimiometria.blogspot.com 679
http://tonybreyal.wordpress.com 677
http://brokeringclosure.wordpress.com 658
http://socialdatablog.com 643
http://dancingeconomist.blogspot.com 629
http://www.rtexttools.com 603
http://danganothererror.wordpress.com 589
http://thebiobucket.blogspot.com 567
http://holtmeier.de 531
http://val-systems.blogspot.com 519
http://thelogcabin.wordpress.com 489
http://dcemri.blogspot.com 484
http://rdatamining.wordpress.com 477
http://bridgewater.wordpress.com 460
http://www.rcasts.com 444
http://dsparks.wordpress.com 436
http://pr.cloudst.at 422
http://polstat.org 409
http://www.compmath.com 401
http://techno-realism.blogspot.com 399
http://www.backsidesmack.com 395
http://geotheory.org 393
http://miraisolutions.wordpress.com 367
http://econometricsense.blogspot.com 352
http://blog.binfalse.de 344
http://rforcancer.drupalgardens.com 317
http://blog.rstudio.org 316
http://mcfromnz.wordpress.com 309
http://www.quantumforest.com 309
http://blog.quanttrader.org 303
http://chrisladroue.com 293
http://www.michaelbommarito.com 289
http://procrun.com 280
http://mikeksmith.posterous.com 279
http://bio7.org 278
http://kbroman.wordpress.com 278
http://martynplummer.wordpress.com 272
http://bryer.org 268
http://www.funjackals.com 265
http://www.harlan.harris.name 252
http://www.milktrader.net 248
http://www.surefoss.org 241
http://rigorousanalytics.blogspot.com 231
http://www.jameskeirstead.ca 229
http://programming-r-pro-bro.blogspot.com 225
http://plausibel.blogspot.com 224
http://statistic-on-air.blogspot.com 217
http://mintgene.wordpress.com 212
http://moderntoolmaking.blogspot.com 205
http://quantitativeecology.blogspot.com 199
http://www.sigmafield.org 199
http://www.ancienteco.com 194
http://worldofrcraft.blogspot.com 191
http://rappster.wordpress.com 190
http://stotastic.com 189
http://evolvingspaces.blogspot.com 184
http://strugglingthroughproblems.blogspot.com 166
http://sharpstatistics.co.uk 161
http://leftcensored.skepsi.net 160
http://omegahat.wordpress.com 156
http://drunks-and-lampposts.com 155
http://amathew.com 152
http://onlinelabor.blogspot.com 147
http://johnramey.net 144
http://gossetsstudent.wordpress.com 138
http://tomhopper.wordpress.com 135
http://ggobi.blogspot.com 134
http://blog.fellstat.com 131
http://www.openanalytics.eu 130
http://www.numbertheory.nl 127
http://stats.blogoverflow.com 127
http://the-praise-of-insects.blogspot.com 122
http://lpenz.github.com 118
http://christophergandrud.blogspot.com 118
http://f.giorlando.org 112
http://bayesianbiologist.com 110
http://www.graphoftheweek.org 109
http://oneliner.soma20.com 109
http://inundata.org 107
http://geokook.wordpress.com 104
http://blog.datapunks.com 102
http://eranraviv.com 102
http://eranraviv.com 102
http://www.compbiome.com 101
http://www.techpolicy.ca 99
http://www.psychwire.co.uk 97
http://blog.carlislerainey.com 93
http://vasishth-statistics.blogspot.com 93
http://www.statsravingmad.com 93
http://using-r-project.blogspot.com 93
http://www.nikhilgopal.com 92
http://thedatamonkey.blogspot.com 92
http://jeffreyhorner.tumblr.com 90
http://menugget.blogspot.com 88
http://www.twotorials.com 88
http://dataexcursions.wordpress.com 84
http://viksalgorithms.blogspot.com 83
http://exploringdatablog.blogspot.com 81
http://sachaepskamp.com 81
http://aphysicistinwallstreet.blogspot.com 77
http://lastresortsoftware.blogspot.com 75
http://www.nomad.priv.at 72
http://applyr.blogspot.com 71
http://www.knowledgediscovery.jp 71
http://weitaiyun.blogspot.com 71
http://xmphforex.wordpress.com 71
http://statsadventure.blogspot.com 70
http://davenportspatialanalytics.squarespace.com 70
http://anandram.wordpress.com 69
http://rpint.wordpress.com 68
http://datadebrief.blogspot.com 66
http://blog.cloudstat.org 64
http://www.r-podcast.org 64
http://rmkrug.wordpress.com 62
http://denishaine.wordpress.com 61
http://expansed.com 58
http://r.andrewredd.us 57
http://isseing333.blogspot.com 57
http://solomonmessing.wordpress.com 57
http://rtricks.wordpress.com 57
http://anrprogrammer.wordpress.com 56
http://arungaikwad.wordpress.com 56
http://geolabs.wordpress.com 55
http://lookingatdata.blogspot.com 55
http://factbased.blogspot.com 54
http://severity.blogspot.com 54
http://swordofcrom.wordpress.com 53
http://librestats.wordpress.com 51
http://marcinkula.wordpress.com 51
http://gsoc2010r.wordpress.com 47
http://psyccomputing.blogspot.com 46
http://fabiomarroni.wordpress.com 45
http://jedifran.com 45
http://alstatr.blogspot.com 43
http://r-video-tutorial.blogspot.com 42
http://alexfarquhar.posterous.com 40
http://bmb-common.blogspot.com 40
http://rdataviz.wordpress.com 40
http://mypapertrades.blogspot.com 38
http://pitchrx.blogspot.com 38
http://simonmueller.net 38
http://statisfactions.wordpress.com 37
http://nzprimarysectortrade.wordpress.com 36
http://seanmulcahy.blogspot.com 36
http://www.speakingstatistically.com 35
http://joshpaulson.wordpress.com 34
http://learningrbasic.blogspot.com 34
http://mockquant.blogspot.com 33
http://costaleconomist.blogspot.com 32
http://rsnippets.blogspot.com 31
http://statmethods.wordpress.com 29
http://aviadklein.wordpress.com 28
http://obeautifulcode.com 28
http://blog.cloudst.at 24
http://rstats.posterous.com 23
http://notebookonthewebs.tumblr.com 22
http://0utlier.blogspot.com 21
http://gjkerns.github.com 21
http://eigensomething.blogspot.com 10
http://brocktibert.wordpress.com 9
http://toddjobe.blogspot.com 9
http://mickeymousemodels.blogspot.com 9
http://forgetfulfunctor.blogspot.com 9
http://rocknrblog.wordpress.com 9
http://dmbates.blogspot.com 8
http://blog.nextbiomotif.com 8
http://indiacrunchin.wordpress.com 8
http://blog.trenthauck.com 8
http://mikescnc.blogspot.com 8
http://jeroldhaas.blogspot.com 8
http://tlevine.tumblr.com 8
http://empty-moon-9726.heroku.com 8
http://www.proc-x.com 7
http://jointposterior.blogspot.com 7
http://gastonsanchez.wordpress.com 7
http://mlt-thinks.blogspot.com 7
http://rstats.wordpress.com 7
http://playingwithr.blogspot.com 7
http://scottmutchler.blogspot.com 6
http://iamdata.wordpress.com 6
http://sfchaos.blogspot.com 6
http://nightlordtw.wordpress.com 5
http://pleasepasstheroc.blogspot.com 5
http://wiekvoet.blogspot.com 5
http://d7.stattler.com 4
http://yetanotherrblog.blogspot.com 4
http://blog.iwanluijks.nl:80 3
https://rlearner.wordpress.com 3
http://margintale.blogspot.com 1

When checking the results manually I discovered slight deviations in the numbers, and admittedly I have no clue why this is. Sorry if any blog is under- or overrepresented due to such an error. Please report!
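
One step that may be worth ruling out is the conversion of the result text into a number. The script below does this by deleting every non-digit character, which would also glue together any other digits in the string. Whether this has anything to do with the deviations is not known. The sketch below only shows a more defensive variant: the helper `parse_hits` is hypothetical, and the example strings are assumptions about what the results page might contain.

```r
# Hypothetical helper: extract the first number from a result string.
# The example strings are assumed, not taken from a real results page.
parse_hits <- function(txt) {
  # first run of digits, allowing ',' or '.' as thousands separators
  m <- regmatches(txt, regexpr("[0-9][0-9.,]*", txt))
  if (length(m) == 0) return(NA_integer_)
  as.integer(gsub("[^0-9]", "", m))
}

parse_hits("About 1,230 results")
parse_hits("About 1,230 results (0.25 seconds)")

# for comparison: stripping all non-digits from the whole string
as.integer(gsub("[^0-9]", "", "About 1,230 results (0.25 seconds)"))
```

With the second example string the two approaches differ: the helper stops after the first number, while stripping all non-digits also picks up the digits of the timing.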

Here is the R-script:

library(XML)
library(stringr)
library(RCurl)
library(xtable)

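# return the number of Google hits for an exact (quoted) search string
GoogleHits.1 <- function(input)
   {
    # wrap the search string in double quotes for an exact match
    url <- paste("https://www.google.com/search?q=\"",
                 input, "\"", sep = "")
 
    # CA bundle shipped with RCurl, used to verify the https connection
    CAINFO <- paste(system.file(package="RCurl"), "/CurlSSL/ca-bundle.crt", sep = "")
    script <- getURL(url, followlocation = TRUE, cainfo = CAINFO)
    doc <- htmlParse(script)
    # take the text of the second child of the 'subform_ctrl' div
    res <- xpathSApply(doc, "//div[@id='subform_ctrl']/*", xmlValue)[[2]]
    # keep the digits only and convert to integer
    return(as.integer(gsub("[^0-9]", "", res)))
   }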

# Example (spaces in the search string are written as "+"):
GoogleHits.1("R+Statistical+Software")

###################### Begin get R-bloggers' URLs: ###########################
# get blogger urls with XML:
script <- getURL("www.r-bloggers.com")
doc <- htmlParse(script)
li <- getNodeSet(doc, "//ul[@class='xoxo blogroll']//a")
urls <- sapply(li, xmlGetAttr, "href")

# extract sensible blog urls:
# get ids for those with only 2 slashes (no 3rd in the end):
id <- which(nchar(gsub("[^/]", "", urls )) == 2)
slash_2 <- urls[id]

# find position of 3rd slash occurrence in strings:
slash_stop <- unlist(lapply(str_locate_all(urls, "/"),"[[", 3))
slash_3 <- substring(urls, first = 1, last = slash_stop - 1)

# replace the ones with 2 slashes:
blogs <- slash_3; blogs[id] <- slash_2

# dismiss:
blogs <- blogs[blogs != "http://domain"]
###################### End get R-bloggers' URLs: #############################

###################### Begin Google Search: ##################################
# with a single lapply over all blogs Google complains about automated queries;
# I get blocked on the 300th request:
# unlist(lapply(blogs, GoogleHits.1))

# splitting into two calls doesn't help (blocked the same as before):
res1 <- unlist(lapply(blogs[1:170], GoogleHits.1))
res2 <- unlist(lapply(blogs[171:334], GoogleHits.1))

# try to do it in 2 sessions (saving the first result), or manually reconnect to the host before the second try:
df1 <- data.frame(Blogs = blogs[1:170], NoHits = res1, row.names = NULL)
save(df1, file = "df1.RData")
load("df1.RData"); unlink("df1.RData")

# second run:
df2 <- data.frame(Blogs = blogs[171:334], NoHits = res2, row.names = NULL)

# bind dfs, sort by NoHits:
finres <- rbind(df1, df2)
finres$Blogs <- as.character(finres$Blogs)
(finres <- finres[order(finres$NoHits, decreasing = TRUE), ])

htmltab <- xtable(finres)
print(htmltab, type = "html", include.rownames=FALSE, file = "Bloggers.Google.Hits.htm")
###################### End Google Search #####################################

###################### Begin Plot: ###########################################
pdf("RBloggersWebPresence.pdf")
par(mar = c(4.5, 4.5, 3, 2), ylog = FALSE)
plot(finres$NoHits, cex = 0.5, col = 3, 
     ylab = "No. of Hits in Google Search",
     xlab = "Blogs", log = "y")
set.seed(19)
rid <- sample(13:nrow(finres), 15)
text(x = rid, y = finres$NoHits[rid], 
     labels = finres$Blogs[rid],
     cex = 0.75, srt = 90, pos = 4, offset = -1) 
title(main = "R-Bloggers' Web Presence")
dev.off()
###################### End Plot ##############################################
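
Since the script gets blocked when it fires many requests in a row, one thing to try is pausing between requests and recording a failed request as NA instead of aborting the whole run. This is only a sketch with no guarantee that it avoids the block. The wrapper `fetch_politely` and its arguments `pause` and `jitter` are hypothetical, and `fake_hits` is a stand-in so that the example runs without touching the network.

```r
# Hypothetical wrapper: call `fetch` for every element of `x`, pausing
# between calls and recording NA whenever a call fails.
fetch_politely <- function(x, fetch, pause = 2, jitter = 1) {
  out <- rep(NA_integer_, length(x))
  for (i in seq_along(x)) {
    val <- tryCatch(as.integer(fetch(x[i])),
                    error = function(e) NA_integer_)
    # keep the value only if exactly one number came back
    if (length(val) == 1) out[i] <- val
    # wait a fixed pause plus a random extra before the next request
    if (i < length(x)) Sys.sleep(pause + runif(1, 0, jitter))
  }
  out
}

# stand-in for GoogleHits.1 that fails on one input
fake_hits <- function(u) if (u == "bad") stop("blocked") else nchar(u)
res <- fetch_politely(c("a", "bad", "abc"), fake_hits, pause = 0, jitter = 0)
res
```

In the script above, `GoogleHits.1` would take the place of `fake_hits` and `blogs` the place of the toy input. The NA entries then mark the blogs that need a second attempt.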
