Where is the R Activity?
R has become one of the world’s most widely used statistics and visualisation software packages, with an ever-growing user community. Thanks to the release of log files containing all hits to the http://cran.rstudio.com/ server, it is possible to make a map showing the parts of the world with the most active R users (specifically those mostly using the RStudio interface). The USA comes top with 3,045,960 requests to the server between October 2012 and June 2013. Japan is in 2nd place with a mere 756,177 requests, and Germany is 3rd. In all, 203 countries appear in the server logs. I have scaled the map according to the number of server requests made, and you can clearly see the dominance of Japan, Europe and North America compared with other parts of the world, especially Africa. The map of course isn’t a perfect representation of the number of R users, as one or two people could be making hundreds of server requests a day while a large number of people make only a couple. This is why I have entitled the map “Activity” rather than “Users”. Either way, R hasn’t quite achieved global domination, but it is getting there…
To create the map I obtained the files following the instructions on the logs download page. I then combined them with the following code (taken from here):
setwd("XXX") # this needs to be the directory with the downloaded files in it
file_list <- list.files()
for (file in file_list) {
  if (!exists("dataset")) {
    # if the merged dataset doesn't exist, create it from the first file
    dataset <- read.csv(file, header = TRUE)
  } else {
    # if the merged dataset does exist, append the next file to it
    # (using else avoids reading the first file twice)
    temp_dataset <- read.csv(file, header = TRUE)
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
  print(file)
}
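Growing a data frame with rbind inside a loop copies the whole object on every pass, which can get slow with several hundred daily log files. As a rough alternative sketch, assuming every file in the directory is a log file with the same columns, you could read everything into a list and bind it once. The helper name read_logs and the toy files written to a temporary directory are made up for illustration and are not part of the original workflow.

```r
# Hypothetical helper: read every file in a directory and stack the rows.
# Assumes all files share the same columns.
read_logs <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  logs <- lapply(files, read.csv, header = TRUE, stringsAsFactors = FALSE)
  do.call(rbind, logs)
}

# Toy demonstration with two small made-up files in a temporary directory
demo_dir <- file.path(tempdir(), "cran_logs_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(package = c("ggplot2", "plyr"), country = c("US", "JP")),
          file.path(demo_dir, "day1.csv"), row.names = FALSE)
write.csv(data.frame(package = "sp", country = "DE"),
          file.path(demo_dir, "day2.csv"), row.names = FALSE)

demo_logs <- read_logs(demo_dir)
print(demo_logs)
```

Because the list is bound in a single call, the result does not depend on whether an object called dataset already exists in the workspace, which is the condition the loop version has to check.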
It is then possible to aggregate the data to get the number of requests per country.
dataset$flag <- 1
counts <- aggregate(dataset$flag, by = list(dataset$country), sum)
names(counts) <- c("country", "count")
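The flag column is only there to give aggregate something to sum. As a sketch of an alternative, table() counts the rows per country directly. The toy_logs data frame below is made up for illustration and stands in for the merged dataset object; the numbers are not taken from the real logs.

```r
# Made-up stand-in for the merged log data
toy_logs <- data.frame(
  package = c("ggplot2", "plyr", "sp", "maptools", "ggplot2"),
  country = c("US", "US", "JP", "DE", "US"),
  stringsAsFactors = FALSE
)

# table() tallies the number of rows for each country code
tab <- table(toy_logs$country)
toy_counts <- data.frame(country = names(tab), count = as.vector(tab),
                         stringsAsFactors = FALSE)

# Sort so that the most active countries come first
toy_counts <- toy_counts[order(-toy_counts$count), ]
print(toy_counts)
```

Sorting the result this way is also a quick check on which countries top the ranking before any mapping is done.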
The next step was to download a world shapefile (containing the country borders) from Natural Earth. This contains the country codes used in the log file (the dataset object above). We can open this file with the maptools package:
library(maptools)
world<-readShapePoly("yourworldshapefile")
It is then possible to join our counts object to the world object to assign the log counts to each country based on the "iso_a2" and "country" fields respectively. The new shapefile is also saved.
world@data = data.frame(world@data, counts[match(world@data[,"iso_a2"], counts[,"country"]),])
writePolyShape(world, "world_r_use.shp")
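The join relies on match(), which returns, for each row of the map data, the position of the matching country code in counts (or NA when there is none), so the counts are reordered to line up with the polygons. Here is a small sketch with made-up data frames, where map_data stands in for world@data. Replacing the missing counts with 0 is an optional choice of this sketch, not a step from the workflow above.

```r
# Made-up stand-ins for world@data and the counts object
map_data <- data.frame(iso_a2 = c("DE", "US", "ZW", "JP"),
                       name = c("Germany", "United States", "Zimbabwe", "Japan"),
                       stringsAsFactors = FALSE)
toy_counts <- data.frame(country = c("US", "JP", "DE"),
                         count = c(30, 8, 5),
                         stringsAsFactors = FALSE)

# Position in toy_counts of each map row's country code (NA if absent)
idx <- match(map_data[, "iso_a2"], toy_counts[, "country"])

# Reorder the counts to follow the map rows, then bind the columns
joined <- data.frame(map_data, toy_counts[idx, ])

# Assumption of this sketch: countries absent from the logs get a count of 0
joined$count[is.na(joined$count)] <- 0
print(joined)
```

The row order of the map data is left untouched, which matters because the polygons in the shapefile are linked to the attribute table by position.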
This next step is a bit of a cheat, as I used the ScapeToad software to create the cartogram. A package exists to do this in R, but I find ScapeToad to be more powerful. You can download the shapefile I produced from here. I then reloaded the new shapefile into R and used the basic plot functions to produce the map.
cartogram<-readShapePoly("world_r_carto.shp")
plot(cartogram)
title(main="R Activity Around the World", sub="Based on cran.rstudio.com Activity Logs October 2012-June 2013")
This is my first stab at looking at the data - there is a lot more that can be done with it!