
Google AI Challenge: Scores/Rank by Language

[This article was first published on R-Chart, and kindly contributed to R-bloggers.]
A quick follow-up to the previous post about the scores in the 2010 Google AI competition relative to programming language. The faceted chart produced by the code below gives each language its own panel, so every language is visible and discrete, and the scales are the same across panels.

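library(ggplot2)

# Read the semicolon-delimited results file, which has no header row
df <- read.csv('googleAI2010.csv', sep=';', header=FALSE)
df$V7 <- NULL  # drop the unused seventh column
names(df) <- c('rank', 'username', 'country', 'organization', 'language', 'elo_score')

# One panel per language, all on shared axes
# (ggtitle() replaces opts(title=...), which has been removed from ggplot2)
ggplot(data=df, aes(x=rank, y=elo_score, color=language)) +
  geom_point(size=1) +
  facet_wrap(~ language) +
  ggtitle('Google AI 2010: Score by Rank for each Language')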

The faceted chart builds on a simple plot of score against rank for all entries:




df <- read.csv('googleAI2010.csv', sep=';', header=FALSE)
df$V7 <- NULL
names(df) <- c('rank', 'username', 'country', 'organization', 'language', 'elo_score')

# Single scatter plot of score against rank, all languages together
ggplot(data=df, aes(x=rank, y=elo_score)) +
  geom_point(size=1) +
  ggtitle('Google AI Score by Rank')


Another way to view this information is a histogram of scores, which ignores rank. With a binwidth of 100, and ignoring the low scores of people who signed up but dropped out relatively early, a nearly bimodal distribution appears.

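# Histogram of scores in bins of 100 points
# (written with ggplot() because qplot() is deprecated in recent ggplot2 releases)
ggplot(data=df, aes(x=elo_score)) + geom_histogram(binwidth=100)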
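
The histogram call above does not show how the low scores were set aside. One way this could be done is sketched below in base R. The `toy` data frame is made-up stand-in data, and the cutoff of 1000 is an assumption chosen for illustration; neither comes from the original post.

```r
# Made-up stand-in data (NOT the competition data): a cluster of very low
# scores plus two higher clusters, to mimic early drop-outs and active players.
set.seed(42)
toy <- data.frame(
  elo_score = c(runif(50, 0, 500), rnorm(200, 1500, 150), rnorm(200, 2500, 200))
)

# Assumed threshold for "signed up but dropped out early"; not from the post.
cutoff <- 1000
kept <- subset(toy, elo_score >= cutoff)

# Bin the remaining scores in steps of 100, matching binwidth=100 above.
bin_width <- 100
breaks <- seq(floor(min(kept$elo_score) / bin_width) * bin_width,
              ceiling(max(kept$elo_score) / bin_width) * bin_width,
              by = bin_width)
h <- hist(kept$elo_score, breaks = breaks, plot = FALSE)

# Tabulate the counts per bin; two separate humps suggest bimodality.
print(data.frame(bin_start = head(h$breaks, -1), count = h$counts))
```

With the real data, the same `subset()` step could be applied to `df` before passing it to the histogram, with the cutoff chosen by inspecting where the low-score cluster ends.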


Any ideas about why this is not normally distributed? Is there some aspect of Elo scoring that leads to this shape? Or are there different types of programmers represented?

This can be broken down by language.  To avoid difficulty distinguishing colors, the rainbow palette is used and a few languages are not reported (since they were not highly represented in the competition).

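library(sqldf)

# Exclude the languages with few entries
df2 <- sqldf("select * from df where language not in ('Groovy','Scala','Go','OCaml')")
df2$language <- factor(df2$language)  # rebuild the factor so excluded languages are not levels
# rainbow(12) supplies twelve fill colors; there must be at least one per remaining language
ggplot(data=df2, aes(x=elo_score, fill=language)) +
  geom_histogram(binwidth=100) +
  scale_fill_manual(values=rainbow(12))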
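
The excluded languages above are listed by hand. As an alternative sketch, the same kind of filter could be driven by entry counts in base R. The `toy` data frame and the `min_entries` threshold below are hypothetical and only illustrate the idea; they are not taken from the post or the competition data.

```r
# Made-up stand-in data (NOT the competition data).
toy <- data.frame(
  language  = c(rep("Java", 5), rep("Python", 4), "Go", "OCaml"),
  elo_score = seq(1000, 3000, length.out = 11)
)

# Assumed threshold: keep only languages with at least this many entries.
min_entries <- 3

# Count entries per language and keep the well-represented ones.
counts <- table(toy$language)
keep <- names(counts)[counts >= min_entries]
toy2 <- toy[toy$language %in% keep, ]

# Rebuild the factor so that dropped languages do not linger as empty levels.
toy2$language <- factor(toy2$language)

# Size the palette from the data instead of hard-coding the number of colors.
fill_colors <- rainbow(nlevels(toy2$language))
print(levels(toy2$language))
```

Sizing the palette with `nlevels()` avoids having to update a hard-coded color count if the set of reported languages changes.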



As mentioned in the previous post, the data is available at GitHub – feel free to post some of your own visualizations of this data.
