GitHub Stats on Programming Languages
[This article was first published on R-Chart, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
GitHub has become a popular site for open source developers to stash code and collaborate on projects. The following are some stats and analysis of the programming languages in use, based upon the number of users and repositories. The data was obtained from GitHub’s searches. Both the data and the R code are available on GitHub as well (a lovely recursive relationship, I must say).
> df.top_ten_reps
      Language Repositories Users
1         Ruby       104239 23123
2   JavaScript        44482 10895
3         Perl        34232  2178
4       Python        32150  8775
5          PHP        21685  8872
6         Java        17687  6618
7            C        16137  5558
8          C++        12521  5595
9  Objective-C         8027  2520
10          C#         6061  2706
Ruby has a commanding lead in the number of repositories with 32.17%, more than the next two (JavaScript and Perl) combined. R is ranked 25th with 191 repositories, or about 0.06%, only 6 repositories behind the D programming language. The top five are scripting languages, Java ranks 6th, and the C family rounds out the top ten. Relatively open languages lead the pack, followed by those with a proprietary focus (Objective-C for Apple and C# for Microsoft).
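The percentages quoted above can be checked with a few lines of R. This is only a minimal sketch: the names `repos`, `total_repos` and `share` are made up for illustration, and the total of 324060 is an assumption obtained by summing the Repositories column of the full data set listed at the end of this post.

```r
# Repository counts for the top three languages, copied from the table above.
repos <- c(Ruby = 104239, JavaScript = 44482, Perl = 34232)

# Assumption: 324060 is the sum of the Repositories column over every
# language in the full data set at the end of the post.
total_repos <- 324060

# Percentage share of all repositories, rounded to two decimals.
share <- round(100 * repos / total_repos, 2)
print(share)

# Is Ruby's share larger than JavaScript's and Perl's combined?
ruby_leads <- share[["Ruby"]] > share[["JavaScript"]] + share[["Perl"]]
print(ruby_leads)
```

Working from a named vector keeps the language names attached to the shares, so the comparison reads the same way as the sentence it checks.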
When ranked by number of users, the top two remain the same. There is a bit of shuffling with the remainder of the top ten.
> df.top_ten_users
      Language Repositories Users
1         Ruby       104239 23123
2   JavaScript        44482 10895
3          PHP        21685  8872
4       Python        32150  8775
5         Java        17687  6618
6          C++        12521  5595
7            C        16137  5558
8           C#         6061  2706
9  Objective-C         8027  2520
10        Perl        34232  2178
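The two rankings differ only in the column used for sorting. Here is a rough sketch of how such a re-ranking could be done in R. The data frame name `df.top_ten` and the variables `by_repos` and `by_users` are hypothetical, and the rows are simply retyped from the tables above rather than pulled from the original data file.

```r
# Top-ten rows retyped from the tables above (not the original data file).
df.top_ten <- data.frame(
  Language = c("Ruby", "JavaScript", "Perl", "Python", "PHP",
               "Java", "C", "C++", "Objective-C", "C#"),
  Repositories = c(104239, 44482, 34232, 32150, 21685,
                   17687, 16137, 12521, 8027, 6061),
  Users = c(23123, 10895, 2178, 8775, 8872,
            6618, 5558, 5595, 2520, 2706),
  stringsAsFactors = FALSE
)

# order() returns the row permutation that sorts a column;
# decreasing = TRUE puts the largest value first.
by_repos <- df.top_ten[order(df.top_ten$Repositories, decreasing = TRUE), ]
by_users <- df.top_ten[order(df.top_ten$Users, decreasing = TRUE), ]

# Renumber the rows 1..10 so the row names match the new ranking.
rownames(by_users) <- NULL

print(by_users)
```

Note that this sketch only reorders the ten languages that lead by repositories. That happens to be safe here because no language outside that group has more users than Perl in the full data set.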
The most striking change is that Perl drops to 10th place. There are significantly fewer users associated with Perl, particularly given the number of projects. I noticed that there was a migration of Perl language source code to GitHub, so perhaps modules were migrated as well, but I couldn’t find any specific announcements that clarified this.
All other things being equal, you might expect a relationship between the number of users and the number of repositories. There is, to some degree:
> df$Ratio <- df$Repositories / df$Users
> mean(df$Ratio[df$Users > 0 & !is.na(df$Ratio)])
[1] 5.064732
A linear model suggests a slightly lower value (between 4 and 5).
Here is a plot restricted to the Top 10.
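A fit and plot along these lines can be sketched as follows. Everything here is an assumption rather than the original analysis: the post does not say which model formula was used, so this sketch picks a regression through the origin (so the single coefficient reads as repositories per user), and it uses only the ten rows retyped from the tables above. The names `top_ten`, `fit`, `slope` and `out_file` are made up for illustration.

```r
# Top-ten rows retyped from the tables above (not the original data file).
top_ten <- data.frame(
  Language = c("Ruby", "JavaScript", "Perl", "Python", "PHP",
               "Java", "C", "C++", "Objective-C", "C#"),
  Repositories = c(104239, 44482, 34232, 32150, 21685,
                   17687, 16137, 12521, 8027, 6061),
  Users = c(23123, 10895, 2178, 8775, 8872,
            6618, 5558, 5595, 2520, 2706),
  stringsAsFactors = FALSE
)

# Assumption: least-squares fit with no intercept ("+ 0"), so the one
# coefficient can be read as "repositories per user".
fit <- lm(Repositories ~ Users + 0, data = top_ten)
slope <- unname(coef(fit)["Users"])
print(slope)

# Draw the scatter plot into a PDF in the session's temporary directory.
out_file <- file.path(tempdir(), "github_top_ten.pdf")
pdf(out_file)
plot(top_ten$Users, top_ten$Repositories,
     xlab = "Users", ylab = "Repositories",
     main = "Top 10 languages on GitHub")
# Label each point with its language and overlay the fitted line.
text(top_ten$Users, top_ten$Repositories,
     labels = top_ten$Language, pos = 3, cex = 0.8)
abline(a = 0, b = slope, lty = 2)
dev.off()
```

Because Ruby sits far from the other points, it has a large influence on the fitted slope, which is one reason a fitted value can differ from the plain mean of the per-language ratios.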
Some of the lesser-used languages have few users but many repositories per user, like Io (19 per user) and CoffeeScript (17 per user). Perl has a remarkable 15.7 per user.
Full Data Set
        Language Repositories Users Rep.pct     Ratio
1           Ruby       104239 23123   32.17  4.508022
2     JavaScript        44482 10895   13.73  4.082790
3           Perl        34232  2178   10.56 15.717172
4         Python        32150  8775    9.92  3.663818
5            PHP        21685  8872    6.69  2.444206
6           Java        17687  6618    5.46  2.672560
7              C        16137  5558    4.98  2.903383
8            C++        12521  5595    3.86  2.237891
9    Objective-C         8027  2520    2.48  3.185317
10            C#         6061  2706    1.87  2.239837
11         Shell         4657  1011    1.44  4.606330
12          VimL         4248  1267    1.31  3.352802
13  ActionScript         2609  1104    0.81  2.363225
14        Erlang         2520   532    0.78  4.736842
15       Haskell         2290   641    0.71  3.572543
16         Scala         2154   539    0.66  3.996289
17       Clojure         2082   481    0.64  4.328482
18           Lua         1754   511    0.54  3.432485
19        Groovy          870   261    0.27  3.333333
20        Scheme          707   140    0.22  5.050000
21            Go          398   103    0.12  3.864078
22         OCaml          382   121    0.12  3.157025
23   Objective-J          355   109    0.11  3.256881
24             D          197    64    0.06  3.078125
25             R          191    69    0.06  2.768116
26    ColdFusion          180    56    0.06  3.214286
27           Tcl          125    39    0.04  3.205128
28           ooc          112    11    0.03 10.181818
29       FORTRAN           93    47    0.03  1.978723
30           ASP           88    35    0.03  2.514286
31     Smalltalk           80    14    0.02  5.714286
32          HaXe           75    14    0.02  5.357143
33            F#           74     5    0.02 14.800000
34       Verilog           74    26    0.02  2.846154
35          VHDL           64    14    0.02  4.571429
36            Io           57     3    0.02 19.000000
37 SuperCollider           53    11    0.02  4.818182
38           Arc           48    15    0.01  3.200000
39        Delphi           43    16    0.01  2.687500
40      Assembly           41     5    0.01  8.200000
41           Boo           41     6    0.01  6.833333
42            Nu           40     4    0.01 10.000000
43        Eiffel           39    15    0.01  2.600000
44  CoffeeScript           34     2    0.01 17.000000
45          Vala           27     3    0.01  9.000000
46        Racket           20     8    0.01  2.500000
47          Self            7     3    0.00  2.333333
48          Duby            4     0    0.00       Inf
49       Max/MSP            4     2    0.00  2.000000
50        sclang            2     0    0.00       Inf
51   Common Lisp            0     0    0.00       NaN
52    Emacs Lisp            0     0    0.00       NaN
53     Pure Data            0     0    0.00       NaN
54  Visual Basic            0     0    0.00       NaN
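The Inf and NaN entries in the Ratio column come from languages with zero users: R does not raise an error on division by zero. A small sketch of that behaviour, and of one way to drop such rows before averaging, follows. The names `sample_rows` and `usable` are made up for illustration, and only four rows are retyped from the table above.

```r
# Four rows retyped from the table above, including two zero-user cases.
sample_rows <- data.frame(
  Language = c("Perl", "Io", "Duby", "Common Lisp"),
  Repositories = c(34232, 57, 4, 0),
  Users = c(2178, 3, 0, 0),
  stringsAsFactors = FALSE
)

# Division by zero is not an error in R:
# 4 / 0 gives Inf and 0 / 0 gives NaN.
sample_rows$Ratio <- sample_rows$Repositories / sample_rows$Users
print(sample_rows)

# is.finite() is FALSE for Inf, -Inf, NaN and NA,
# so this keeps only the rows with a usable ratio.
usable <- sample_rows[is.finite(sample_rows$Ratio), ]
print(mean(usable$Ratio))
```

Filtering with `is.finite()` is one alternative to testing the user count directly; either way the zero-user rows have to be excluded, or the mean itself comes out as Inf or NaN.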
To leave a comment for the author, please follow the link and comment on their blog: R-Chart.