Create Code Metrics with cloc
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The cloc Perl script (yes, Perl!) by Al Danial (https://github.com/AlDanial/cloc) has been one of the go-to tools for generating code metrics. Given a single file, directory tree, archive, or git repo, cloc can speedily give you metrics on the count of blank lines, comment lines, and physical lines of source code in a vast array of programming languages.
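To make those three buckets concrete, here is a minimal base R sketch that sorts lines of R source into blank, comment, and code. It is only an illustration under simplifying assumptions: `classify_lines()` is a hypothetical helper, it is not how cloc works internally, and it only knows R's `#` comment syntax.

```r
# Simplified line classifier for R source text.
# NOT cloc's algorithm: just an illustration of the three counts it reports.
classify_lines <- function(lines) {
  trimmed <- trimws(lines)
  kind <- ifelse(
    trimmed == "", "blank",
    ifelse(startsWith(trimmed, "#"), "comment", "code")
  )
  # Count each bucket explicitly so a zero count still shows up
  buckets <- c("blank", "comment", "code")
  vapply(buckets, function(b) sum(kind == b), integer(1))
}

src <- c(
  "# add two numbers",
  "add <- function(x, y) {",
  "",
  "  x + y  # a trailing comment; the line still counts as code here",
  "}",
  ""
)

counts <- classify_lines(src)
print(counts)
```

For the six sample lines this gives 2 blank, 1 comment, and 3 code lines. A real counter also has to deal with many languages and with comment markers that appear inside string literals, which is where a dedicated tool earns its keep.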
I don’t remember the full context, but someone in the R community asked about this type of functionality and I tossed together a small script-turned-package to thinly wrap the Perl cloc utility. Said package was and is unimaginatively named cloc. Thanks to some collaborative input from @ma_salmon, the package gained more features. Recently I added the ability to process R markdown (Rmd) files (i.e. only count lines in code chunks) to the main cloc Perl script and was performing some general cleanup when the idea to create some RStudio addins hit me.
cloc Basics
As noted, you can cloc just about anything. Here are some metrics for dplyr::group_by:
cloc("https://raw.githubusercontent.com/tidyverse/dplyr/master/R/group-by.r")
## # A tibble: 1 x 10
##   source language file_count file_count_pct   loc loc_pct blank_lines blank_line_pct comment_lines comment_line_pct
## 1 group… R                 1             1.    44      1.          13             1.           110               1.
And here’s a similar set of metrics for the whole dplyr package:
cloc_cran("dplyr")
## # A tibble: 7 x 11
##   source language file_count file_count_pct   loc loc_pct blank_lines blank_line_pct comment_lines comment_line_pct
## 1 dplyr… R               148        0.454   13216   0.442        2671       0.380            3876          0.673
## 2 dplyr… C/C++ H…        125        0.383    6687   0.223        1836       0.261             267          0.0464
## 3 dplyr… C++              33        0.101    4724   0.158         915       0.130             336          0.0583
## 4 dplyr… HTML             11        0.0337   3602   0.120         367       0.0522             11          0.00191
## 5 dplyr… Markdown          2        0.00613  1251   0.0418        619       0.0880              0          0.
## 6 dplyr… Rmd               6        0.0184    421   0.0141        622       0.0884           1270          0.220
## 7 dplyr… C                 1        0.00307    30   0.00100         7       0.000995            0          0.
## # ... with 1 more variable: pkg
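The `_pct` columns appear to be each row's share of the column total rather than percentages on a 0 to 100 scale. That is an inference from the numbers, not something stated here, and it can be checked in base R with the loc values copied from the cloc_cran("dplyr") output. The data frame name `dplyr_counts` is made up for this sketch.

```r
# loc values copied from the cloc_cran("dplyr") output.
# The tibble display truncates "C/C++ H…"; the full label "C/C++ Header"
# is the one shown in the git repo output.
dplyr_counts <- data.frame(
  language = c("R", "C/C++ Header", "C++", "HTML", "Markdown", "Rmd", "C"),
  loc      = c(13216, 6687, 4724, 3602, 1251, 421, 30),
  stringsAsFactors = FALSE
)

# Assumption being tested: loc_pct = loc / sum(loc)
total_loc <- sum(dplyr_counts$loc)
dplyr_counts$loc_pct <- signif(dplyr_counts$loc / total_loc, 3)

print(total_loc)
print(dplyr_counts)
```

The total comes to 29931 lines of code, and the recomputed shares (0.442, 0.223, 0.158, 0.120, 0.0418, 0.0141, 0.00100) match the loc_pct column to three significant digits.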
We can also measure (in bulk) from afar, such as measuring the dplyr git repo:
cloc_git("git://github.com/tidyverse/dplyr.git")
## # A tibble: 12 x 10
##    source    language     file_count file_count_pct   loc  loc_pct blank_lines blank_line_pct comment_lines
##  1 dplyr.git HTML                108        0.236   21467 0.335           3829       0.270             1114
##  2 dplyr.git R                   156        0.341   13648 0.213           2682       0.189             3736
##  3 dplyr.git Markdown             12        0.0263  10100 0.158           3012       0.212                0
##  4 dplyr.git C/C++ Header        126        0.276    6891 0.107           1883       0.133              271
##  5 dplyr.git CSS                   2        0.00438  5684 0.0887          1009       0.0711              39
##  6 dplyr.git C++                  33        0.0722   5267 0.0821          1056       0.0744             393
##  7 dplyr.git Rmd                   7        0.0153    447 0.00697          647       0.0456            1309
##  8 dplyr.git XML                   1        0.00219   291 0.00454            0       0.                   0
##  9 dplyr.git YAML                  6        0.0131    212 0.00331           35       0.00247             12
## 10 dplyr.git JavaScript            2        0.00438    44 0.000686          10       0.000705             4
## 11 dplyr.git Bourne Shell          3        0.00656    34 0.000530          15       0.00106             10
## 12 dplyr.git C                     1        0.00219    30 0.000468           7       0.000493             0
## # ... with 1 more variable: comment_line_pct
All in on Addins
The Rmd functionality made me realize that some interactive capabilities might be handy, so I threw together three addins.
Two of them handle extraction of code chunks from Rmd documents. One uses cloc, the other uses knitr::purl() (h/t @yoniceedee). The knitr one adds in some very nice functionality if you want to preserve chunk options and have “eval=FALSE” chunks commented out.
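For a rough idea of what "extracting code chunks" means, here is a minimal base R sketch that keeps only the lines inside R chunks of an Rmd document. It is an illustration under stated assumptions, not the code behind either addin: `extract_r_chunks()` is a hypothetical helper, it only recognizes fences that start at the beginning of a line, and it drops chunk options instead of preserving them the way the knitr-based addin can.

```r
# The fence marker is built with strrep() so this example can itself sit
# inside a fenced block without confusing a Markdown renderer.
fence <- strrep("`", 3)

# Keep only the lines that fall between an opening R chunk fence
# and the next closing fence.
extract_r_chunks <- function(rmd_lines) {
  is_open  <- grepl(paste0("^", fence, "\\{r"), rmd_lines)
  is_close <- grepl(paste0("^", fence, "[[:space:]]*$"), rmd_lines)

  in_chunk <- FALSE
  keep <- logical(length(rmd_lines))
  for (i in seq_along(rmd_lines)) {
    if (!in_chunk && is_open[i]) {
      in_chunk <- TRUE        # opening fence: start keeping from the next line
    } else if (in_chunk && is_close[i]) {
      in_chunk <- FALSE       # closing fence: stop keeping
    } else {
      keep[i] <- in_chunk     # ordinary line: keep it only inside a chunk
    }
  }
  rmd_lines[keep]
}

rmd <- c(
  "# Title",
  "",
  paste0(fence, "{r setup}"),
  "x <- 1",
  "",
  "# a comment",
  fence,
  "Some prose.",
  paste0(fence, "{r}"),
  "y <- x + 1",
  fence
)

code <- extract_r_chunks(rmd)
print(code)
```

Only the four lines inside the two chunks survive (including the blank line and the comment), so counting metrics on `code` rather than on `rmd` ignores the headings and prose.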
The final one will gather up code metrics for all the sources in an active project.
FIN
If you’d like additional features or want to contribute, give the repo (https://github.com/hrbrmstr/cloc) a visit and drop an issue or PR.