Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Among the usual DESCRIPTION fields is the free-text URL field, where package authors can store various links: to the development website, docs, upstream tool, etc. In this post, we shall explain why storing URLs in DESCRIPTION is important, where else you should add URLs, and what kind of URLs are stored in CRAN packages these days.
Why put URLs in DESCRIPTION?
In the following we’ll assume your package has some sort of online development repository (GitHub? GitLab? R-Forge?) and a documentation website (handily created via pkgdown?). Adding URLs to your package’s online homes is extremely useful for several reasons.
As a side note: yes, you can store several URLs under URL, even if the field name is singular. See for instance rhub’s DESCRIPTION:
URL: https://github.com/r-hub/rhub, https://r-hub.github.io/rhub/
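To see how several URLs stored in one field can be read back, here is a minimal sketch using base R only. The DESCRIPTION content is a toy file written to a temporary location (the package name `demo` is made up); only the URL line comes from rhub. Splitting on commas and whitespace is an assumption about how the field is usually written, not an official parsing rule.

```r
# Write a toy DESCRIPTION to a temporary file (hypothetical package "demo").
desc_file <- tempfile()
writeLines(
  c(
    "Package: demo",
    "Version: 0.0.1",
    "URL: https://github.com/r-hub/rhub, https://r-hub.github.io/rhub/"
  ),
  desc_file
)

# read.dcf() returns a character matrix with one column per requested field.
url_field <- read.dcf(desc_file, fields = "URL")[1, "URL"]

# Assumption: URLs are separated by commas and/or whitespace.
urls <- strsplit(url_field, "[,[:space:]]+")[[1]]
urls <- urls[nzchar(urls)]
print(urls)
```

Both links come back as separate elements of a character vector, even though they live in a single field.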
Here are those reasons:
- It will help your users find your package’s pretty documentation from the CRAN page, instead of just the less pretty PDF manual. 
- Likewise, from the CRAN page your contributors can directly find where to submit patches. 
- If your package has a package-level man page, and it should (e.g. as drafted by usethis::use_package_doc() and then generated by roxygen2), then after typing say library("rhub") and then ?rhub, your users will find the useful links.
- Other tools such as helpdesk and the pkgsearch RStudio addin can help surface the URLs you store in DESCRIPTION.
- Indirectly, having a link to the docs website and development repo will increase their page rank (see useful comments in this discussion), so potential users and contributors can find them more easily by simply searching for your package.
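As a rough illustration of how a tool could surface those URLs once a package is installed, here is a small sketch built on utils::packageDescription(). The helper name `pkg_urls()` is hypothetical and is not how helpdesk or pkgsearch are implemented; splitting on commas and whitespace is again an assumption.

```r
# Hypothetical helper: return the URLs declared in an installed package's DESCRIPTION.
pkg_urls <- function(pkg) {
  # With a single field, packageDescription() returns a string, or NA if the field is absent.
  field <- utils::packageDescription(pkg, fields = "URL")
  if (is.na(field)) {
    return(character(0))
  }
  # Assumption: URLs are separated by commas and/or whitespace.
  urls <- strsplit(field, "[,[:space:]]+")[[1]]
  urls[nzchar(urls)]
}

# Try it on a package that ships with R; the result may well be empty.
print(pkg_urls("stats"))
```

Swapping "stats" for any package installed on your machine shows what its authors stored in the URL field.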
Quick tip: you can add GitHub URLs (URL and BugReports) to DESCRIPTION by running usethis::use_github_links().
Where else should you put your URLs?
For the same reasons as above, you should make the most of all places that can store your package’s URL(s). Have you put your package’s docs URL
- in the pkgdown config file, if that’s how you built it? 
- in the GitHub repo website field (you need admin rights), or the equivalent for your development platform, e.g. GitLab? 
Have you used any of your package’s URLs
- in your public messages about your package, e.g. when answering someone’s question?
- in the slides of your talk about the package?
Don’t miss any opportunity to point users and contributors in the right direction!
What URLs do people use in DESCRIPTION files of CRAN packages?
In the following, we shall parse the URL field of the CRAN packages database.
db <- tools::CRAN_package_db()
db <- tibble::as_tibble(db[, c("Package", "URL")])
db <- dplyr::distinct(db)
There are 15315 packages on CRAN at the time of writing, of which 8040 have something written in the URL field. We can parse this data.
db <- db[!is.na(db$URL),]
library("magrittr")
# function from https://github.com/r-hub/pkgsearch/blob/26c4cc24b9296135b6238adc7631bc5250509486/R/addin.R#L490-L496
url_regex <- function() "(https?://[^\\s,;>]+)"
find_urls <- function(txt) {
  mch <- gregexpr(url_regex(), txt, perl = TRUE)
  res <- regmatches(txt, mch)[[1]]
  if(length(res) == 0) {
    return(list(NULL))
  } else {
    list(unique(res))
  }
}
# Extract the URLs of each package (one row per URL),
# then split each URL into its components with urltools.
parsed_db <- db %>%
  dplyr::group_by(Package) %>%
  dplyr::mutate(actual_url = find_urls(URL)) %>%
  dplyr::ungroup() %>%
  tidyr::unnest(actual_url) %>%
  dplyr::group_by(Package, actual_url) %>%
  dplyr::mutate(url_parts = list(urltools::url_parse(actual_url))) %>%
  dplyr::ungroup() %>%
  tidyr::unnest(url_parts) %>%
  dplyr::mutate(scheme = trimws(scheme))
There are 7192 packages with at least one valid URL.
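To get a feel for what counts as a URL here, the extraction step can be tried on a few strings. The two helpers are repeated so that the snippet runs on its own; the second and third inputs are made up for illustration.

```r
# Same helpers as in the post, repeated so that this snippet is self-contained.
url_regex <- function() "(https?://[^\\s,;>]+)"
find_urls <- function(txt) {
  mch <- gregexpr(url_regex(), txt, perl = TRUE)
  res <- regmatches(txt, mch)[[1]]
  if (length(res) == 0) list(NULL) else list(unique(res))
}

# Toy URL fields: two URLs, a repeated URL in angle brackets, and no URL at all.
two_urls <- "https://github.com/r-hub/rhub, https://r-hub.github.io/rhub/"
repeated <- "<http://example.org/pkg>; http://example.org/pkg"
no_url   <- "see the package vignette"

str(find_urls(two_urls))
str(find_urls(repeated))
str(find_urls(no_url))
```

Matching stops at whitespace, commas, semicolons and `>`, duplicates are dropped, and a field without any http(s) link yields `NULL`, which is why some packages with a non-empty URL field do not appear in the parsed data.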
What are the packages with most links?
mostlinks <- dplyr::count(parsed_db, Package, sort = TRUE)
mostlinks
## # A tibble: 7,192 x 2
##    Package           n
##    <chr>         <int>
##  1 RcppAlgos         7
##  2 BIFIEsurvey       5
##  3 BigQuic           5
##  4 dendextend        5
##  5 PGRdup            5
##  6 vwline            5
##  7 ammistability     4
##  8 augmentedRCBD     4
##  9 dcGOR             4
## 10 dialr             4
## # … with 7,182 more rows
The package with the most links in URL is RcppAlgos.
What is the most popular scheme, http or https?
dplyr::count(parsed_db, scheme, sort = TRUE)
## # A tibble: 2 x 2
##   scheme     n
##   <chr>  <int>
## 1 https   5910
## 2 http    2496
A bit less than one third of the links use http.
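The share can be checked directly from the two counts reported above; this is plain arithmetic on the figures from the post.

```r
# Counts copied from the scheme table in the post.
n_https <- 5910
n_http  <- 2496

# Share of links that still use plain http.
share_http <- n_http / (n_https + n_http)
print(round(100 * share_http, 1))  # about 29.7 percent
```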
Can we identify popular domains?
dplyr::count(parsed_db, domain, sort = TRUE)
## # A tibble: 1,855 x 2
##    domain                    n
##    <chr>                 <int>
##  1 github.com             4660
##  2 www.r-project.org       164
##  3 cran.r-project.org      143
##  4 r-forge.r-project.org    82
##  5 bitbucket.org            67
##  6 sites.google.com         54
##  7 arxiv.org                52
##  8 gitlab.com               44
##  9 docs.ropensci.org        38
## 10 www.github.com           32
## # … with 1,845 more rows
GitHub seems to be the most popular development platform, at least in this sample of CRAN packages that indicate a URL. It is also possible that some developers set up their own GitLab server under their own domain, in which case they would not be counted under gitlab.com.
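Note that the table lists github.com and www.github.com separately. Here is a minimal sketch of merging such variants, assuming that dropping a leading `www.` is harmless for the domains at hand (this normalization is not part of the analysis above). The counts are a subset copied from the table.

```r
# A few rows copied from the domain table in the post.
domains <- data.frame(
  domain = c("github.com", "www.r-project.org", "cran.r-project.org", "www.github.com"),
  n = c(4660, 164, 143, 32),
  stringsAsFactors = FALSE
)

# Assumption: a leading "www." does not distinguish two different sites.
domains$domain <- sub("^www\\.", "", domains$domain)

# Sum the counts of domains that are now identical, most frequent first.
merged <- aggregate(n ~ domain, data = domains, FUN = sum)
merged <- merged[order(-merged$n), ]
print(merged)
```

With this rule, the two GitHub rows collapse into a single one with 4692 links.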
Many packages link to www.r-project.org which is not very informative, or to their own CRAN page which can be informative.
Other relatively popular domains are sites.google.com and arxiv.org. There are probably links to venues for scientific publications other than arxiv.org. What about doi.org?
dplyr::filter(parsed_db, domain %in% c("doi.org", "dx.doi.org")) %>%
  dplyr::select(Package, actual_url)
## # A tibble: 44 x 2
##    Package                actual_url                                    
##    <chr>                  <chr>                                         
##  1 abcrlda                https://dx.doi.org/10.1109/LSP.2019.2918485   
##  2 adwave                 https://doi.org/10.1534/genetics.115.176842   
##  3 ammistability          https://doi.org/10.5281/zenodo.1344756        
##  4 anMC                   https://doi.org/10.1080/10618600.2017.1360781 
##  5 ANOVAreplication       https://dx.doi.org/10.17605/OSF.IO/6H8X3      
##  6 AssocAFC               https://doi.org/10.1093/bib/bbx107            
##  7 augmentedRCBD          https://doi.org/10.5281/zenodo.1310011        
##  8 CorrectOverloadedPeaks http://dx.doi.org/10.1021/acs.analchem.6b02515
##  9 dataMaid               https://doi.org/10.18637/jss.v090.i06         
## 10 disclapmix             http://dx.doi.org/10.1016/j.jtbi.2013.03.009  
## # … with 34 more rows
The “earlier but no longer preferred” dx.doi.org is still in use.
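If you wanted to update such links, a simple rewrite of the host would do. This is a sketch under the assumption that the DOI itself stays unchanged and only the resolver prefix differs; the URLs are taken from the output above.

```r
# A few DOI links from the output in the post.
doi_urls <- c(
  "https://dx.doi.org/10.1109/LSP.2019.2918485",
  "http://dx.doi.org/10.1021/acs.analchem.6b02515",
  "https://doi.org/10.18637/jss.v090.i06"
)

# Rewrite the older resolver host (http or https) to https://doi.org/.
normalized <- sub("^https?://dx\\.doi\\.org/", "https://doi.org/", doi_urls)
print(normalized)
```

Links that already use doi.org are left untouched.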
The rOpenSci docs server also makes an appearance.
Note that you could do a similar analysis of the BugReports field. We’ll leave that as an exercise to the reader.
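As a possible starting point for that exercise, here is a base R sketch that counts domains in a BugReports column. The helper name `count_bug_domains()` and the toy data (package names and links) are made up; with network access you could pass the Package and BugReports columns of tools::CRAN_package_db() instead.

```r
# Hypothetical helper: count the domains found in a BugReports column.
count_bug_domains <- function(db) {
  reports <- db$BugReports[!is.na(db$BugReports)]
  # Keep only entries that look like http(s) URLs; other content is ignored.
  reports <- grep("^https?://", trimws(reports), value = TRUE)
  # Keep what sits between the scheme and the first slash.
  domains <- sub("^https?://([^/]+).*$", "\\1", reports)
  sort(table(domains), decreasing = TRUE)
}

# Toy data with made-up package names and links.
toy <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  BugReports = c(
    "https://github.com/someone/pkgA/issues",
    "https://github.com/someone/pkgB/issues",
    "https://gitlab.com/someone/pkgC/-/issues",
    NA
  ),
  stringsAsFactors = FALSE
)

print(count_bug_domains(toy))
```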
Conclusion
In this note, we explained why having URLs in the DESCRIPTION of your package can help users and contributors find the right venues for their needs, and we had a look at the URLs currently stored in the DESCRIPTION files of CRAN packages, in particular discussing which domains are currently popular. How do you ensure the users of your package can find its best online home(s)? How do you look for online home(s) of the packages you use?
