Licensing R packages with code and data: learnings from submitting to CRAN

[This article was first published on Epiverse-TRACE developer space, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Note

This is a follow-up blog post to the Dual licensing R packages with code and data post published in September 2024. It contains learnings from submitting epiparameter to CRAN with a dual license.

Overview of previous blog post on R package licensing

We previously published a post on the Epiverse-TRACE blog discussing the importance of licensing code and data for open source. It covered the differences between open source licenses for code and those that are required for other forms of information, in our case, data. R packages most commonly distribute code, and therefore require an appropriate open source license. Less frequently, R packages are used to bundle and share data, so-called “data packages”, and require an applicable license for reuse and redistribution. We recommend reading the original blog post for a more in-depth explanation of each of these points.

However, since publishing the original post, the last section, Licensing code and data in one R package, has become outdated given our experience submitting the epiparameter package to CRAN with a dual-license. This blog post provides an updated Licensing code and data in one R package section with learnings from epiparameter package development and CRAN submission.

Licensing code and data in one R package

If you are developing an R package that has both code and data as primary objects of (roughly) equal importance, a software license inadequately covers the data, and a data license inadequately covers the code. Dual licensing can help resolve this issue. This means there is one license for code (for example, MIT license) and another license for the included data (for example, Public Domain Dedication).

After conducting an online search, dual licensing for R packages seems rare. An interesting example of dual licensing is the igraphdata package, which contains several licenses: One for each dataset included in the package. Similar to igraphdata, we dual licensed the epiparameter package, which until versus v0.4.0 contained both code and data. We licensed the code using the DESCRIPTION file and used the LICENSE file to license the data under CC0. Concretely, we included this additional text in LICENSE to clarify the dual license and that we recommend citing the original source regardless:

All data included in the epiparameter R package is licensed under CC0 (https://creativecommons.org/publicdomain/zero/1.0/legalcode.txt). This includes the parameter database (extdata/parameters.json) and data in the data/ folder. Please cite the individual parameter entries in the database when used.

However, upon submission of epiparameter to CRAN, the dual licensing approach was rejected. The CRAN package reviewer stated:

A package can only be licensed as a whole. It can have a single license or a set of alternative licenes. If the data have to be licensed differently then the code, you have to provide the data in a separate data package with the other license.

Therefore, we decided to separate the data, originally stored in epiparameter, into a package called epiparameterDB. This package is solely licensed under CC0, and epiparameter can then also be solely licensed under MIT. Both packages are now hosted on CRAN using a single license dedicated to code or data.

Tip

It is often desirable to host an R package on CRAN as it enables it to be easily installed (from binary if on Mac or Windows), it gives the package some validity as a non-trivial and secure piece of software to install and use.

It is not necessary nor beneficial for all R packages to be hosted on CRAN, and it does come with some drawbacks, such as the dual licensing restriction of code and data, but for our case with the epiparameter package, we deemed it better to host on CRAN and split the code and data into epiparameter and epiparameterDB, respectively.

When including data in your R package from other sources it is important to check that the license of your package and the data is compatible1, or that the individual data license is clearly stated, as in igraphdata. For epiparameterDB, we consider model estimates as facts (that is, not copyrightable).


This blogpost provides an addendum to our original post Dual licensing R packages with code and data, providing our experience and outcomes from submitting epiparameter – and the resulting epiparmeterDB package – to CRAN in a conformant manner. In addition to the requirements and benefits of open source licensing of open software and data in the first post, we hope this follow-up post provides practical information that will be of use to other R package developers, or software developers and open data curators more broadly.

Footnotes

  1. See this blog post by Julia Silge on including external data sets into an R package and rectifying incompatibilities with license↩︎

Reuse

Citation

BibTeX citation:
@online{lambert2025,
  author = {Lambert, Joshua and Hartgerink, Chris},
  title = {Licensing {R} Packages with Code and Data: Learnings from
    Submitting to {CRAN}},
  date = {2025-01-20},
  url = {https://epiverse-trace.github.io/posts/data-licensing-cran.html},
  langid = {en}
}
For attribution, please cite this work as:
Lambert, Joshua, and Chris Hartgerink. 2025. “Licensing R Packages with Code and Data: Learnings from Submitting to CRAN.” January 20, 2025. https://epiverse-trace.github.io/posts/data-licensing-cran.html.
To leave a comment for the author, please follow the link and comment on their blog: Epiverse-TRACE developer space.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)