A new R package, cffr, has been developed, peer-reviewed by rOpenSci and accepted by CRAN. This package has a single purpose: to create a valid CITATION.cff file using the metadata of any R package.
CITATION.cff files and why they matter
A Citation File Format (CFF) file is a plain text file with human- and machine-readable citation information for software (and datasets) [1].
Under the hood, a CFF file is a YAML file. YAML has the advantage of being easily understood by any user, and can also be easily converted to another data serialization language, such as JSON or XML. This is an example of the minimal content of a valid CITATION.cff file:
    cff-version: 1.2.0
    message: 'To cite package "cffr" in publications use:'
    title: 'cffr: Generate Citation File Format (''cff'') Metadata for R Packages'
    authors:
    - family-names: Hernangómez
      given-names: Diego
In this example, the identification of the software and the author is quite straightforward, as it is provided by the fields title and authors. The information that can be included in a CFF file can be further enriched with additional fields (like version, date-released or doi), as the Citation File Format schema version 1.2.0 accepts 21 different keys.
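As a rough illustration of the minimal example above, the following base R sketch stores the same content as a plain list and checks that the four keys the CFF 1.2.0 schema marks as required (authors, cff-version, message and title) are present. The helper name missing_keys() is hypothetical and is not part of cffr; real validation is done against the full schema.

```r
# Keys that the CFF 1.2.0 schema requires in every CITATION.cff file
required_keys <- c("authors", "cff-version", "message", "title")

# The minimal example from this post, written as a plain R list
minimal_cff <- list(
  `cff-version` = "1.2.0",
  message = 'To cite package "cffr" in publications use:',
  title = "cffr: Generate Citation File Format ('cff') Metadata for R Packages",
  authors = list(
    list(`family-names` = "Hernangómez", `given-names` = "Diego")
  )
)

# Hypothetical helper: returns the required keys that are absent from x
missing_keys <- function(x, required = required_keys) {
  setdiff(required, names(x))
}

# Nothing is missing from the minimal example
missing_keys(minimal_cff)

# Dropping two keys makes the check report them
missing_keys(minimal_cff[c("title", "authors")])
```

This only checks for the presence of keys. It says nothing about whether the values themselves are well formed, which is what schema validation adds.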
Why do CFF files matter?
Citing a book, an article, or a thesis is not difficult. The title, authors and publication date are easily identifiable in most cases. However, software is rarely cited in research projects. One of the reasons is “the lack of a clear citation information from package developers”, as already mentioned in a previous post (Make Your R Package Easier to Cite). Developers spend thousands of hours developing new and exciting software or adding new features to existing packages, so citing software is a matter of giving credit where credit is due. For more reasons why it is important to cite R software, see Steffi LaZerte’s blog post How to Cite R and R Packages.
In July 2021, GitHub announced a built-in citation feature that enables any software user to cite any repository in APA or BibTeX style.
Nat Friedman (@natfriedman), July 27, 2021:
We’ve just added built-in citation support to GitHub so researchers and scientists can more easily receive acknowledgments for their contributions to software. Just push a CITATION.cff file and we’ll add a handy widget to the repo sidebar for you. Enjoy! pic.twitter.com/L85MS5pY2Y
This built-in feature heavily relies on the CFF format, rendering the information of the CITATION.cff file into the aforementioned styles.
This announcement was, in my very personal opinion, a game-changer for the software citation ecosystem. As proof of that, in the two days that followed Zenodo and Zotero announced support for CITATION.cff files in their GitHub integration:
Zenodo (@ZENODO_ORG), July 28, 2021:
We’ve just added support for CITATION.cff files in our GitHub integration. Just push a CITATION.cff file in your repo and we’ll use it when registering the DOI via the @ZENODO_ORG @github integration https://t.co/EBnM7F6eT4. Enjoy!
Zotero (@zotero), July 28, 2021:
We’ve added support for GitHub’s new citation feature. When saving GitHub repos to your library, Zotero can now use the enhanced metadata provided by developers. If there’s no citation file, Zotero will continue to use the existing repo metadata (Company, Prog. Language, etc.). https://t.co/Q34zPBRGFj
Integration with Zenodo means that when creating a Digital Object Identifier (DOI) for a GitHub repository via Zenodo, the DOI record is generated from the metadata included in the CITATION.cff file of the repository. This feature saves developers the extra effort of keeping the DOI and the software consistent in terms of metadata. See an example in the DOI of cffr, whose title, description and author have been gathered from the cffr CITATION.cff file.
And there is still more! The CiteAs service also supports CFF files [2], and in the future more platforms such as JabRef or GitLab [3] may add support for CITATION.cff files (and why not CRAN or Bioconductor?).
Other software citation projects
The CodeMeta Project [4] creates a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations. One of the many uses of a codemeta.json file is to provide citation metadata such as title, authors, publication year or version. The codemetar package [5] allows you to generate codemeta.json files from R package metadata.
Using cffr
Getting started with cffr is pretty easy. There is a main function (likely the only one you would need for an in-development package) named cff_write() that extracts the metadata of your package (already included in your DESCRIPTION and inst/CITATION files), converts it into a CITATION.cff file and validates it against the latest CFF validation schema using jsonvalidate [6]:
    library(cffr)

    # For in-development packages
    cff_write()
    #>
    #> CITATION.cff generated
    #>
    #> cff_validate results-----
    #> Congratulations! This .cff file is valid
Working with cff objects
It is also possible to create a cff object (a regular R list with a custom printing method) for any package installed locally on your machine. In the next example I create a cff object for the rtweet [7] package:
    library(cffr)

    cff_rtweet <- cff_create("rtweet")

    cff_rtweet
    #> cff-version: 1.2.0
    #> message: 'To cite package "rtweet" in publications use:'
    #> type: software
    #> license: MIT
    #> title: 'rtweet: Collecting Twitter Data'
    #> version: 0.7.0
    #> doi: 10.21105/joss.01829
    #> abstract: 'An implementation of calls designed to collect and organize Twitter data
    #>   via Twitter''s REST and stream Application Program Interfaces (API), which can be
    #>   found at the following URL: <https://developer.twitter.com/en/docs>. This package
    #>   has been peer-reviewed by rOpenSci (v. 0.6.9).'
    #> authors:
    #> - family-names: Kearney
    #>   given-names: Michael W.
    #>   email: kearneymw@missouri.edu
    #>   orcid: https://orcid.org/0000-0002-0730-4694
    #> preferred-citation:
    #>   type: article
    #>   title: 'rtweet: Collecting and analyzing Twitter data'
    #>   authors:
    #>   - family-names: Kearney
    #>     given-names: Michael W.
    #>   year: '2019'
    #>   journal: Journal of Open Source Software
    #>   volume: '4'
    #>   number: '42'
    #>   pages: '1829'
    #>   doi: 10.21105/joss.01829
    #>   url: https://joss.theoj.org/papers/10.21105/joss.01829
    #> repository: https://CRAN.R-project.org/package=rtweet
    #> repository-code: https://github.com/ropensci/rtweet
    #> url: https://CRAN.R-project.org/package=rtweet
    #> date-released: '2020-01-08'
    #> contact:
    #> - family-names: Kearney
    #>   given-names: Michael W.
    #>   email: kearneymw@missouri.edu
    #>   orcid: https://orcid.org/0000-0002-0730-4694
    #> keywords:
    #> - r
    #> - twitter
Note the special field, preferred-citation, that would be used to generate the citation information on GitHub. If this field is not present, GitHub would reuse other keys in the file to auto-generate a citation reference.
As already mentioned, cffr uses information from the DESCRIPTION (via the desc [8] package) and the inst/CITATION file to extract the metadata. I will focus now on comparing the citation info from rtweet and the information generated by cffr:
    toBibtex(citation("rtweet"))
    #> @Article{rtweet-package,
    #>   title = {rtweet: Collecting and analyzing Twitter data},
    #>   author = {Michael W. Kearney},
    #>   year = {2019},
    #>   note = {R package version 0.7.0},
    #>   journal = {Journal of Open Source Software},
    #>   volume = {4},
    #>   number = {42},
    #>   pages = {1829},
    #>   doi = {10.21105/joss.01829},
    #>   url = {https://joss.theoj.org/papers/10.21105/joss.01829},
    #> }

    cff_rtweet$`preferred-citation`
    #> type: article
    #> title: 'rtweet: Collecting and analyzing Twitter data'
    #> authors:
    #> - family-names: Kearney
    #>   given-names: Michael W.
    #> year: '2019'
    #> journal: Journal of Open Source Software
    #> volume: '4'
    #> number: '42'
    #> pages: '1829'
    #> doi: 10.21105/joss.01829
    #> url: https://joss.theoj.org/papers/10.21105/joss.01829
We can check that the core information of the rtweet citation has been included in the cff object, and we can also check fields included in the DESCRIPTION file of rtweet:
    packageDescription("rtweet",
      fields = c(
        "Title", "Description", "Author",
        "Version", "URL"
      )
    )
    #> Title: Collecting Twitter Data
    #> Description: An implementation of calls designed to collect and organize Twitter
    #>         data via Twitter's REST and stream Application Program Interfaces
    #>         (API), which can be found at the following URL:
    #>         <https://developer.twitter.com/en/docs>. This package has been
    #>         peer-reviewed by rOpenSci (v. 0.6.9).
    #> Author: Michael W. Kearney [aut, cre] (<https://orcid.org/0000-0002-0730-4694>),
    #>         Andrew Heiss [rev] (<https://orcid.org/0000-0002-3948-3914>), Francois
    #>         Briatte [rev]
    #> Version: 0.7.0
    #> URL: https://CRAN.R-project.org/package=rtweet
    #>
    #> -- File: C:/Users/diego/Documents/R/win-library/4.1/rtweet/Meta/package.rds
    #> -- Fields read: Title, Description, Author, Version, URL
In the next chunk I compare it with the corresponding fields from the cff object:
    as.cff(cff_rtweet[
      c("title", "abstract", "authors", "version", "url")
    ])
    #> title: 'rtweet: Collecting Twitter Data'
    #> abstract: 'An implementation of calls designed to collect and organize Twitter data
    #>   via Twitter''s REST and stream Application Program Interfaces (API), which can be
    #>   found at the following URL: <https://developer.twitter.com/en/docs>. This package
    #>   has been peer-reviewed by rOpenSci (v. 0.6.9).'
    #> authors:
    #> - family-names: Kearney
    #>   given-names: Michael W.
    #>   email: kearneymw@missouri.edu
    #>   orcid: https://orcid.org/0000-0002-0730-4694
    #> version: 0.7.0
    #> url: https://CRAN.R-project.org/package=rtweet
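The comparison above suggests a simple correspondence between some DESCRIPTION fields and CFF keys: the CFF title looks like the package name plus the Title field, abstract mirrors Description, and version mirrors Version. The base R sketch below reproduces only that part of the mapping, using the stats package because it ships with every R installation. The function description_to_keys() is a hypothetical name used for illustration. It is not how cffr is implemented and it ignores authors, URLs and the inst/CITATION file.

```r
# Hypothetical helper: derive a few CFF-like keys from an installed
# package's DESCRIPTION, mimicking the pattern seen in the rtweet example
description_to_keys <- function(pkg) {
  desc <- utils::packageDescription(pkg)
  list(
    # "<Package>: <Title>", as in 'rtweet: Collecting Twitter Data'
    title = paste0(desc$Package, ": ", desc$Title),
    # The Description field plays the role of the CFF abstract
    abstract = desc$Description,
    version = desc$Version
  )
}

# "stats" is always available, so this runs on any R installation
stats_keys <- description_to_keys("stats")
str(stats_keys)
```

For real packages, cff_create() remains the right tool, since it also handles authors, identifiers and the preferred citation.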
Valid keys
Here is a list of all the valid keys of the CFF schema. Most of them have an explicit mapping with the fields (or a combination of fields) in the DESCRIPTION and inst/CITATION files:
abstract | identifiers | repository |
authors | keywords | repository-artifact |
cff-version | license | repository-code |
commit | license-url | title |
contact | message | type |
date-released | preferred-citation | url |
doi | references | version |
The cffr package also includes an extensive vignette describing how these fields are computed with several examples.
Validating a cff object
Once we have created a cff object, we can check its validity using the cff_validate() function. This function can be used with cff objects and with CITATION.cff files. If there are any errors, output messages will help us debug our object:
    library(magrittr) # provides the %>% pipe used below

    cff_validate(cff_rtweet)
    #>
    #> cff_validate results-----
    #> Congratulations! This cff object is valid

    # Creating a CITATION.cff file from a cff object and validating it
    cff_rtweet %>%
      # Write it to a tempfile
      cff_write(tempfile("CITATION", fileext = ".cff"),
        verbose = FALSE,
        validate = FALSE
      ) %>%
      cff_validate()
    #>
    #> cff_validate results-----
    #> Congratulations! This cff object is valid

    # Create a deliberate error and use the validator
    # Override the defaults with keys param
    wrong_keys <- list(
      url = "I am not an url",
      doi = "I am not a doi"
    )

    cff_create("rtweet", keys = wrong_keys) %>%
      cff_validate()
    #>
    #> cff_validate results-----
    #> Oops! This cff object has the following errors:
    #>      field                           message
    #> 1 data.doi referenced schema does not match
    #> 2 data.url referenced schema does not match
Validation of the initial cff object is satisfactory, as seen in the messages. But in the last example, where I forced some invalid values using the keys parameter, we can see that the doi and url fields are flagged as errors, as the text strings do not correspond with the expected patterns for those fields (e.g. “http*” for URLs and “10.XXXX/XXXX” for DOIs).
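To make the idea of an expected pattern more concrete, here is a small base R sketch that tests strings against two regular expressions. Both patterns are simplified assumptions written for this illustration. The patterns in the actual CFF schema are stricter, and cffr relies on schema validation rather than on helpers like these.

```r
# Simplified, assumed patterns (not copied from the CFF schema)
url_pattern <- "^(https?|s?ftp)://.+"
doi_pattern <- "^10\\.[0-9]{4,9}(\\.[0-9]+)?/.+"

# Hypothetical helpers returning TRUE when a string matches the pattern
looks_like_url <- function(x) grepl(url_pattern, x)
looks_like_doi <- function(x) grepl(doi_pattern, x)

# Values taken from the rtweet example next to the deliberately wrong ones
looks_like_url(c("https://CRAN.R-project.org/package=rtweet", "I am not an url"))
looks_like_doi(c("10.21105/joss.01829", "I am not a doi"))
```

In each call the first value matches and the second does not, which mirrors why the validator flags the url and doi keys in the example above.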
Keeping your CITATION.cff file up-to-date
A CITATION.cff file includes relevant information about the version, the release date and the DOI of your package, so you will want to keep this information up to date. cffr includes a GitHub Action that does the work for you.
It can be installed in your repo with the cff_gha_update() function or copied to your .github/workflows folder, and it updates your CITATION.cff file on the following events:
- When you publish a new release of the package on your GitHub repo.
- Each time you modify your DESCRIPTION or inst/CITATION files.
- Additionally, the action can also be run manually.
This will ensure that the citation of your package is always accurate.
Conclusion
Over the last few months, support for CITATION.cff files has grown steadily in the scientific citation ecosystem. The cffr package allows any R package developer to easily integrate citation information with a wide variety of services by creating a CITATION.cff file, leveraging the support introduced by GitHub.
Acknowledgments
I would like to thank Carl Boettiger, Maëlle Salmon and the rest of the contributors of the codemetar package. This package was the primary inspiration for developing cffr and shares a common goal of increasing awareness of the efforts of software developers.
I would also like to thank João Martins and Scott Chamberlain for thorough reviews, which helped improve the package and the documentation, as well as Emily Riederer for handling the review process.
[1] Druskat, S., Spaaks, J. H., Chue Hong, N., Haines, R., Baker, J., Bliven, S., Willighagen, E., Pérez-Suárez, D., & Konovalov, A. (2021). Citation File Format (Version 1.2.0) [Computer software]. https://doi.org/10.5281/zenodo.5171937
[2] Du, C., Cohoon, J., Priem, J., Piwowar, H., Meyer, C., & Howison, J. (2021, October 23). CiteAs: Better Software through Sociotechnical Change for Better Software Citation. Companion Publication of the 2021 Conference on Computer Supported Cooperative Work and Social Computing. ACM. http://doi.org/10.1145/3462204.3482889
[3] Druskat, Stephan. (2021, September 27). Making software citation easi(er) – The Citation File Format and its integrations. Zenodo. https://doi.org/10.5281/zenodo.5529914
[4] Matthew B. Jones, Carl Boettiger, Abby Cabunoc Mayes, Arfon Smith, Peter Slaughter, Kyle Niemeyer, Yolanda Gil, Martin Fenner, Krzysztof Nowak, Mark Hahnel, Luke Coy, Alice Allen, Mercè Crosas, Ashley Sands, Neil Chue Hong, Patricia Cruse, Daniel S. Katz, Carole Goble. 2017. CodeMeta: an exchange schema for software metadata. Version 2.0. KNB Data Repository. https://doi.org/10.5063/schema/codemeta-2.0
[5] Carl Boettiger and Maëlle Salmon (2021). codemetar: Generate ‘CodeMeta’ Metadata for R Packages. https://github.com/ropensci/codemetar, https://docs.ropensci.org/codemetar/
[6] Rich FitzJohn, Rob Ashton, Mathias Buus and Evgeny Poberezkin (2021). jsonvalidate: Validate ‘JSON’ Schema. R package version 1.3.2. https://CRAN.R-project.org/package=jsonvalidate
[7] Kearney, M. (2019, October 24). rtweet: Collecting and analyzing Twitter data. Journal of Open Source Software. The Open Journal. http://doi.org/10.21105/joss.01829
[8] Gábor Csárdi, Kirill Müller and Jim Hester (2021). desc: Manipulate DESCRIPTION Files. R package version 1.4.0. https://CRAN.R-project.org/package=desc