rcites – The story behind the package
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The Ecology Hackathon
Almost one year ago now, ecologists filled a room for the “Ecology Hackathon: Developing R Packages for Accessing, Synthesizing and Analyzing Ecological Data” that was co-organised by rOpenSci Fellow, Nick Golding and Methods in Ecology and Evolution. This hackathon was part of the “Ecology Across Borders” Joint Annual Meeting 2017 of BES, GfÖ, NecoV, and EEF in Ghent. At different tables, different people joined each other to work on different ideas to implement as R packages. At our table, we were around ten people that more or less did not know anything about what we aimed for. We barely knew each other and nobody had clear expectations, just the desire of learning more about R packages. We were interested in a common idea posted as a wishlist in the rOpenSci community: building an R package to interact with CITES and its Speciesplus database. CITES (the Convention on International Trade in Endangered Species of Wild Fauna and Flora) is an international agreement between governments and provides key information to ensure that international trade in specimens of wild animals and plants does not threaten their survival. At 10 am, nobody had a clear idea on where to start. By 6 pm, we had a functional prototype of the rcites
package, which was really rewarding and gave motivation to follow up on the package development. We did great team-work, met new researchers, and learned a bunch of new stuff. This was definitely a successful hackathon!
Now, almost one year after, it appears to us that there were several key issues that the hackathon contributed to our success. The first was the short but thorough introduction to the development of R packages by Maëlle Salmon (Maëlle was an Editor for rOpenSci software peer review at the time, and became a staff Research Software Engineer with rOpenSci shortly after). Within an hour or so we had a clear vision of what the development of an R package pipeline looks like. A second key was the help of more experienced people. Our group immensely benefited from the experience of Tamora James who kindly answered our questions. The third key was the diversity of people interested in the project. Indeed, while some of us were beginner R users, others had moderate experience in creating packages and others had already dealt with CITES before, so we were able to split tasks organically. Some of us learned how to access the CITES/Speciesplus API and how to write a request via URLs. Others worked on rough versions of a functional code, while others started writing the help files and tested the package. After a few hours of frustration, ignoring why a given function refused to work as intended, things started to fit like pieces of a jigsaw puzzle. This was very empowering.
To summarize the hackathon: We learned a lot. First, how to create an R package, i.e. creating a structure of files, writing functions, and their documentation, testing functions, checking whether the package is working (it was already a lot at that stage of development). Second, how to write URL requests in R to query a web API. Third, how to use GitHub for collaborative coding. Of course, we did not become experts on all of these topics, it was just the beginning of an adventure and a lot of work remained, but still … what a day!
The long way to perfection
Although most of the code was written during the hackathon in Ghent, the package was deeply restructured in the following months. Indeed we realized that a good R package is more than a set of functions and requires a proper organization of the code to avoid redundancy, adequate unit testing, clear and complete documentation, a cool hexagonal sticker and a lot of thought on how to make the package user-friendly. All this requires time, and we moved slowly while completing the package. After seven months, we were ready to submit to rOpenSci and submit a first version (v0.1.0) to CRAN.
The help of rOpenSci and its onboarding program was pivotal. Two dedicated reviewers, Noam Ross and Margaret Siple, tested the package and suggested enhancements. Their reviews were of top-notch quality, they gave extremely valuable advice and were really supportive of our work and the development of the package. After the review, we were very enthusiastic and decided to do our best to address all of their comments. Interestingly enough, during that stage, the size of the code of our key functions shrunk while the number of “private” functions (gathered in zzz.R) increased. We actually added several functionalities while better structuring the code to offer a better user experience (well, we hope so). Now, we can tell that the difference between a working package and a well-crafted package is important. We also understand why rOpenSci suggests doing the CRAN submission after the review. There is no rush for a CRAN release, as the package is already available on GitHub, plus the number of changes after a revision is likely to generate a major release. Hence, scripts built based on earlier versions of the package will no longer work with the revised package. This is exactly what happened for us, we released v1.0.0 after the review process, and we’ve learned our lesson.
So how does rcites
work?
For a quick illustration, we can retrieve CITES information on the African bush elephant ( Loxodonta africana, hereafter “elephant”) native distribution. The elephant is a highly endangered species that is not only illegally traded globally but also a flagship species of nature conservation and the logo species of CITES.
We start with a basic setup: we load the package and set the token (see details in the package vignettes):
library(rcites) set_token("8QW6Qgh57sBG2k0gtt")
In order to access information about the elephant, we first need to retrieve its Speciesplus taxon identifier. For this, we use the spp_taxonconcept()
function and the elephant’s scientific name, Loxodonta africana, as query_taxon
argument.
elephant_taxonconcept <- spp_taxonconcept(query_taxon = "Loxodonta africana") elephant_taxonconcept
Now we can map the elephant’s distribution with the help of the rworldmap
package.
library(rworldmap) par(las = 1) elephant_distr <- spp_distributions(taxon_id = "4521", verbose = FALSE)$distributions map2 <- joinCountryData2Map(elephant_distr, joinCode="ISO2", nameJoinColumn = "iso_code2", nameCountryColumn = "name") plot(c(-23, 62), c(45, -40), type = "n", main = "Loxodonta africana", xlab = "Longitude", ylab = "Latitude") plot(map2, add = TRUE) plot(map2[!is.na(map2$iso_code2),], col = "#cba74d", add = TRUE)
In this map we can see all countries where the elephant occurrs naturally.
For more functionalities, please check the package documentation and the vignette Study case: the African bush elephant (Loxodonta africana). For citation, please also refer to the release paper (doi: 10.21105/joss.01091).
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.