Site icon R-bloggers

taxize v0.3.0 update – a new data source, taxonomy in writing, and uBio examples

[This article was first published on rOpenSci Blog - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We just released v0.3 of taxize. For details on the update, see the release notes.

Some new features

Updating to v0.3

Since taxize is not updated to v0.3 on CRAN yet at the time of writing this, install taxize from GitHub:

devtools::install_github("ropensci/taxize")

Then load taxize

library("taxize")

International Plant Names Index (IPNI)

We added the IPNI as a new data source in taxize in v0.3. Currently, there is only one function to interact with IPNI: ipni_search(). What follows are a few examples of how you can use ipni_search().

Search for the genus Brintonia

ipni_search(genus='Brintonia')[,c(1:3)]

##         id version     family
## 1   7996-1     1.3 Asteraceae
## 2 296073-2     1.3 Asteraceae
## 3  36551-2     1.3 Asteraceae
## 4 186337-1     1.3 Asteraceae

Search for the species Pinus contorta

head(ipni_search(genus='Pinus', species='contorta')[,c(1:3)])

##           id     version   family
## 1   262873-1 1.1.2.1.1.2 Pinaceae
## 2   262872-1 1.2.2.1.1.1 Pinaceae
## 3 30000492-2     1.1.2.1 Pinaceae
## 4   196950-2         1.4 Pinaceae
## 5   921291-1         1.4 Pinaceae
## 6   196949-2         1.5 Pinaceae

Different output formats (the default is minimal)

head(ipni_search(genus='Ceanothus')[,c(1:3)])

##           id     version     family
## 1    55268-3         1.1 Rhamnaceae
## 2 30006383-2 1.1.2.1.1.3 Rhamnaceae
## 3    55269-3         1.1 Rhamnaceae
## 4    33421-1         1.5 Rhamnaceae
## 5 60461578-2         1.1 Rhamnaceae
## 6   331948-2         1.4 Rhamnaceae

head(ipni_search(genus='Ceanothus', output='extended'))[,c(1:3)]

##           id     version     family
## 1    55269-3         1.1 Rhamnaceae
## 2    33421-1         1.5 Rhamnaceae
## 3    55268-3         1.1 Rhamnaceae
## 4 30006383-2 1.1.2.1.1.3 Rhamnaceae
## 5 60461578-2         1.1 Rhamnaceae
## 6   331948-2         1.4 Rhamnaceae

If you do something wrong, you get a message, and the actual output is NA

ipni_search(genus='Brintoniaasasf')

## Warning: No data found

## [1] NA

uBio examples

Until now, we have had functions to interact with uBio's API, but it probably hasn't been too clear how to use them, and they were a little buggy for sure. We have squashed many bugs in ubio functions. Here is an example workflow of how to use ubio functions.

ubio_search

Search uBio by taxonomic name. This is sort of the entry point for uBio where you can search by taxonomic name, from which you can get namebankID's that can be passed to the ubio_classification_search and ubio_namebankID functions

lapply(ubio_search(searchName = 'elephant'), head)

## $scientific
##   namebankid       namestring   fullnamestring packageid packagename
## 1    6938660 Cerylon elephant Cerylon elephant        80 Cerylonidae
##   basionymunit rankid rankname
## 1      6938660     24  species
## 
## $vernacular
##   namebankid    namestring languagecode    languagename packageid
## 1    8118711 Elephant fish          115 Creole, English         3
## 2    8118714 Elephant fish          115 Creole, English         3
## 3    8118726 Elephant fish          115 Creole, English         3
## 4    8115700 Elephant fish          115 Creole, English         3
## 5    8115663 Elephant fish          115 Creole, English         3
## 6    8114377 Elephant fish          115 Creole, English      2463
##   packagename namebankidlink   namestringlink
## 1      Pisces         132263 Mormyrus tapirus
## 2      Pisces         132258 Mormyrus tapirus
## 3      Pisces         181174 Mormyrus tapirus
## 4      Pisces         128971 Mormyrus tapirus
## 5      Pisces         128972 Mormyrus tapirus
## 6  Mormyridae        2299821 Mormyrus tapirus
##                  fullnamestringlink
## 1 Mormyrus tapirus Pappenheim, 1905
## 2 Mormyrus tapirus Pappenheim, 1905
## 3 Mormyrus tapirus Pappenheim, 1905
## 4 Mormyrus tapirus Pappenheim, 1905
## 5 Mormyrus tapirus Pappenheim, 1905
## 6 Mormyrus tapirus Pappenheim, 1905

id <- ubio_search(searchName = 'elephant')$scientific$namebankid[1]

## Error: CHAR() can only be applied to a 'CHARSXP', not a 'NULL'

ubio_id

Get data on a specific uBio namebankID. Use the id from the previous code block

ubio_id(namebankID = id)

## $data
##   namebankid       namestring   fullnamestring packageid packagename
## 1    6938660 Cerylon elephant Cerylon elephant        80 Cerylonidae
##   basionymunit rankid rankname
## 1      6938660     24  Species
## 
## $synonyms
## NULL
## 
## $vernaculars
## NULL
## 
## $cites
## NULL
## 
## $mappings
## NULL

ubio_classification_search

Return hierarchiesID that refer to the given namebankID

ubio_classification_search(namebankID = 3070378)

##   hierarchiesid classificationtitleid              classificationtitle
## 1       2477072                    84                    NCBI Taxonomy
## 2      11166818                   100                    NCBI Taxonomy
## 3      17950600                   104 uBiota 2008-03-20T10:36:50-04:00

ubio_classification

Return all ClassificationBank data pertaining to a particular hierarchiesID

ubio_classification(hierarchiesID = 2483153)

## Error: XML content does not seem to be XML: ''

ubio_synonyms

Search for taxonomic synonyms by hierarchiesID

ubio_synonyms(hierarchiesID = 4091702)

## Error: invalid subscript type 'list'

Examples of using taxize in writing

Let's say one is writing a paragraph in which you are using taxonomic or common names, and perhaps you want to have the number of taxa in a particular group. You can write a paragaph like:

I studied the common weed species _Tragopogon dubius_ (`r sci2comm('Tragopogon dubius', db='itis')[[1]][1]`; `r tax_name(query = "Tragopogon dubius", get = "family", db = "ncbi")[[1]]`) and _Cirsium arvense_ (`r sci2comm('Cirsium arvense', db='itis')[[1]][1]`; `r tax_name(query = "Cirsium arvense", get = "family", db = "ncbi")[[1]]`).

Which renders to:

I studied the common weed species Tragopogon dubius (yellow salsify; Asteraceae) and Cirsium arvense (Canada thistle; Asteraceae).

Notice how inside backticks you can execute code by starting with an r, then doing something like searching for common names for a taxon.

Another example:

We found that `r sci2comm('Tragopogon dubius', db='itis')[[1]][1]` was very invasive.

Renders to:

We found that yellow salsify was very invasive.

Another example:

There are `r nrow(downstream('Tragopogon', db = "col", downto = "Species")$Tragopogon)` species (source: Catalogue of Life) in the _Tragopogon_ genus, meaning there is much more to study :)

Renders to:

There are 142 species (source: Catalogue of Life) in the Tragopogon genus, meaning there is much more to study 🙂

el fin

Please do update to v0.3, try it out, report bugs, and get back to us with any questions!

To leave a comment for the author, please follow the link and comment on their blog: rOpenSci Blog - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.