Taxonomy with R: Exploring the Taxize-Package
[This article was first published on theBioBucket*, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
First off, I’d really like to give a shout-out to the brave people who have created and maintain this great package – the fame is yours!
While exploring the package’s capabilities I ran into some issues with the ITIS server, and with large datasets things weren’t working out well for me.
I then switched to the NCBI API and got much better results (way quicker and, at first glance, with higher coverage).
At the time of writing there is no taxize function that pulls taxonomic details from a classification returned by NCBI, which is why I put together a little wrapper, see here:
    library(taxize)

    # some species data:
    spec <- data.frame("Species" = I(c("Bryum schleicheri", "Bryum capillare",
                                       "Bryum argentum", "Escherichia coli",
                                       "Glis glis")))
    spl <- strsplit(spec$Species, " ")
    spec$Genus <- as.character(sapply(spl, "[[", 1))

    # for pulling taxonomic details we'd best submit higher rank taxa,
    # in this case genera. Then we'll submit genus Bryum only once and
    # save some computation time (might be an issue if you deal
    # with large datasets..)
    gen_uniq <- unique(spec$Genus)

    # function for pulling classification details ("phylum" in this case)
    get_sys_level <- function(x){
      a <- classification(get_uid(x))
      # if there are multiple results, take the first..
      y <- data.frame(a[[1]])
      z <- tryCatch(as.character(y[which(y[,2] == "phylum"), 1]),
                    # in case of any other errors put NA
                    error = function(e) NA)
      # if the taxonomic detail is not covered return NA
      z <- ifelse(length(z) != 0, z, NA)
      return(data.frame(Taxon = x, Syslevel = z))
    }

    # call function and rbind the returned values
    result <- do.call(rbind, lapply(gen_uniq, get_sys_level))
    print(result)
    #         Taxon       Syslevel
    # 1       Bryum   Streptophyta
    # 2 Escherichia Proteobacteria
    # 3        Glis       Chordata

    # now merge back to the original data frame
    spec_new <- merge(spec, result, by.x = "Genus", by.y = "Taxon")
    print(spec_new)
    #         Genus           Species       Syslevel
    # 1       Bryum Bryum schleicheri   Streptophyta
    # 2       Bryum   Bryum capillare   Streptophyta
    # 3       Bryum    Bryum argentum   Streptophyta
    # 4 Escherichia  Escherichia coli Proteobacteria
    # 5        Glis         Glis glis       Chordata
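The wrapper picks the phylum by column position (`y[,2]` for the rank, column 1 for the name). If you'd rather look up any rank by column name, something like the following might do. Mind that this is only a rough sketch: `extract_rank` is a made-up helper and not part of taxize, it assumes the classification table has columns called `name` and `rank`, and `mock_cls` is a hand-made stand-in so the snippet runs without calling NCBI.

```r
# Hypothetical helper (not part of taxize): pull the name at a given rank
# out of one classification table. Assumes columns called "name" and "rank".
extract_rank <- function(cls, rank = "phylum") {
  # a failed lookup may come back as NA or NULL rather than a data frame
  if (!is.data.frame(cls) || !all(c("name", "rank") %in% names(cls))) {
    return(NA_character_)
  }
  hit <- as.character(cls$name[tolower(cls$rank) == tolower(rank)])
  # rank not covered: return NA; rank listed more than once: take the first
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Hand-made stand-in for a classification table
# (illustrative values only, not fetched from NCBI)
mock_cls <- data.frame(
  name = c("Eukaryota", "Chordata", "Mammalia", "Glis"),
  rank = c("superkingdom", "phylum", "class", "genus"),
  stringsAsFactors = FALSE
)

extract_rank(mock_cls, "phylum")  # "Chordata"
extract_rank(mock_cls, "family")  # NA, rank not in the table
extract_rank(NA, "phylum")        # NA, lookup failed upstream
```

Inside the wrapper you could then try `extract_rank(a[[1]], "phylum")` in place of the positional lookup, provided the table returned for your taxize version really carries those column names.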