Google Summer of Code 2013 is halfway through, and midterm evaluations are underway. I thought this was a good point to share what we have been doing in the Biodiversity Data Visualizations in R project and to open up the package for testing and early feedback. We have named the package bdvis. The package is on GitHub, and I would appreciate it if you could install and test it. Feedback may be given in the comments here, through issues on GitHub, or by Twitter or email.
Getting data
The data was obtained from the data portal of the Global Biodiversity Information Facility (http://data.gbif.org). The data set we are looking for is the iNaturalist research-grade records. We accessed the datasets page at http://data.gbif.org/datasets/ and selected the iNaturalist.org page from the alphabetic list; it is at http://data.gbif.org/datasets/provider/407. Once on this page, use the link Explore: Occurrences, and then on the next page click Download: Spreadsheet of results. On this page make sure Comma separated values is selected and then press the Download Now button. The website may take a few minutes to prepare your download. Once it is ready, a download link will be provided. Typically the file will be named something like occurrence-search-12345.zip, where the number may have as many as 40 digits. Use the link to download the .zip file and then extract the data file occurrence-search-12345.csv into the working directory of R. Since this file has a long name, let us rename it to inat.csv for convenience.
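If you prefer to do the extraction and renaming from within R, the following is a minimal sketch using only base R. The helper name extract_and_rename is made up for this illustration, and the archive name in the usage comment is the placeholder from the text, not a real file.

```r
# Sketch: extract a downloaded archive and rename its CSV file.
# extract_and_rename() is a hypothetical helper, not part of bdvis.
extract_and_rename <- function(zip_path, new_name = "inat.csv", exdir = getwd()) {
  # unzip() returns the paths of the files it extracted
  extracted <- unzip(zip_path, exdir = exdir)
  # Pick the first extracted file that ends in .csv
  csv_file <- extracted[grepl("\\.csv$", extracted, ignore.case = TRUE)][1]
  if (is.na(csv_file)) stop("No .csv file found in ", zip_path)
  target <- file.path(exdir, new_name)
  if (!file.rename(csv_file, target)) stop("Could not rename ", csv_file)
  target
}

# Usage (placeholder file name, replace with your actual download):
# extract_and_rename("occurrence-search-12345.zip")
```

The function returns the path of the renamed file, so the result can be passed straight to read.csv.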
Now we are ready to load our data.
inat = read.csv("inat.csv")
dim(inat)
If it shows something like
[1] 66581 47
we are on the right track. Our data is loaded into R. For the time being, this package handles only the data format provided by GBIF, but we are working on built-in functions to bring user-generated biodiversity data into this format.
Package installation
Now let us install the bdvis package. First we need to get the devtools package, which will let us install packages from GitHub (rather than CRAN).
install.packages("devtools")
require(devtools)
install_github("bdvis", "vijaybarve")
require(bdvis)
If this produces something like
Loading required package: bdvis
Attaching package: ‘bdvis’
The following object(s) are masked from ‘package:base’:
summary
we are on the right track. Our package is installed and loaded into R.
Package functions
1. summary
Let us start playing with the functions now. We have the data loaded in the inat data frame.
bdvis::summary(inat)
This should produce something like:
Total no of records = 66581
Date range of the records from 1710-02-26 to 2012-12-31
Bounding box of records -77.89309 , -177.37895 - 78.53431 , 179.2615
Taxonomic summary...
No of Families : 1394
No of Genus : 5089
No of Species : 11299
What does this tell us about our data?
- We have 66581 records in the data set
- The date range is from 1710 to 2012. (Do we really have a record from 1710? Looks like we have a problem there.)
- The bounding box covers almost the whole world. Yes, this is a global data set.
- The data set covers 1394 families, 5089 genera and 11299 species.
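The suspicious 1710 date can be tracked down with a few lines of base R. The sketch below uses invented toy records rather than the real download; the column name Date_collected and the cutoff year are assumptions made for illustration, so check names(inat) for the actual date column.

```r
# Toy records standing in for the GBIF download (invented for illustration).
# "Date_collected" is an assumed column name, not confirmed by the source.
records <- data.frame(
  Scientific_name = c("Danaus plexippus", "Apis mellifera", "Pavo cristatus"),
  Date_collected  = c("1710-02-26", "2011-06-15", "2012-12-31"),
  stringsAsFactors = FALSE
)

# Parse the dates; values that do not match the format become NA
records$parsed_date <- as.Date(records$Date_collected, format = "%Y-%m-%d")

# Arbitrary threshold chosen for illustration only
cutoff <- as.Date("1900-01-01")

# Keep the rows whose date is valid but earlier than the cutoff
suspicious <- records[!is.na(records$parsed_date) &
                        records$parsed_date < cutoff, ]
print(suspicious)

# Date range of all parseable records
print(range(records$parsed_date, na.rm = TRUE))
```

Listing such rows makes it easier to decide whether they are data entry errors or genuine historical records.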
I have two questions here:
- What more would you like to get in the summary?
- Should I rename the function summary to something else, so it does not clash with the usual data frame summary function?
2. mapgrid
Now let us generate a heat map of the records in this data set. This map will show us the density of records in different parts of the world. To generate this map:
mapgrid(inat,ptype="species")
ptype could be records if we need the map to show raw records rather than records aggregated to species. Again, the questions:
- What more options would you like to see here?
- Ability to zoom in on a certain region?
- Control over the color palette?
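To give a feel for what record density per grid cell means, here is a rough base R sketch that counts records per cell. It is not how mapgrid is implemented; the toy coordinates, the column names and the one-degree cell size are all assumptions made for this illustration.

```r
# Toy coordinates (invented); column names are assumed for illustration
coords <- data.frame(
  Latitude  = c(12.3, 12.7, 12.9, 45.2, -33.9),
  Longitude = c(77.1, 77.6, 77.4, -93.3, 151.2)
)

# Assign each record to a one-degree cell by flooring its coordinates
coords$cell_lat <- floor(coords$Latitude)
coords$cell_lon <- floor(coords$Longitude)

# Count the records that fall in each cell
cell_counts <- aggregate(n ~ cell_lat + cell_lon,
                         data = transform(coords, n = 1),
                         FUN = sum)
print(cell_counts)
```

Cells with higher counts would appear as hotter colors on a density map.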
3. tempolar
Now coming to temporal visualizations: the function tempolar makes polar plots of temporal data at daily, weekly and monthly timescales. The code for the sample plots is as follows:
tempolar(inat, color="green", title="iNaturalist daily", plottype="r", timescale="d")
tempolar(inat, color="blue", title="iNaturalist weekly", plottype="p", timescale="w")
tempolar(inat, color="red", title="iNaturalist monthly", plottype="r", timescale="m")
Here options to control color, title, plottype and of course timescale are provided.
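For readers curious about the kind of aggregation behind a monthly plot, the following base R sketch counts records per calendar month across all years. It is only an illustration with invented dates and does not reproduce the internals of tempolar.

```r
# Toy collection dates (invented for illustration)
dates <- as.Date(c("2012-01-15", "2012-01-20", "2012-03-02",
                   "2011-03-30", "2012-12-31"))

# Extract the month number (1 to 12) from each date
month_num <- as.integer(format(dates, "%m"))

# Count records per month; the factor keeps months with zero records
monthly <- table(factor(month_num, levels = 1:12))
print(monthly)
```

The twelve counts are what a monthly polar plot would arrange around the circle; daily or weekly counts can be built the same way from the day of year ("%j") or week of year ("%U").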
We are less than halfway through our original proposal and will continue to actively build this package. As I build more functionality, I will post more information on the blog. Until then, keep the feedback flowing and tell us what more you would like to see in this package.