Function to download biotic interaction datasets
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I work in ecology, biogeography, etc… Biotic interactions (interactions between species) and its repercussions on species distributions is my main research interest.
As such, I had, at some point, to download datasets on species interactions. I wanted to be able to produce a uniform (more or less, not as much as I would like) R object whatever the database I used. I created this function to do just that!
It can be used to download from database website or upload csv files into R (depending on the database). It currently uses the following databases:
- EcoBase – The database for the models using Ecopath with Ecosim. These are really good, have good metadata and spatial information (polygons) (Christensen & Walters, 2004; Heymans et al. 2016).
- Web of Life – To be really honest I don’t know this database all that much. However it also provides information on the dataset location, but it does not provide details on the ecosystem where it occurs (Fortuna et al. 2014).
- Global Web – This database provides details on the ecosystem where the interaction dataset occur, bu no details on spatial location are provided (Thompson et al. 2012).
- Mangal – This database provides lots of information on the biotic interactions datasets (Poisot et al. 2015).
In order to write this function I relied on work previously done by others. Namely the code to download EcoBase interaction datasets was obtained here. In what concerns the mangal database I relied on the work done in the package rmangal.
NOTE: This is important! Be sure to check the results of the function! I’m still working on it, so this is a provisional version. But I think it works pretty good though.
(Somewhere in the future it might be available in my GitHub.)
create.fw.list <- function(db, folder = NULL, type = NULL, ecosyst=FALSE, ref=FALSE, spatial=FALSE, code=FALSE) { #### Arguments #### #'db' - database - eb (EcoBase), gw (GlobalWeb), wl (Web of Life) and mg (Mangal) #'folder' - folder in the WD to get the dataset files (db=gw and wl). #'type' - if db=mg the user should provide the type of interactions to be downloaded # 'ecosyst' - Getting ecosystem information, only for gw, eb #'ref' references information #'spatial' - get spatial info, only for wl, eb and mg #### Data Sources #### #Global Web: https://www.globalwebdb.com/ #EcoTroph Example (EcoBase): http://sirs.agrocampus-ouest.fr/EcoTroph/index.php?action=examples #Script for EcoBase: http://sirs.agrocampus-ouest.fr/EcoBase/#discoverytools #Mangal: #https://mangal-wg.github.io/rmangal/articles/rmangal.html #Web of Life: http://www.web-of-life.es/ #### Results #### fwlist <- list() #### Conditions for data entry ############################################################ #db if(!db %in% c("eb","wl","gw","mg"))stop("Argument 'db' must take one of the following values:\n 'wl' - Web of Life 'mg' - mangal 'gw' - globalweb 'eb' - ecobase") #folder if(!db %in% c("wl","gw") & !is.null(folder)) stop("Argument 'folder'can only be used if 'db'= 'wl' or 'gw'!") #type #folder if(!db %in% c("mg") & !is.null(type)) stop("Argument 'type'can only be used if 'db'= 'mg'!") #ecosyst if(!db %in% c("gw", "eb") & ecosyst==TRUE) stop("Argument 'ecosyst'can only be used if 'db'= 'eb' or 'gw'!") #ref #all #spatial if(!db %in% c("wl", "mg", "eb") & spatial==TRUE) stop("Argument 'spatial'can only be used if 'db'= 'eb', 'mg' 'wl'!") #code if(!db %in% c("wl", "mg", "gw") & code==TRUE) stop("Argument 'code'can only be used if 'db'= 'wl', 'mg'', 'gw'!") ############################################################################################ #Updating each dataset database #GlobalWeb if (db == "gw"){ message("####################### GLOBALWEB DATABASE #######################\n\n") message("Fetching info from the provided folder!") files_gw <- list.files(path = folder, pattern = "WEB") ngw <- length(files_gw) message (paste0("There are ", ngw, " food web files in the folder!")) #Load files into list #And create vector of references if (ref==TRUE) reflist_gw <- c() names_gw <- c()#FW names #getting the files into R for(i in 1:ngw){ message(paste0("Fetching food web ", i, " in ", ngw, "!")) dfgw <- read.csv(paste0(folder,"/",files_gw[i]), header = FALSE) # read csv file dfgw <- dfgw[, colSums( is.na(dfgw) ) <=1]#Remove columns with all NA #Get the FW name names_gw[i] <- as.character(dfgw[2,1]) #Get the reference to the vector if (ref==TRUE) reflist_gw[i] <- as.character(dfgw[1,1]) #to name the columns names_gw_c <- c() n1 <- ncol(dfgw)-1 for(j in 1:n1){ names_gw_c[j] <- as.character(dfgw[2,j+1]) } #to name the rows names_gw_r <- c() n2 <- nrow(dfgw)-2 for(j in 1:n2){ names_gw_r[j] <- as.character(dfgw[j+2, 1]) } dfgw <- dfgw[-c(1,2),-1] #Remove columns with NA dfgw[dfgw==""] <- NA dfgw <- na.omit(dfgw) if(i==281){names_gw_r <- names_gw_r[-c(36,37)]}#the FW on i=281 has a note at the bottom #Delete the 'empty names' names_gw_c <- names_gw_c[names_gw_c!=""] names_gw_r <- names_gw_r[names_gw_r!=""] #Same names in rows or columns? #if(length(unique(names_gw_r)) < length(names_gw_r)) rown[i] <- as.character(i) #if(length(unique(names_gw_c)) < length(names_gw_c)) coln[i] <- as.character(i) #For some strange reason some rows and columns have the same name names_gw_c <- paste0("sp_", as.character(1:length(names_gw_c)), "_",names_gw_c) names_gw_r <- paste0("sp_", as.character(1:length(names_gw_r)), "_",names_gw_r) colnames(dfgw) <- names_gw_c rownames(dfgw) <- names_gw_r fwlist[[i]] <- dfgw } #Name the list names(fwlist) <- names_gw if(ref==TRUE){ references <- as.data.frame(matrix(ncol = 4)) names(references) <- c("FW code", "first_author", "year", "full_ref" ) files_gw <- list.files(folder, pattern = "WEB") message("Fetching references from the dataset files!") for(w in 1:ngw){ dfgw <- read.csv(paste0(folder,"/",files_gw[w]), header = FALSE) # read csv file #message(paste0("Reading file ", files_gw[w])) dfgw <- dfgw[, colSums( is.na(dfgw) ) <=1]#Remove columns with all NA #Get the reference to the vector full_ref1 <- as.character(dfgw[1,1]) references[w,4] <- full_ref1#full reference references[w,1] <- files_gw[w]#fw code references[w,2] <- str_sub(word(full_ref1, start = 1), 1, str_length(word(full_ref1, start = 1))-1)#fisrt author references[w,3] <- regmatches(x = full_ref1,gregexpr("[0-9]+",text = full_ref1))[[1]][1]#year #references[w,3] <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', full_ref1)#year }#end loop to add refs }#end gw refs #ECOSYSTEM if(ecosyst==TRUE){ message("Searching for 'gw_list.csv' file...") if (!file.exists(paste0(folder, "/gw_list.csv"))) stop("\nThe pdf 'gw_list.pdf' has to be previously converted to a csv file...") #I had to conver the gw_list.pdf file to excel (csv), since I could not install tabulizes to extract pdf tables gw_eco <- read.csv(paste0(folder,"/","gw_list.csv"), header = TRUE, sep = ";") # read csv file filn <- paste0("WEB", as.character(gw_eco[,1]), ".csv") gw_eco2 <- gw_eco[,1:3] gw_eco2[,1] <- filn names(gw_eco2)[1] <- "FW" #yes... I do know the following few lines are 'ugly'... filn <- as.data.frame(cbind(filn, filn)) names(filn) <- c("filn1","filn2") #files_gw <- list.files(path = folder, pattern = "WEB") ecosystem <- merge(x=filn, y=gw_eco2, by.x= "filn2", by.y = "FW") ecosystem <- ecosystem[,c(2, 3, 4)] names(ecosystem)[1] <- "Food web" } }#end of gw #Web of Life if (db == "wl"){ message("####################### WEB OF LIFE DATABASE #######################\n\n") files_wl <- list.files(path = folder, pattern = "FW") nwl <- length(files_wl) message (paste0("There are ", nwl, " food web files in the folder!")) #Get refs and metrics table if (file.exists(paste0(folder, "/references.csv"))) { table_wl <- read.csv(paste0(folder, "/references.csv"), header = TRUE) # read csv file } else { stop("There is no 'references.csv' file on the folder, as provided by the website!") } #FW names names_wl <- as.character(table_wl[,8]) #Load files for(i in 1:nwl){ message(paste0("Fetching food web ", i, " in ", nwl, "!")) dfwl <- read.csv(paste0(folder, "/",files_wl[i]), header = TRUE) # read csv file #row.names(dfwl) <- as.character(dfwl[,1]) #dfwl <- dfwl[,-1] dfwl[is.na(dfwl)] <- 0 fwlist[[i]] <- dfwl } names(fwlist) <- names_wl #REFERENCES if(ref==TRUE){ references <- as.data.frame(matrix(ncol = 4)) names(references) <- c("FW code", "first_author", "year", "full_ref" ) message("Fetching references from the 'references.csv' file!") message("Checking the presence of the 'references.csv' file...") if(!file.exists(paste0(folder, "/references.csv"))==TRUE)stop("Can't retrieve reference details... \n File not present!") ref_file <- read.csv(paste0(folder, "/references.csv"), header = TRUE) # read csv file for(w in 1:nwl){ full_ref1 <- as.character(ref_file[w,7]) references[w,4] <- full_ref1#full reference references[w,1] <- as.character(ref_file[w,1])#fw code references[w,2] <- str_sub(word(full_ref1, start = 1), 1, str_length(word(full_ref1, start = 1))-1)#fisrt author references[w,3] <- regmatches(x = full_ref1,gregexpr("[0-9]+",text = full_ref1))[[1]][1]#year #references[w,3] <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', full_ref1)#year }#end loop to add refs }#end wl refs #SPATIAL if(spatial==TRUE){ message("Fetching the spatial information from the 'references.csv' file!") message("Checking the presence of the 'references.csv' file...") if(!file.exists(paste0(folder, "/references.csv"))==TRUE)stop("Can't retrieve spatial info... \n File not present!") ref_file <- read.csv(paste0(folder, "/references.csv"), header = TRUE) # read csv file spatial1 <- ref_file[,c(1,9,10)] }#end of spatial }#end of wl #EcoBase if(db == "eb"){ message("####################### ECOBASE DATABASE #######################\n\n") message("Fetching info from the EcoBase website!") suppressWarnings({ #To obtain the list of available models suppressMessages({ h=basicTextGatherer() curlPerform(url = 'http://sirs.agrocampus-ouest.fr/EcoBase/php/webser/soap-client_3.php',writefunction=h$update) data1 <- xmlTreeParse(h$value(),useInternalNodes=TRUE) liste_mod <- ldply(xmlToList(data1),data.frame)#liste_mod contains a list and decription }) #Select only those allowing dissemination l2 <- subset(liste_mod, model.dissemination_allow =="true")#only those of which dissemination is allowed message("Sellected only those to which model dissemination is allowed!") #Select only those with whole food webs l3 <- subset(l2, model.whole_food_web =="true")#only those with the full food web message("Sellected only those to which the whole food web is available!") #Get model names model.name <- as.character(l3$model.model_name) input_list <- list() id <- as.numeric(as.character(l3$model.model_number)) #Loop to get input list for(i in 1:nrow(l3)){ message(paste0("Fetching information on food web ",i, " of ", nrow(l3))) suppressMessages({ h=basicTextGatherer() mymodel <- id[i] curlPerform(url = paste('http://sirs.agrocampus-ouest.fr/EcoBase/php/webser/soap-client.php?no_model=',mymodel,sep=''),writefunction=h$update,verbose=TRUE) data2 <- xmlTreeParse(h$value(),useInternalNodes=TRUE) input1 <- xpathSApply(data2,'//group',function(x) xmlToList(x)) }) #need do name the columns names_input <- as.character(input1[1,]) input1 <- as.data.frame(input1) colnames(input1) <- names_input input1 <- input1[-1,] input_list[[i]] <- input1 }#end of loop to get input list mnames <- names(input_list) for (i in 1:length(input_list)){ m2 <- input_list[[i]] #get the model nnodes <- length(m2) node_names <- names(m2) # if (biomass == TRUE) # { # nodes_biomass <- as.data.frame(matrix(ncol=3, nrow=nnodes)) # names(nodes_biomass) <- c("id", "name", "biomass") # } int_matrix <- as.data.frame(matrix(ncol=nnodes, nrow=nnodes)) for(j in 1:length(m2)){ node1 <- m2[[j]] node_id <- as.numeric(node1$group_seq) #node1_biomass <- as.numeric(node1$biomass) node_name <- node_names[j] #biomass #if (biomass == TRUE) #{ #nodes_biomass[node_id, 1] <- node_id #nodes_biomass[node_id, 2] <- node_name #nodes_biomass[node_id, 3] <- node1_biomass #} #matrix colnames(int_matrix)[node_id] <- node_name rownames(int_matrix)[node_id] <- node_name diet_node1 <- node1$diet_descr nr_food_items <- length(diet_node1) for(a in 1:nr_food_items){ item1 <- diet_node1[[a]] id_item1 <- as.numeric(item1$prey_seq) proportion_item1 <- as.numeric(item1$proportion) detritus_item1 <- as.numeric(item1$detritus_fate) #send to matrix int_matrix[id_item1,node_id] <- proportion_item1 } } int_matrix[is.na(int_matrix)] <- 0#replacing NA with 0 #if(db=="eb" && biomass == TRUE) fwlist[[i]] <- list(biomass=nodes_biomass, trophic_relations=int_matrix) #if(db=="eb" && biomass == FALSE) fwlist[[i]] <- int_matrix fwlist[[i]] <- int_matrix } names(fwlist) <- model.name })#end of outer suppressWarnings #REFERENCES if(ref==TRUE){ references <- as.data.frame(matrix(ncol = 4)) names(references) <- c("FW code", "first_author", "year", "full_ref" ) message("Fetching the references information!") for(w in 1:nrow(l3)){ #Get the reference to the vector full_ref1 <- as.character(l3$model.reference)[w] references[w,4] <- full_ref1#full reference references[w,1] <- as.numeric(as.character(l3$model.model_number[w]))#fw code references[w,2] <- as.character(l3$model.author[w])#fisrt author references[w,3] <- regmatches(x = full_ref1,gregexpr("[0-9]+",text = full_ref1))[[1]][1]#year #references[w,3] <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', full_ref1)#year }#end loop to add refs }#end of eb refs #ECOSYSTEM if(ecosyst==TRUE){ ecosystem <- data.frame(l3$model.model_number, l3$model.country, l3$model.ecosystem_type) names(ecosystem) <- c("Food web", "Location", "Ecosystem") }#end of eb ecosystem #SPATIAL if(spatial==TRUE){ message("Fetching spatial information from the EcoBase website...") #Get actual polygons EcoBase_shape <- sf::st_read("http://sirs.agrocampus-ouest.fr/EcoBase/php/protect/extract_kml.php") ebd <- EcoBase_shape$Name #Getting the model numbers nmr <- list() for(i in 1:length(ebd)){ nr <- strsplit(as.character(ebd[i]), "--::")[[1]][1] nr <- as.numeric(str_extract_all(nr, "\\d+")[[1]])#Alternative to Numextract nmr[[i]] <- nr } nmr2 <- c()#line rows for each model for(i in 1:length(nmr)){ a <- nmr[[i]] b <- length(a) c1 <- rep(i,b) nmr2 <- c(nmr2, c1) } #In Which row in ecobase geo file is the model? nmr <- unlist(nmr) table1 <- as.data.frame(cbind(nmr2, nmr)) colnames(table1) <- c("row_n","id") #In which row does model.model_number with a given Id occurs? lines_n <- c() for (i in 1:nrow(liste_mod)){ id <- as.numeric(as.character(liste_mod$model.model_number[i])) lines_n[i] <- as.numeric(table1[table1$id==id,][1]) } ecobase_poly2 <- list() for(i in 1:length(lines_n)){ ecobase_poly2[i] <- st_geometry(EcoBase_shape)[lines_n[i]] #plot(st_geometry(EcoBase_shape)[lines_n[i]], border="green", add=TRUE) } #if no polygon then bounding box #into here ecobase_poly2 for(i in 1:length(ecobase_poly2)){ if(is.na(lines_n[i])){ #create a bounding box geographic thing z1 <- as.numeric(Numextract(liste_mod$model.geographic_extent[[i]])) z2 <- c(z1[4], z1[1], z1[2], z1[1], z1[2], z1[3], z1[4], z1[3]) x1 <- as.data.frame(matrix(z2, ncol=2, byrow=TRUE)) x1 <- cbind(x1[2], x1[1])#had to change lat and long... I had this the other way around... p1 <- Polygon(x1) ps1 <- Polygons(list(p1),1) ecobase_poly2[[i]] <- st_as_sf(SpatialPolygons(list(ps1))) } ecobase_poly2[[i]] <- ecobase_poly2[[i]] } #convert all to class sf for(i in 1:length(ecobase_poly2)){ if(!any(class(ecobase_poly2[[i]])=='sf')){ t2 <- ecobase_poly2[[i]] t3 <- st_cast(t2, to="POLYGON") ecobase_poly2[[i]] <- st_as_sf(as(st_zm(st_geometry(t3)), "Spatial")) } else message("Ok!") } #line.Id correspondence table2 <- as.data.frame(cbind(1:length(ecobase_poly2),as.numeric(as.character(liste_mod$model.model_number)))) names(table2) <- c("row","id") #select the corresponding polygons id_selected <- as.numeric(as.character(l3$model.model_number)) #Which rows? rows_selected <- c() for(i in 1:length(id_selected)){ rows_selected[i] <- as.numeric(table2[table2["id"]==id_selected[i],][1]) } spatial1 <- ecobase_poly2[rows_selected] }#end of eb spatial }#end of eb #MANGAL if(db == "mg"){ message("####################### MANGAL DATABASE #######################\n\n") message("Fetching datasets from the Mangal website! \n\n Types 'predation' and 'herbivory' by default... \n but run mangal function 'avail_type' to check available types...\n\nThis operation might take a long time!") ntypes <- length(type) net_info <- list() for(i in 1:ntypes){ message(paste0("\n\nFetching information from interactions of the type ","'",type[i], "'!")) fwlist1 <- search_interactions(type = type[i]) %>% get_collection() net_info <- rbind(net_info, fwlist1) fwlist2 <- as.igraph(fwlist1) fwlist <- c(fwlist, fwlist2) #class(fwlist) } #Converting igraph objects to data frame for(i in 1:length(fwlist)){ fw2 <- fwlist[[i]] #convert each igraph to a data frame fw3 <- as_data_frame(fw2, what = "both") id_name <- fw3$vertices[,1:2] for(j in 1:nrow(id_name)){#clean the names node_name <- id_name$original_name[j] if (grepl(":", node_name, fixed=TRUE)) { node_name <- tail(strsplit(node_name, ": "))[[1]] id_name[j,2] <- node_name[2] } else id_name[j,2] <- node_name }#end clean names id_edges <- fw3$edges[,1:3] int_matrix <- as.data.frame(matrix(ncol = nrow(id_name), nrow = nrow(id_name))) colnames(int_matrix) <- id_name$original_name rownames(int_matrix) <- id_name$original_name #Fill the matrix for(a in 1:nrow(id_edges)){ edge1 <- as.numeric(id_edges[a,1:2]) name1 <- id_name[as.character(edge1[1]),][,2] name2 <- id_name[as.character(edge1[2]),][,2] int_matrix[name1,name2] <- 1 } int_matrix[is.na(int_matrix)] <- 0 #convert all NA to zero fwlist[[i]] <- int_matrix }#end of loop to convert to a data frame if(ref==TRUE){ references <- as.data.frame(matrix(ncol = 4)) names(references) <- c("Dataset ID", "first_author", "year", "DOI" ) message("Fetching references!") for(j in 1:length(net_info)){ dataset_id <- net_info[[j]]$dataset$dataset_id first_author <- net_info[[j]]$reference$first_author year_mng <- as.numeric(net_info[[j]]$reference$year) doi_mng <- net_info[[j]]$reference$doi references[j,1] <- dataset_id references[j,2] <- first_author references[j,3] <- year_mng references[j,4] <- doi_mng references <- references[order(references$`Dataset ID`),] rownames(references) <- 1:nrow(references) } }#End of mg refs if(spatial==TRUE){ spatial1 <- as.data.frame(matrix(ncol = 4)) names(spatial1) <- c("Dataset ID", "first_author", "lat", "long") message("Fetching coordinates!") for(z in 1: length(net_info)){ dataset_id <- net_info[[z]]$dataset$dataset_id lat_mng <- net_info[[z]]$network$geom_lat long_mng <- net_info[[z]]$network$geom_lon first_author <- net_info[[z]]$reference$first_author if(length(unlist(lat_mng))>1){ spatial2 <- as.data.frame(matrix(ncol = 4)) names(spatial2) <- c("Dataset ID", "first_author", "long", "lat" ) for(b in 1:length(unlist(lat_mng))){ spatial2[b,3] <- long_mng[[1]] [b] spatial2[b,4] <- lat_mng [[1]] [b] } spatial2[,1] <- dataset_id spatial2[,2] <- first_author spatial1 <- rbind(spatial1, spatial2) } spatial1[z,1] <- dataset_id spatial1[z,2] <- first_author if(length(unlist(lat_mng))==1) spatial1[z,3] <- lat_mng if(length(unlist(lat_mng))==1) spatial1[z,4] <- long_mng } spatial1 <- spatial1[order(spatial1$`Dataset ID`),] rownames(spatial1) <- 1:nrow(spatial1) }#End of mg spatial if (exists("references") & exists("spatial1")) (if(nrow(references)!=nrow(spatial1)) message("WARNING: There are more than on FW in some datasets! References and Spatial data frames have different number of rows.")) }#end of mangal message(paste0("DONE! \n\nOverall the list stores ", length(fwlist), " datasets!")) master_list <- list() master_list[["int_matrix"]] <- fwlist if(ecosyst==TRUE) { master_list[["ecosystem"]] <- ecosystem message ("\n Additional element in the results: \n\n The vector with information on the ecosystems.") } if(ref==TRUE) { master_list[["references"]] <- references message ("Additional element in the results! \nA data frame with information on the references.") } if(spatial==TRUE) { master_list[["spatial_info"]] <- spatial1 message ("\n Additional element in the results: \n\n Spatial information was added.") } if(code==TRUE) { if(db == "gw") master_list[["code"]] <- files_gw if(db == "wl") master_list[["code"]] <- files_wl if(db == "mg") master_list[["code"]] <- references[1,] message ("Added food web code information.") } #Return results if(length(master_list)==1) return(fwlist) if(length(master_list)!=1) return(master_list) message("####################### DONE! #######################") }#END OF FUNCTION create.fw.list
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.