Python has some pretty awesome data-manipulation and graphing capabilities. If you’re a heavy R user who dabbles in Python, like me, you might wonder what the equivalent Python commands are for dataframe manipulation. I was also curious to see how many lines of code it takes me to do the same task (load, clean, and graph data) in both R and Python. (I’d like to head off arguments about efficiency and which language is better right here: neither my R nor my Python code is the super-efficient, optimal way to program. It is, however, how I do things, and to me that’s what matters. I’m also not trying to advocate one language over the other (programmers can be a sensitive bunch); I just wanted to post an example showing how to do equivalent tasks in each language.)
First, R
    library(ggplot2)

    # read data
    JapBeet_NoChoice <- read.csv("~/Documents/FIU/Research/JapBeetle_Temp_Herbivory/Data/No_Choice_Assays/JapBeet_NoChoice.csv")

    # drop incomplete data
    feeding <- subset(JapBeet_NoChoice, !is.na(Consumption))

    # refactor and clean
    feeding$Food_Type <- factor(feeding$Food_Type)
    feeding$Temperature[which(feeding$Temperature == 33)] <- 35

    # subset
    plants <- c('Platanus occidentalis', 'Rubus allegheniensis', 'Acer rubrum',
                'Viburnum prunifolium', 'Vitis vulpina')
    subDat <- feeding[feeding$Food_Type %in% plants, ]

    # make a standard error function for plotting
    # (ymin is mean - SE, ymax is mean + SE)
    seFunc <- function(x){
      se <- sd(x) / sqrt(sum(!is.na(x)))
      lims <- c(mean(x) - se, mean(x) + se)
      names(lims) <- c('ymin', 'ymax')
      return(lims)
    }

    # ggplot!
    # (show.legend and fun are the current names for the deprecated
    #  show_guide and fun.y arguments)
    ggplot(subDat, aes(Temperature, Herb_RGR, fill = Food_Type)) +
      stat_summary(geom = 'errorbar', fun.data = 'seFunc', width = 0,
                   aes(color = Food_Type), show.legend = FALSE) +
      stat_summary(geom = 'point', fun = 'mean', size = 3, shape = 21) +
      ylab('Mass Change (g)') +
      xlab(expression('Temperature '*degree*C)) +
      scale_fill_discrete(name = 'Plant Species') +
      theme(
        axis.text = element_text(color = 'black', size = 12),
        axis.title = element_text(size = 14),
        axis.ticks = element_line(color = 'black'),
        legend.key = element_blank(),
        legend.title = element_text(size = 12),
        panel.background = element_rect(color = 'black', fill = NA)
      )
Next, Python!
    # imports (the original import lines were not shown, so the py alias
    # for matplotlib.pyplot is an assumption)
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as py

    # read data
    JapBeet_NoChoice = pd.read_csv("/Users/Nate/Documents/FIU/Research/JapBeetle_Temp_Herbivory/Data/No_Choice_Assays/JapBeet_NoChoice.csv")

    # clean up (copy first, then assign the replaced column back, so the
    # change reliably sticks instead of being applied to a temporary object)
    feeding = JapBeet_NoChoice.dropna(subset=['Consumption']).copy()
    feeding['Temperature'] = feeding['Temperature'].replace(33, 35)

    # subset out the correct plants
    keep = ['Platanus occidentalis', 'Rubus allegheniensis', 'Acer rubrum',
            'Viburnum prunifolium', 'Vitis vulpina']
    feeding2 = feeding[feeding['Food_Type'].isin(keep)]

    # calculate means and SEs (named aggregation; the older dict-renaming
    # form of .agg() is rejected by current pandas)
    sum_stats = (
        feeding2.groupby(['Food_Type', 'Temperature'])['Herb_RGR']
        .agg(mean='mean', SE=lambda x: x.std() / np.sqrt(x.count()))
        .reset_index()
    )

    # PLOT (one errorbar series per plant species)
    for plant in keep:
        sub = sum_stats[sum_stats['Food_Type'] == plant]
        py.errorbar(sub['Temperature'], sub['mean'], yerr=sub['SE'],
                    fmt='o', ms=10, capsize=0, mew=1, alpha=0.75, label=plant)

    py.xlabel('Temperature (\u00B0C)')
    py.ylabel('Mass Change')
    py.xlim([18, 37])
    py.xticks([20, 25, 30, 35])
    py.legend(loc='upper left', prop={'size': 10}, fancybox=True, markerscale=0.7)
    py.show()
So, roughly the same number of lines (excluding the importing of modules and libraries), although Python was a bit more efficient (barely). For what it’s worth, I showed these two graphs to a friend and asked him which he liked more; he chose Python immediately. Personally, I like them both. It’s hard for me to pick one over the other. I think they’re both great. The curious can see my much older, waaayyy less efficient, much more hideous version of this graph in my paper, but I warn you... it isn’t pretty. And the code was a nightmare (it was pre-ggplot2 for me, so it was made with R’s base plotting commands, which are a beast for this kind of graph).
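If you want to poke at the pandas steps from the Python section without my data file, here is a minimal sketch on a tiny made-up table. The `toy` dataframe and every value in it are invented purely for illustration (they are not my beetle data), and the R lines in the comments are rough equivalents rather than an exact translation.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only (NOT the real beetle dataset).
toy = pd.DataFrame({
    'Food_Type':   ['Acer rubrum'] * 4 + ['Vitis vulpina'] * 3 + ['Quercus alba'],
    'Temperature': [20, 20, 33, 33, 20, 20, 20, 25],
    'Consumption': [0.1, 0.2, np.nan, 0.4, 0.3, 0.5, 0.2, 0.6],
    'Herb_RGR':    [1.0, 3.0, 9.0, 4.0, 2.0, 4.0, 6.0, 5.0],
})

# R: subset(dat, !is.na(Consumption))
feeding = toy.dropna(subset=['Consumption']).copy()

# R: dat$Temperature[dat$Temperature == 33] <- 35
feeding['Temperature'] = feeding['Temperature'].replace(33, 35)

# R: dat[dat$Food_Type %in% plants, ]
keep = ['Acer rubrum', 'Vitis vulpina']
feeding2 = feeding[feeding['Food_Type'].isin(keep)]

# R: mean(x) and sd(x) / sqrt(n) within each Food_Type x Temperature group
sum_stats = (
    feeding2.groupby(['Food_Type', 'Temperature'])['Herb_RGR']
    .agg(mean='mean', SE=lambda x: x.std() / np.sqrt(x.count()))
    .reset_index()
)

print(sum_stats)
```

Any group with a single observation comes back with a missing SE, since a sample standard deviation needs at least two values. That matches what `sd()` does in R.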