Python Complements R’s Shortcomings
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’m a big fan of open-source software for research. For example, R-statistics, Qgis, and Grass GIS are awesome programs. R can do any statistical tests and numerical modeling you can imagine; if there’s not a built-in function you can write one (the beauty of using a programming language over point-and-click statistical programs). Granted, SPSS and Minitab do allow you to write scripts, but the language is less straightforward than R, and matrix-manipulation is easier in R. Same with Grass GIS. Qgis is an amazing substitute for ArcMap, with a beautiful interface. The other benefit of using a programming language for your research is reproducibility. You can save your code, re-run it to generate the same exact tests and figures so that you never forget what you did. Plus, you can share your code as appendices on your paper so that others can read your code to reproduce your results exactly. There is absolute transparency. This can be incredibly important, as noted in a post by Jeremy Fox over at Dynamic Ecology.
However, as wonderful as R is, there are things R does not do well. Symbolic math is one of them (R CAN do it, but I find the commands clunky). However, the Python programming language (which is a high-level language that can be used to make stand-alone software) has excellent modules (like R packages) for matrix manipulations and symbolic math. Although I’m still new to it, I dare say that Python’s mathematical package is near as complete as MATLAB (at least for my uses). In fact, Python’s numerical packages are widely used in physics, astronomy, and bioinformatics.
Python makes use of three main modules for math: scipy, numpy, and sympy. Scipy and Numpy together make Python a powerful MATLAB-esque program, while Sympy adds capabilities for symbolic math. Matplotlib (and by association Pylab) provide excellent graphing capabilities. Another benefit is that the language is similar to (but not identical to) R, so that transitioning should be easy. R does have a python-language interface package, but it is slow and you still need to learn Python. So why not just run Python directly. Additionally, Spyder provides an excellent RStudio-esque IDE for Python (although I think it’s Mac only). UPDATE: Spyder apparently works on both Windows and Linux. My bad.
I can’t walk you through the installation of Python and its modules (there’s a huge amount of material already available, and if you use a Mac, I highly recommend the MacPorts installation route). But I can show you what it can do. I developed a Python script to model Lotka-Volterra competition equations. Here’s the annotated script with the figure output, just as a demonstration of the capabilities of Python’s symbolic math package Sympy as an excellent complement to R’s highly advanced numerical techniques. Interested folks can look up Intro to Python material to get a handle on the language, although if you know R, it should be easy enough to figure out. Same goes for those of you familiar with MATLAB. The biggest difference is that when you import a module, you need to type the module name (or nickname if you use ‘as’) before the command so that Python knows where to look for the command. (NOTE: I’ve verified all of Python Sympy’s results with MATLAB. Although they give different equations in the output, they’re identical albeit with some algebraic manipulation).
# Import the necessary modules (like R packages) # LOTKA-VOLTERRA COMPETITION import sympy as sm from scipy import integrate import pylab as py import numpy as np # First, define the variables as symbols (like MATLAB) r1, N1, K1, a12, r2, N2, K2, a21 = sm.symbols('r1, N1, K1, a12, r2, N2, K2, a21') # Define the Lotka-Volterra differential equation symbolically sp1Diff = r1*N1*(1 - (N1 + a12*N2)/K1) sp2Diff = r2*N2*(1 - (N2 + a21*N1)/K2) # Make the equality, setting the equations equal to zero sp1Mod = sm.Eq( sp1Diff, 0 ) sp2Mod = sm.Eq( sp2Diff, 0 ) # Solve each equation for the relevant species (equilibrium values) sp1Eq = sm.solve( sp1Mod, N1 ) sp2Eq = sm.solve( sp2Mod, N2 ) print 'The equilibrium population size for species 1 is\n', sp1Eq print 'The equilibrium population size for species 2 is\n', sp2Eq # Above, we've used sympy to symbolically find the equilibrium # population sizes for both N1 and N2. The code is very straight- # forward and simple (compare to R's symbolic math abilities). # Unfortunately, the equilibria depend on the population size of # the other species. Assume that both species are present. # Make a system of linear equations using the two equilibria, # solving for N1 and N2. sys1 = sm.Eq( K1 - N1 - N2*a12, 0 ) sys2 = sm.Eq( K2 - N2 - N1*a21, 0 ) # Solve this system for N1 and N2, giving the equilibrium # population size when both species are present sysSolve = sm.solve( (sys1, sys2), N1, N2 ) print 'The equilibrium population size for each species in the presence of the other is\n', sysSolve # Next, we use scipy to solve the Lotka-Volterra model numerically # given a set of chosen parameter values (this can be done in R # easily, but we're on a roll and it's also very easy in Python. # Arguably easier, actually. # Note that arrays in Python start with 0, so that K[0] in Python # is equivalent to K[1] in R (K[1] = K[2] in R, etc.). # I prefer R's technique here, but # I'm sure there's a computer-sciencey reason for using zero-based # arrays. def lvComp(N, t, r, K, alpha): return( r[0] * N[0] * (1 - (N[0]+alpha[0]*N[1])/K[0]) , r[1] * N[1] * (1 - (N[1] + alpha[1]*N[2])/K[1]) ) # Now, set the parameter values. You can vary these later r = [0.5, 0.2] N = [2, 2] # Initial pop size K = [100, 50] alpha = [1.2, 0.2] # Set the duration of the model tmax = 100 t = range(0, tmax + 1) # Range goes from the starting point to n-1, n is the end point # Now here it gets complicated! Solve the ode Nt = integrate.odeint( lvComp, N, t, args = (r, K, alpha) ) # Done! # Get the equilibrium pop sizes by plugging in our chosen parameter # values into the equilibria calculated above. This is why I # didn't use R to solve the ODE. I've already calculated the # symbolic equilibrium, and substituting my parameters in is easy. sp1EqLine = sysSolve[N1].subs( [ (K1,K[0]), (K2, K[1]), (a12,alpha[0]), (a21, alpha[1]) ], simultaneous=True) sp2EqLine = sysSolve[N2].subs( [ (K1,K[0]), (K2, K[1]), (a12,alpha[0]), (a21, alpha[1]) ], simultaneous=True) # Make a vector of equilibria plotequils = (sp1EqLine, sp2EqLine) # Plot the ode results py.plot(t, Nt, '--o') py.xlabel('time (t)') py.ylabel('N(t)') py.legend(['Sp1', 'Sp2'], loc='lower right') py.title('Lotka-Volterra Competition Between Two Species') for line in plotequils: py.axhline(y=line, linewidth=2, color='red', ls='--')
# To print zero-growth isoclines, first isolate the equilibrium # equation for each species sp1IsoEq = sp1Eq[1] sp2IsoEq = sp2Eq[1] # We're plotting a phase-plane graph with species 2 on the y-axis # so we need to re-arrange the species 1 equation to predict N2 # based on N1 (it's currently the other way around) sp1IsoEq = sm.Eq(sp1IsoEq, N1) sp1IsoEq = sm.solve(sp1IsoEq, N2) # Get the range of N1 on the x-axis. Depending on the parameter # values, the max can either be K1, or K2/a21 n1Range = range(0, int( max(K[0], K[1]/alpha[1]) + 5)) # Calculate the zero-growth isocline for species 1 n1Iso = [] #makes an empty list for n in n1Range: a = sp1IsoEq[0].subs([(K1,K[0]),(N1,n1Range[n]),(a12,alpha[0])], simultaneous=True) n1Iso.append(a) # Calculte the zero-growth isocline for species 1 n2Iso = [] for n in n1Range: a = sp2IsoEq.subs( [ (K2,K[1]), (N1, n1Range[n]), (a21, alpha[1]) ], simultaneous=True ) n2Iso.append(a) # Plot the results, showing the model output overtop py.clf() py.plot(n1Range, n1Iso, '--') py.plot(n1Range, n2Iso, '--') py.plot(Nt[:,0], Nt[:,1], 'b^', label='Model Output') py.axis( [0, max(K[0], K[1]/alpha[1])+0.1*max(K[0], K[1]/alpha[1]), 0, max(K[1], K[0]/alpha[0])+0.1*max(K[1], K[0]/alpha[0])] ) py.xlabel('N1') py.ylabel('N2') py.legend( ['N1', 'N2', 'Model Output'], loc='upper right' ) # Get text value py.text(0.5, K[1], 'K2', va='center') py.text(0.5, K[0]/alpha[0], 'K1/a12', va='center') py.text(K[0], 0.5, 'K1', ha='center') py.text(K[1]/alpha[1], 0.5, 'K2/a21', ha='center')
Normally, we could stop here. But we’ll go one further and plot a vector field (something else python makes easy)
# First, set up a grid of N1 and N2 values (similar to # expand.grid() in R) X,Y = np.meshgrid( np.arange(0, K[0] + 0.1*K[0], 5), # 0 - K1, every 5th number np.arange(0, K[1] + 0.1*K[1], 5) ) # Python wants a starting X,Y position (above) and vectors of # of change along the X and Y axes (U,V). These are what the # differential equations describe, so we can just use the very # first equations over every element in X and Y U = r[0] * X * (1 - (X + alpha[0]*Y)/K[0]) V = r[1] * Y * (1 - (Y + alpha[1]*X)/K[1]) # plot the results py.clf() py.plot(Nt[:,0], Nt[:,1], 'go') Q = py.quiver(X,Y,U,V) py.plot(sp1EqLine, sp2EqLine, 'b*', ms=15) py.plot(0, K[1], 'b*', ms=15) py.plot(K[0], 0, 'b*', ms=15) py.axis( [-1, K[0] + 0.1*K[0], -1, K[1] + 0.1*K[1] ] ) py.xlabel('N1') py.ylabel('N2') py.title('Vector Field of Population Trajectories')
I can’t advocate Python over R for statistics, R is too well defined, has too many functions, and has much better model formulation (to name a very few of the long list of reasons R is better for stats). But hopefully I’ve convinced you that Python is a great platform for symbolic math, calculus, and modeling that can complement R. Together, R and Python make the world’s best open-source mathematical software combination. Also, as with R, if you want it, there’s a Python module for it (GIS, GPS tracking, astronomical predictions, physics, etc). Also, Python is a high-level programming languaging, meaning that it’s possible to integrate your math with other Python modules and programs meant for things like information gathering from websites, etc.
(NOTE: Sage exists as an open-source alternative to MATLAB, Maple, and Mathematica combined and is excellent, but it’s largely built on Python and uses many of the same modules. I’m not sure where Sage extends the capabilities of Numpy and Scipy, but definitely check it out. It’s possible that Sage expands on Numpy, Scipy, and Sympy)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.