Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In a previous post I demonstrated how to use R’s simple built-in symbolic engine to generate Jacobian and (pseudo)-Hessian matrices that make non-linear optimization perform much more efficiently. Another related application is Gaussian error propagation.
Say you have data from a set of measurements in variables x
and y
where you know the corresponding measurement errors (dx
and dy
, typically the standard deviation or error from a set of replicates or a calibration curve). Next you want to create a derived value defined by an arbitrary function z = f(x,y)
. What would the corresponding error in value of z
, i.e. dz = df
, be?
If the function f(x,y)
is a simple sum or product, their are simple equations for determining df
. However, if f(x,y)
is something more complex, like:
you’ll need to use a bit of calculus, specifically the chain rule:
Applying the above equation allows for the derivation of Gaussian error propagation for any arbitrary function. So how does one do this in R? Again, the D()
function and R expression()
objects come to our rescue.
Say the definition of z (ergo f(x,y)
) is defined in an R formula
:
> f = z ~ (x-y)/(x+y)^2
If you probe the structure of a formula object you get:
> str(f) Class 'formula' length 3 z ~ (x - y)/(x + y)^2 ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
What’s key is the “length 3
” bit:
> f[[1]]; f[[2]]; f[[3]] `~` z (x - y)/(x + y)^2
The code above shows us that a formula
object can be subsetted into its constituent parts:
- the formula operator:
~
- the left-hand side (LHS) of the formula:
z
- the right-hand side (RHS) of the formula:
(x - y)/(x + y)^2
The class()
of the RHS is a call
, which is close enough to an R expression
that both all.vars()
and D()
work as expected to generate the mathematical expressions for the partial derivatives with respect to each variable:
> all.vars(f[[3]]) [1] "x" "y" > lapply(all.vars(f[[3]]), function(v) D(f[[3]], v)) [[1]] 1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2 [[2]] -(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)
These expressions need to be modified a bit – i.e. in this case they need to be multiplied by dx
and dy
, respectively and then squared. What’s returned from D()
is a call
object, so the elements above need to be converted to character
to manipulate them accordingly. This is done with deparse()
.
> lapply(all.vars(f[[3]]), function(v) deparse(D(f[[3]], v))) [[1]] [1] "1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2" [[2]] [1] "-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)"
The final error propagation expression is created with a bit of string manipulation:
> sprintf('sqrt(%s)', paste( sapply(all.vars(f[[3]]), function(v) { sprintf('(d%s*(%s))^2', v, deparse(D(f[[3]], v))) }), collapse='+' ) ) [1] "sqrt((dx*(1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2))^2+(dy*(-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)))^2)"
Now that we’ve got the basics down, let’s test this out with some data …
> set.seed(0) > data = data.frame( x = runif(5), y = runif(5), dx = runif(5)/10, dy = runif(5)/10 ) > data x y dx dy 1 0.8966972 0.2016819 0.006178627 0.07698414 2 0.2655087 0.8983897 0.020597457 0.04976992 3 0.3721239 0.9446753 0.017655675 0.07176185 4 0.5728534 0.6607978 0.068702285 0.09919061 5 0.9082078 0.6291140 0.038410372 0.03800352
and with a little help from dplyr
:
> library(dplyr) > data %>% + mutate_(.dots=list( + # compute derived value + z = deparse(f[[3]]), + + # generates a mathematical expression to compute dz + # as a character string + dz = sapply(all.vars(f[[3]]), function(v) { + dfdp = deparse(D(f[[3]], v)) + sprintf('(d%s*(%s))^2', v, dfdp) + }) %>% + paste(collapse='+') %>% + sprintf('sqrt(%s)', .) + )) x y dx dy z dz 1 0.8966972 0.2016819 0.006178627 0.07698414 0.57608929 0.14457245 2 0.2655087 0.8983897 0.020597457 0.04976992 -0.46718831 0.03190297 3 0.3721239 0.9446753 0.017655675 0.07176185 -0.33019871 0.01978697 4 0.5728534 0.6607978 0.068702285 0.09919061 -0.05778613 0.07604809 5 0.9082078 0.6291140 0.038410372 0.03800352 0.11809201 0.02424023
Taking this a step further, this method can be wrapped in a chainable function that determines the name of new variables from the LHS of a formula argument:
mutate_with_error = function(.data, f) { exprs = list( # expression to compute new variable values deparse(f[[3]]), # expression to compute new variable errors sapply(all.vars(f[[3]]), function(v) { dfdp = deparse(D(f[[3]], v)) sprintf('(d%s*(%s))^2', v, dfdp) }) %>% paste(collapse='+') %>% sprintf('sqrt(%s)', .) ) names(exprs) = c( deparse(f[[2]]), sprintf('d%s', deparse(f[[2]])) ) .data %>% # the standard evaluation alternative of mutate() mutate_(.dots=exprs) }
Thus, adding new derived variables and propagating errors accordingly becomes relatively easy:
> set.seed(0) > data = data.frame(x=runif(5), y=runif(5), dx=runif(5)/10, dy=runif(5)/10) > data x y dx dy 1 0.8966972 0.2016819 0.006178627 0.07698414 2 0.2655087 0.8983897 0.020597457 0.04976992 3 0.3721239 0.9446753 0.017655675 0.07176185 4 0.5728534 0.6607978 0.068702285 0.09919061 5 0.9082078 0.6291140 0.038410372 0.03800352 > data %>% mutate_with_error(z ~ (x-y)/(x+y)^2) x y dx dy z dz 1 0.8966972 0.2016819 0.006178627 0.07698414 0.57608929 0.14457245 2 0.2655087 0.8983897 0.020597457 0.04976992 -0.46718831 0.03190297 3 0.3721239 0.9446753 0.017655675 0.07176185 -0.33019871 0.01978697 4 0.5728534 0.6607978 0.068702285 0.09919061 -0.05778613 0.07604809 5 0.9082078 0.6291140 0.038410372 0.03800352 0.11809201 0.02424023
Written with StackEdit.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.