
I will survive!

Here’s a very long post, to make up for the recent silence on the blog… Lately, I’ve been working on a new project involving the use of survival analysis data and results, specifically for health economic evaluation (cue Cake’s rendition of “I Will Survive”).


I have to say I’m not really a massive expert on survival analysis, in the sense that it’s never been my main area of interest/research. But I think the particular case of cost-effectiveness modelling is actually very interesting $-$ the main point is that, unlike in a standard trial, where the observed data are typically used to estimate the median survival time (across the different treatment arms), in health economic evaluations the target quantity is actually the mean survival time(s), because these are then usually used to extrapolate the (limited!) trial data into a full decision-analytic model, often covering a relatively large time horizon. Among many, many others, I think Chris et al make a very good point for this, here.
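
Just to make the point concrete, here’s a tiny numerical example (with completely made-up Weibull parameters, so purely for illustration): the median only tells us where the survival curve crosses 0.5, whereas the mean is the whole area under the curve $-$ exactly the bit that depends on how the tail is extrapolated.

a <- 1.1; b <- 15                        # illustrative shape and scale, not from any real dataset
median.surv <- b * log(2)^(1 / a)        # time at which S(t) = exp(-(t/b)^a) drops to 0.5
mean.surv <- b * gamma(1 + 1 / a)        # closed form for the area under S(t)
integrate(function(t) exp(-(t / b)^a), lower = 0, upper = Inf)$value   # numerical check of the mean
c(median = median.surv, mean = mean.surv)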

Anyway, one of the main implications of this is that practitioners are typically left with the task of fitting a (range of) parametric survival model(s) to their data. Nick Latimer, among others, has done excellent work in suggesting suitable guidelines. (In fact, both Chris and Nick came to talk at one of our workshops/seminars last summer).

Over and above the necessary choice of models, I think there are other interesting issues/challenges for the health economic modeller:

  1. (Parametric) survival models are often tricky because there are many different parameterisations in use, which lead to results being presented in different ways. This can be very confusing and, without extra care, lead to disastrous consequences (because the economic model extrapolates from the wrong survival curves!) $-$ see the short sketch after this list.
  2. Even when the parameterisation is taken care of, we are normally interested in characterising the full uncertainty in the joint distribution of the survival model parameters $-$ we need this to perform Probabilistic Sensitivity Analysis (PSA), so even in a non-Bayesian model it is a required output of the analysis. Pragmatically, this means computing a survival curve for a large number of combinations of parameter values and feeding each one to the economic model, to assess the impact of uncertainty on the final decision-making process.
  3. Much to my frustration (and, I realise, to the frustration of the people I keep nagging about this!), the economic models are (too) often built in Excel. This means that the survival analysis is done externally in a proper statistical package, and the results (usually in tabular form) are then copied over to the spreadsheet and used to construct the survival curves (eg via VBA macros).
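
On the first point, here’s a small example of the kind of thing that can go wrong (using flexsurv and the ovarian dataset from the survival package, purely for illustration): the same Weibull model can be reported in the accelerated failure time parameterisation (shape and scale) or the proportional hazards one (shape and rate), and the two “scale-type” parameters are not interchangeable.

library(flexsurv)    # flexsurv also loads survival, which contains the ovarian data

fit.aft <- flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist = "weibull")      # shape a, scale b
fit.ph <- flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist = "weibullPH")     # shape a, rate m

a <- fit.aft$res["shape", "est"]
b <- fit.aft$res["scale", "est"]
m <- fit.ph$res["rate", "est"]

# Same data, same fitted curve: the PH rate is just b^(-a). But an economic model
# expecting the rate and fed the scale (or vice versa) would extrapolate the wrong curve
c(rate.from.scale = b^(-a), rate.ph = m)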
I think the process is complex enough and, after talking to some health economist colleagues, I’ve started to work on an R package to try and standardise and simplify it as much as possible. In fact, this will be more of a wrapper for several other R packages, but I’m planning on including some nice (I would say that, wouldn’t I?) features.

Firstly, the idea is to allow the user to fit survival models using MLE as well as a Bayesian approach. In the former case (which I will reluctantly set as the default $-$ for reasons I’ll explain later), survHE, which is what I’ve called the package, will basically just call flexsurv, which is itself a (very clever!) wrapper. flexsurv allows the user to get bootstrap simulations from the joint distribution of the relevant parameters, which can be used for the PSA problem. Also (here’s the reason I referred to earlier), the range of models that it can fit is wide enough to cover those suggested by the NICE guidelines $-$ in fact even more, probably.
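
Just to give an idea of the machinery behind the MLE option (this is plain flexsurv rather than survHE’s final interface, and the ovarian data from the survival package are only used as a convenient example), fitting a model and simulating the parameters for PSA looks something like this:

library(flexsurv)

# Weibull model fitted by MLE; in practice this would be done by treatment arm,
# or with the arm included as a covariate
fit <- flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist = "weibull")

# 1000 draws from the (asymptotic) joint distribution of shape and scale, for the PSA
sims <- normboot.flexsurvreg(fit, B = 1000)

# Each draw defines an entire survival curve, which is what gets passed to the economic model
times <- seq(0, 2000, by = 50)
S <- apply(sims, 1, function(p) exp(-(times / p["scale"])^p["shape"]))
dim(S)    # one row per time point, one column per PSA simulation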

For the Bayesian analysis, I’m allowing the user either to use the (not-so-wide) range of built-in models in INLA, or to go fully MCMC and use OpenBUGS (which offers the same range as flexsurv). Of course, one of the crucial features of going fully Bayesian on this is that the posterior joint distribution of the model parameters can be fully characterised (using the new inla.posterior.sample function in INLA, or directly from the MCMC simulations in OpenBUGS), and so the uncertainty in the survival curves can be propagated directly into the economic model.
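
For comparison, this is roughly what the INLA route looks like under the hood (again plain INLA rather than survHE, using the same example data; the family name and options below are INLA’s own):

library(INLA)
library(survival)    # for the ovarian example data

# config = TRUE is what later allows sampling from the joint posterior
m.inla <- inla(inla.surv(futime, fustat) ~ as.factor(rx), family = "weibullsurv",
               data = ovarian, control.compute = list(config = TRUE))

# 1000 joint draws of all the model parameters, to be mapped onto survival curves for the PSA
post <- inla.posterior.sample(n = 1000, m.inla)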

In terms of running time, INLA is basically just as fast as the MLE, while MCMC is generally slower (and known to possibly run into convergence problems for some parameterisations).

The typical call to survHE will be something like this

fit <- fit.models(formula, data, distr, method, ...)
where 
  • formula can be specified using standard R notation, something like Surv(time,event)~as.factor(arm) for MLE analysis, or inla.surv(time,event)~as.factor(arm) for INLA. I think I’ve managed to make the function clever enough to recognise which formula should be used depending on what method of inference is specified, and also to figure out how to translate this into BUGS language.
  • data is, shockingly, the dataset to be used.
  • distr is a (vector of) string(s) indicating which parametric distribution(s) should be fitted to the data, something like distr=c("exponential","weibull"). Again, to make the modellers’ lives easier, I’ve kind of made mine miserable and tried to be very clever in accounting for differences in terminology across the three packages/approaches I cater for.
  • method is a string specifying what kind of analysis should be done, taking the values "mle", "inla" or "mcmc".
The other options are mainly to do with INLA & BUGS $-$ for example, in the latter case, the user can specify the number of simulations to be run. If the extra argument n.iter is set to 0, then survHE will not run the model, but will simply prepare and save in the user’s current directory the BUGS code associated with the assumptions encoded in the call, and create in the R workspace the data list, the vector of parameters to be monitored and a function generating the initial values. In this way, the user can fiddle with the template model.
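
Putting it all together, a typical session might look like the following (survHE isn’t released yet, so take the exact syntax with a pinch of salt; dataset stands for whatever data frame contains the time, event and arm variables):

library(survHE)

# MLE via flexsurv, fitting two candidate distributions at once
m1 <- fit.models(formula = Surv(time, event) ~ as.factor(arm), data = dataset,
                 distr = c("exponential", "weibull"), method = "mle")

# Fully Bayesian via INLA
m2 <- fit.models(formula = inla.surv(time, event) ~ as.factor(arm), data = dataset,
                 distr = "weibull", method = "inla")

# MCMC via OpenBUGS; with n.iter = 0 the model is not actually run, but the BUGS
# template, data list, parameters and inits are prepared so they can be edited by hand
m3 <- fit.models(formula = Surv(time, event) ~ as.factor(arm), data = dataset,
                 distr = "weibull", method = "mcmc", n.iter = 0)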

Alongside the main fit.models function, I have prepared (or am preparing) a set of other functions for plotting, printing and, most importantly, exporting the results (eg of the PSA), for instance to an Excel file. The idea is that in this way all the funny business of doing part of the survival analysis in a statistical package and then completing the computation of the survival curves for different values of the parameters in Excel can be avoided $-$ which seems to me a very valuable objective!
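
The export functions themselves are still work in progress, but the underlying idea is as simple as dumping the simulated curves somewhere the spreadsheet model can read them directly $-$ for instance (re-using the times and S objects from the flexsurv sketch above, and a plain CSV file rather than a proper Excel one):

psa.curves <- data.frame(time = times, S)    # one column of times, one column per PSA simulation
write.csv(psa.curves, "psa_survival_curves.csv", row.names = FALSE)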

I’ll post more on this when I’m closer to a beta-release $-$ I’m also trying to prepare a structured manual to accompany the package.
