Integrating Documentation and Calculation
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This post is a first in that I've authored it using RStudio. I would guess most people who work in computational finance or quantitative risk are at least familiar with R. Unfortunately R as it existed until two years ago was borderline unusable. [Cue a storm of protest.] The fact is that the standard interfaces to R are stone-age compared with those of similar (and much more expensive) systems like SAS or MATLAB. RStudio changes that. It offers syntax highlighting, IntelliSense-style code completion, version control and, crucially in my eyes, support for mastering the packaging system (R's closest equivalent to compiling a DLL). RStudio transforms R from a system for those with dedication and time to spare into something usable by busy practitioners and academics. If it sounds as if I'm gushing, try it. Thank me later for changing your life.
Amongst the features of RStudio is the ability to embed R code into documentation, or perhaps the other way around. RStudio supports several technologies here. Sweave is fairly venerable and allows one to weave (you see what they did there?) R (or S) code into LaTeX documents. Now, I used to like LaTeX, but I find it pretty heavyweight, except when it comes to mathematics. Fortunately one can also embed R directly in HTML or, as I am doing now, in Markdown. Markdown is a very lightweight markup language: you can specify headings, lists, italics, bold and a few other formatting options. The idea is that you focus on content, not presentation.
Now, the interesting parts are the integration with LaTeX-style mathematics (via MathJax) and the ability to embed R code in documents.
LaTeX style equations
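As an example, I could write the probability mass function (PMF) of a binomial random variable \( S \), the number of successes in \( n \) independent trials each with success probability \( p \), as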
\( P(S=k) = { n \choose k } p^k (1-p)^{n-k} \)
In order to get this attractively type-set equation I wrote
\( P(S=k) = { n \choose k } p^k (1-p)^{n-k} \)
which LaTeX users will recognise immediately. I find LaTeX equations easy to write, certainly easier than using the equation editor embedded in MS Word.
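To check that the typeset formula and R agree, here is a minimal sketch that codes the PMF term by term and compares it with R's built-in dbinom(). The helper name binom_pmf() is my own, hypothetical choice; choose() and dbinom() are standard base R functions.

```r
# Binomial PMF written term by term from the formula above:
# P(S = k) = choose(n, k) * p^k * (1 - p)^(n - k)
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

# Compare with R's built-in dbinom() for every k from 0 to n
n <- 10
p <- 0.5
k <- 0:n
manual  <- binom_pmf(k, n, p)
builtin <- dbinom(k, n, p)

all.equal(manual, builtin)  # TRUE
sum(manual)                 # 1: the probabilities over all k sum to one
```

Writing the formula out by hand is only for illustration; in practice dbinom() is the function to use.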
Embedding code
Let's say I now want to evaluate the probability that my binomial random variable \( S \) shows \( k=5 \) successes in \( n=10 \) trials where the probability of success is \( p=0.5 \). I can do the following:
k <- 5
n <- 10
p <- 0.5
# dbinom() gives the point probability P(S = k); pbinom() would give the cumulative P(S <= k)
paste0("P[S=k] = ", dbinom(k, n, p))
[1] "P[S=k] = 0.24609375"
When I compile the document (which I do by clicking a button in RStudio), the code in the grey block above is evaluated and generates the output in the white block. I've (mostly) left the default settings in place here; there are various options to make the result prettier, including hiding the code block and showing just the output.
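R's d/p naming convention for distributions is easy to mix up, so this short snippet contrasts the two binomial functions for the example above. dbinom() returns the point probability \( P(S = k) \), while pbinom() returns the cumulative probability \( P(S \le k) \); for \( k = 5 \), \( n = 10 \), \( p = 0.5 \) that is 0.623046875 rather than the point probability of 0.24609375.

```r
n <- 10
p <- 0.5

# Point probability P(S = 5): choose(10, 5) / 2^10 = 252 / 1024
dbinom(5, n, p)

# Cumulative probability P(S <= 5) = 638 / 1024
pbinom(5, n, p)

# The cumulative value is just the sum of the point probabilities for k = 0..5
sum(dbinom(0:5, n, p))
```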
Right now I'm using this feature mostly as a matter of convenience: I can do the calculation right here instead of opening up R or Excel, doing it there and pasting the result back into this document. But serious applications are obvious. For example, one could write a paper involving the analysis of some data. The paper could be published as usual, with the original “source” of the paper available for download. Interested parties could examine the calculations and rerun them immediately. This kind of “reproducible research” is topical at the moment, in the fallout from the Reinhart and Rogoff controversy.
Streamlining Reporting
Closer to home, consider what most reporting and regulatory work at insurers and banks consists of: we run some calculations, perhaps format the results in Excel, copy them into Word or PowerPoint and send them to management, the regulator or shareholders, maybe after a final run through a PDF creator. At each stage we need to make sure that we have the correct set of results, that the numbers have moved correctly between the different systems, and that the accompanying narrative matches the results.
The tools I'm using here have the potential to simplify and streamline this whole process. One could build a suite of templates with embedded calculations linked to policy and asset position databases, so that the calculations run in the same place as they are described. Change a reference date, recompile the document, and the numbers update accordingly.
For a client I've been experimenting with reporting templates, using Excel to build the templates and an add-in that pulls calculation results from a database into the spreadsheets. It works, but writing large chunks of text in a spreadsheet is a little awkward. A better solution would be to use RStudio and R Markdown, pulling calculation results from the database with some custom R code built on the RODBC package. Unfortunately the client is a Microsoft house and convincing them to adopt R has proved challenging.
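To make the "change a reference date and the numbers update" idea concrete, here is a minimal base-R sketch of one templated sentence of report narrative. Everything in it is hypothetical: the positions data frame stands in for a policy/asset position database (in practice it might be the result of an RODBC::sqlQuery() call), the column names and the report_paragraph() helper are my own inventions, and the figures are made up.

```r
# Hypothetical stand-in for a position database; the figures are made up.
positions <- data.frame(
  ref_date = as.Date(c("2013-03-31", "2013-03-31", "2013-06-30", "2013-06-30")),
  asset    = c("Bonds", "Equities", "Bonds", "Equities"),
  value    = c(120.5, 80.0, 118.2, 85.3),
  stringsAsFactors = FALSE
)

# Build one sentence of report narrative for a given reference date.
# Every number in the text is computed from the data, not typed in by hand.
report_paragraph <- function(positions, ref_date) {
  ref_date <- as.Date(ref_date)
  current <- positions[positions$ref_date == ref_date, ]
  if (nrow(current) == 0) stop("no positions found for ", format(ref_date))
  total <- sum(current$value)
  share <- 100 * current$value / total
  breakdown <- paste(sprintf("%s %.1f%%", current$asset, share), collapse = ", ")
  sprintf("As at %s, total assets were %.1f (%s).",
          format(ref_date, "%d %B %Y"), total, breakdown)
}

# Changing the reference date is all it takes to regenerate the text.
cat(report_paragraph(positions, "2013-03-31"), "\n")
cat(report_paragraph(positions, "2013-06-30"), "\n")
```

In an R Markdown template the same call would sit in an inline or chunk expression, so recompiling the document regenerates the sentence; the month name in the date follows the R session's locale.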
I'll conclude by encouraging you to download RStudio and have a look. If you currently use R, or you've looked at it in the past and been put off, I think you'll be impressed. In my next post we'll do some maths and run some numbers.