Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Over the past weeks I have tried to replicate the figures in Lattice: Multivariate Data Visualization with R using Hadley Wickham’s ggplot2.
Apart from a few graph types (ggplot2 doesn't support 3D graphs, for example, and there were a few other cases), it was possible to create ggplot2 versions of almost all the figures. Sometimes the data had to be manipulated before plotting to get it into a form suitable for ggplot2, but more often than not ggplot2 produced satisfactory visualisations out of the box, very closely comparable to those of lattice.
I would like to conclude this series with comments on a few keywords that stuck in my mind while preparing all these graphs.
Speed
Both lattice and ggplot2 are built on top of the grid graphics system, but lattice is a lot faster. A lot. When drawing one or two graphs one might not even notice the difference, but as the number of graphs or the size of the datasets grows, the relative slowness of ggplot2 becomes clearly noticeable (have a look at the PDF files linked at the end of this post for comparative timings).
Reader Ben Bolker emailed Hadley Wickham about the issue, and the response he got was “So far I have been completely focused on functionality, and not at all on speed. I would really like to spend some time profiling and optimised ggplot2 (I suspect an order of magnitude speed increase would be possible), but unfortunately my summer is filling up rapidly and I am feeling some pressure to write papers rather than (more) code.”
It is good news that speed can be improved; now let's hope Hadley finds some time to look into this.
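For readers who want to check the difference on their own machine, here is a minimal timing sketch, assuming both packages are installed. The simulated data frame and the number of points are arbitrary choices for illustration, not the data behind the timings in the PDF files, and the numbers you get will depend on your hardware and package versions. Each plot is printed to a PDF device that writes no file, so that `system.time` measures the drawing itself.

```r
library(lattice)
library(ggplot2)

# Simulated data (an arbitrary illustration): 20,000 points in 4 groups
set.seed(1)
n <- 2e4
df <- data.frame(
  x = rnorm(n),
  y = rnorm(n),
  g = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
)

# The same multi-panel scatterplot in both systems
p_lattice <- xyplot(y ~ x | g, data = df)
p_ggplot  <- ggplot(df, aes(x, y)) + geom_point() + facet_wrap(~ g)

# Open a PDF device that produces no output file, so only rendering is timed
pdf(NULL)
t_lattice <- system.time(print(p_lattice))
t_ggplot  <- system.time(print(p_ggplot))
invisible(dev.off())

# Elapsed (wall-clock) seconds for each package
timings <- c(lattice = t_lattice[["elapsed"]], ggplot2 = t_ggplot[["elapsed"]])
print(timings)
```

A single run is noisy; repeating the two `system.time` calls a few times gives a more reliable picture.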
Output Customisation
Almost every element of the output of both packages is highly customisable. Of the two, lattice offers more options for tinkering with the finest details of the plot, so one can make sure that the final graph looks exactly the way one wants. Such fine-tuning, though, requires a very good knowledge of the inner workings of the package, as the available options are not always obvious. I find fine-tuning a graph with ggplot2's approach a lot easier, as it is clear which element of the plot is being adjusted.
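As a small illustration, here is the same tweak, rotating the x-axis labels by 90 degrees, done in each package. The data frame is made up for the example, and the ggplot2 part is written against a current version of the package (`geom_col()` postdates this post). In lattice the setting lives in the nested `scales` list; in ggplot2 the `theme()` call names the element (`axis.text.x`) being changed.

```r
library(lattice)
library(ggplot2)

# Made-up data: one value per category
df <- data.frame(
  category = factor(c("alpha", "beta", "gamma", "delta")),
  value    = c(3, 7, 2, 5)
)

# lattice: label rotation is set via scales = list(x = list(rot = ...))
p_lattice <- barchart(value ~ category, data = df,
                      scales = list(x = list(rot = 90)))

# ggplot2: theme() targets the x-axis tick labels by name
p_ggplot <- ggplot(df, aes(category, value)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
```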
Still, as always, there is room for improvement. The two main items that surfaced during this exercise were better control over the heights, widths and aspect ratios of facets (facet_grid has a space argument, with space="fixed" as the default, but facet_wrap does not) and better control over the size and positioning of legends.
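The effect of facet_grid's `space` argument is easiest to see together with free scales. Here is a minimal sketch using the `mpg` data that ships with ggplot2: with `scales = "free_x"` each column of panels shows only the vehicle classes that occur in it, and `space = "free_x"` makes each column's width proportional to the number of classes, so all bars keep the same width. With the default `space = "fixed"`, the three columns are equally wide regardless.

```r
library(ggplot2)

# One column of panels per drive type (drv is "4", "f" or "r");
# column widths follow the number of classes in each panel
p_free <- ggplot(mpg, aes(class)) +
  geom_bar() +
  facet_grid(. ~ drv, scales = "free_x", space = "free_x") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# For comparison: equal-width columns despite different numbers of classes
p_fixed <- ggplot(mpg, aes(class)) +
  geom_bar() +
  facet_grid(. ~ drv, scales = "free_x") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
```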
Syntax
As already mentioned, lattice has a jungle of parameters that can be manipulated to get the best possible output. To me, lattice feels more cluttered with all of its rich options (panel and prepanel functions), and I personally prefer ggplot2's approach of building up a graph layer by layer using “human-readable” expressions. Compared with the various specialised functions in lattice, I find this more intuitive and easier to follow.
In capable hands, lattice's panel functions make it an extremely powerful tool. At the same time, it was only after working through the lattice examples that I came to fully appreciate the power of their ggplot2 equivalents: stat_summary and stat_function.
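To make the comparison concrete, here is a sketch of two common panel-function tasks next to their ggplot2 counterparts: overlaying group means, and overlaying a theoretical density curve. The simulated data are for illustration only. The ggplot2 part assumes version 3.3.0 or later, where `stat_summary()` takes `fun =` (older releases used `fun.y =`) and `after_stat()` is available.

```r
library(lattice)
library(ggplot2)

set.seed(42)
# Simulated data: 20 observations for each of 3 groups
df <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 20)),
  y     = rnorm(60, mean = rep(c(1, 2, 3), each = 20))
)

# lattice: a custom panel function draws the points, then adds group means
p_means_lattice <- stripplot(y ~ group, data = df,
  panel = function(x, y, ...) {
    panel.stripplot(x, y, ...)
    means <- tapply(y, x, mean)
    panel.points(seq_along(means), means, pch = 19, cex = 2, col = "red")
  })

# ggplot2: the group means are just one more layer
p_means_ggplot <- ggplot(df, aes(group, y)) +
  geom_point() +
  stat_summary(fun = mean, geom = "point", size = 4, colour = "red")

# Simulated sample for the density overlay
dens <- data.frame(x = rnorm(500))

# lattice: panel.mathdensity() adds the N(0, 1) density to the histogram
p_dens_lattice <- histogram(~ x, data = dens, type = "density",
  panel = function(x, ...) {
    panel.histogram(x, ...)
    panel.mathdensity(dmath = dnorm, args = list(mean = 0, sd = 1))
  })

# ggplot2: stat_function() draws the same curve from dnorm
p_dens_ggplot <- ggplot(dens, aes(x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  stat_function(fun = dnorm)
```

In the lattice versions the extra drawing code sits inside the panel function; in ggplot2 each addition is a separate layer that can be added or removed on its own.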
Again, if there were one thing to add to my wish-list, it would be the ability to use functions such as reorder as facetting variables, which would save one data pre-processing step.
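For reference, here is what that pre-processing step looks like, as a minimal sketch on made-up data (the site names and slopes are invented). `reorder()` turns `site` into a factor whose levels are sorted by the median of `y`, and `facet_wrap()` then draws the panels in level order. In lattice the same `reorder()` call can go straight into the conditioning part of the formula.

```r
library(lattice)
library(ggplot2)

set.seed(7)
# Made-up data: 10 observations for each of 4 sites, with different slopes
df <- data.frame(
  site = rep(c("north", "south", "east", "west"), each = 10),
  x    = rep(1:10, times = 4)
)
df$y <- df$x * rep(c(0.5, 2, 1, 3), each = 10) + rnorm(40, sd = 0.5)

# lattice: reorder() is used directly as the conditioning variable
p_lattice <- xyplot(y ~ x | reorder(factor(site), y, FUN = median), data = df)

# ggplot2: add a reordered factor column first, then facet on it
df$site_ordered <- reorder(factor(df$site), df$y, FUN = median)
p_ggplot <- ggplot(df, aes(x, y)) +
  geom_point() +
  facet_wrap(~ site_ordered)

# Levels sorted by median y: north, east, south, west
levels(df$site_ordered)
```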
Documentation
ggplot2 has a very good website with many useful examples (the same information, without the rendered graphs, is included in the help files), as well as a book with good explanations. Using a combination of all these, one gets a good overview of the available options and answers to the questions that may arise. I especially like the examples on the website, which often highlight the more intricate features of the package.
The lattice manual explains all the available options in great detail, which sometimes takes a good amount of concentration and willpower to work through. Apart from the book website, one can also make use of the R Graphical Manual, which includes “a collection of graphics from all R packages”.
PDF version of the posts
Some readers requested a PDF version of the posts, so all the chapters have been compiled into one PDF file that can be downloaded here (6 MB).
Another version of the same file, which also includes the system.time results for most of the print statements used to generate the images, is available here (6 MB).
And yet another version, with no images, can be downloaded here (800 KB).
Tools
Finally, here are the tools I used to create the blog posts as well as the PDF files:
- asciidoc: a text document format for writing short documents, articles, books and UNIX man pages. AsciiDoc files can be translated to HTML and DocBook markup.
- ascii: an R package that replaces R results in an AsciiDoc document with AsciiDoc markup.
- blogpost: a command-line weblog client for publishing AsciiDoc documents to WordPress blog hosts. It creates and updates weblog posts and pages directly from AsciiDoc source documents.
- dblatex: the PDF output was generated by passing the DocBook produced by AsciiDoc through dblatex/pdftex.