Abstract
We present experiences from organizing a small R hackathon aimed at advancing knowledge and documentation of the R package surveillance. The hackathon was piggybacked on the ESCAIDE2016 conference, which is attended by current and potential package users in the area of infectious disease epidemiology. The output of the hackathon is available at https://surveillancer.github.io/tutorials/.
Introduction
A hackathon is an extreme-programming, sprint-like event where people involved in software development (and beyond) meet for a short period of time for collaborative programming, typically on open-source software. The word hackathon is a blend of hack and marathon, where hacking is to be understood as the skillful modification of computer programs (and not the malicious circumvention of security measures). Lots of good guides have been written on how to run a successful hackathon. In the area of infectious disease epidemiology, which has been the main motivation for our statistical developments and implementations, very successful events (hackout, hackout2, hackout3) have previously been organized. At a much smaller scale, we wanted to ignite some of the energy and enthusiasm such an event spawns.
As a consequence, this blog post gathers our experiences from organising a small 4×3 hackathon (4 people, 3 days) for the surveillance R-package in connection with the ESCAIDE2016 conference. Our hope is that these might be useful for others – possibly working in very different contexts.
Organizing and Running the Hackathon
We report on a number of practical "?" and "!" below.
Why?
Over the last several years we have worked on a package for the visualization, modelling and monitoring of surveillance time series. As most of us are busy with other tasks now, it felt like a good idea to meet in person to work a little on the package and to network with potential users in order to increase awareness of the package. The 10th European Scientific Conference on Applied Infectious Disease Epidemiology (ESCAIDE2016), organized by the European Centre for Disease Prevention and Control (ECDC) in Stockholm, Sweden, on 28-30 Nov 2016, with its approximately 600 participants, felt like the right place to be.
A Cool Name?
Stockholm lies near the 59th parallel north, so at the end of November daylight is limited to approximately 6:30 hours. In other words: perfect hacking conditions. To honour this, Hack in the Dark became our internal handle for the hackathon.
Who?
Hackathons come in all sizes. We decided on a mini four-person hackathon of experienced R users who all knew the package well: two former Ph.D. students who had used the package as the implementational repository for their methodological developments (one of them now being the package maintainer), a former power-user of the package, and the package creator. It might have been good to involve new people in order to expand awareness of the package and increase the diversity of the hackathonians, but we decided to go for the small team in order to maximize efficiency.
Venue?
Piggybacking on an existing conference held in a large conference centre meant we did not have to worry about WLAN, food and seating, especially since we were only 4 people.
What to do?
We started brainstorming about 8 weeks before the hackathon, using Slack as a messaging system. We then created a priority matrix in Google Docs, allowing each of the participants to prioritize the ideas. This gave us an initial idea of what we wanted to do. Unfortunately, most of us then got pretty busy with other activities, so we did not manage to revisit the matrix until a couple of days before the start of the hackathon. Instead, we recapitulated matters in an "Indian buffet process" meeting in Stockholm on the night before the hackathon and settled on the following outputs:
- Write I/O tutorials explaining how to get data into the package and then use package functions to visualize them
- Use open European data for the tutorials; the theme of ESCAIDE2016 was, after all, Data for action
- Make a shiny app to visualize the effect of various parameter choices for the surveillance algorithms implemented in the package. The best choice of configuration has been a recurrent user question throughout the years.
- Demo twitter surveillance by monitoring the conference hashtag #ESCAIDE2016.
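To give a flavour of the first item, here is a minimal sketch of getting data into the package. It is not taken from the hackathon tutorials: the case dates are simulated, and the object names (case_dates, weekly_counts, counts_sts) are our own illustrative choices. The sketch assumes reports arrive as one date per case, aggregates them to weekly counts in base R, and then, if the surveillance package is installed, wraps the counts in an sts object and plots it.

```r
# Simulated line list (NOT real data): one reporting date per case,
# spread over the 52 weeks starting Monday 2016-01-04.
set.seed(1)
origin <- as.Date("2016-01-04")
case_dates <- origin + sample(0:(52 * 7 - 1), size = 300, replace = TRUE)

# Aggregate the line list to weekly counts (week 1 starts at 'origin').
week_index <- as.integer(case_dates - origin) %/% 7 + 1
weekly_counts <- tabulate(week_index, nbins = 52)

# Wrap the counts in an 'sts' object and plot it, if surveillance is available.
if (requireNamespace("surveillance", quietly = TRUE)) {
  suppressPackageStartupMessages(library("surveillance"))
  counts_sts <- sts(observed = weekly_counts, start = c(2016, 1), frequency = 52)
  plot(counts_sts)
}
```

Real data would of course replace the simulated dates. The aggregation step is kept in base R so that it can be checked independently of the package.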
Source Code Management?
We chose to create an organization surveillancer on github, which we then all joined with our individual github accounts. All new projects were then conducted by initiating new repositories. The surveillance package itself is still developed on R-Forge using svn, but since we knew most of our work would be using the surveillance package rather than developing the package, we decided to keep the existing infrastructure for the package and instead develop the planned tutorials and visualizations in a new github project. This worked ok, but switching between svn and git for commits on different projects was not always helpful: git commit -a is a useful friend if you don’t know the git staging area…
Project Output Format?
We decided to create an R package structure and use Hadley Wickham’s new pkgdown package to create a website containing all hackathon output. The tutorials were then created as vignettes. An immediate advantage of this approach was that all vignette code was directly available to the interested user.
Demo, demo, demo!
Inspired by the extreme programming paradigm, and because we wanted to interact with the conference, we decided to demo at least once a day by posting hackathon output on twitter. Besides the outgoing publicity we also frequently demo’ed internally in order to get input and suggestions. This worked pretty well – there is nothing as motivating and interactive as getting constructive input and suggestions from your table neighbour!
Interaction with the other conference participants, on the other hand, was moderate. We showed parts and pieces to interested people, but in hindsight we should have aimed for a poster presentation or a related activity in order to generate more real-life awareness of the hackathon outside the virtual world of twitter.
Summing Up
The three days of hackathon passed quickly, but we managed to get the four formulated outputs done.
Reflections
Intense software sprints are hard work, so it was natural that concentration decreased slightly towards the end of the hackathon. However, phases of intense coding are perfectly supplemented by listening to scientific talks, talking to former colleagues or sharing the passion for R with others working in the field. In particular, it was nice to exchange ideas with Thibaut Jombart, whose R Epidemics Consortium (RECON) project will hopefully be able to bundle the R initiatives in infectious disease epidemiology a little more. Besides the availability of software, the training of new users (e.g. prospective epidemiologists) is also crucial. Finally, the aftermath of the hackathon is as important as the pre-event planning: one or more dedicated coordinators have to ensure that loose ends are wrapped up. Here, the participants’ enthusiasm declines quickly as other activities become higher priorities. Again, it’s valuable to have a clear goal of what needs to be done (e.g. a blog entry…).
Altogether, code sessions such as a reproducibility session or a toolbox session could be components for spicing up scientific conferences. For example, the toolbox event could consist of a set of interested people who meet up on conference day one to implement a particular method useful in practice and then demo it on the last day. Of course, this all takes time away from other conference content and might be more work than dozing off in the plenary sessions…
No matter what, the focus of a hackathon should also be on social aspects. It also proves wise not to ignore fresh air & sunlight completely. To our surprise, the 6:30 hours of daylight were actually quite sunny in Stockholm!
Visit the Hack in the Dark Output
The output of the hackathon can be found here: https://surveillancer.github.io/tutorials/
In order to run the accompanying code (available from github by clicking on the "fork me on github" icons), version 1.12.3 of the surveillance package is needed. Currently, this development version of the package is not on CRAN, but can be installed from R-Forge as
install.packages("surveillance", repos="http://R-Forge.R-project.org")
As an appetizer to actually check out the site or the code, here are two of our tweets demoing the output:
First draft output of the #ESCAIDE2016 surveillance #rstats hackathon: Visualizing #opendata by (???) on Salmonella Agona. #data4action pic.twitter.com/8ApaDNF07L
— Michael Höhle (@m_hoehle) November 28, 2016
Interactive illustration of monitoring algorithms for infectious disease surveillance #escaide2016 #rstats https://t.co/qizcqqMbEJ pic.twitter.com/LyZujDLsF2
— Dirk Schumacher (@dirk_sch) November 29, 2016
The Future
We wish you all the best for your hackathon event. Put software on the scientific agenda!
Acknowledgments
Thanks to Maëlle Salmon, Dirk Schumacher and Sebastian Meyer for all their great work and the creative atmosphere during the hackathon! The event was implicitly supported by the Swedish Research Council as part of the project Statistical Modelling, Monitoring and Predictive Analytics against Infectious Disease Outbreaks (grant number 2015-05182_VR).