
Git, peer review, tests and toil by @ellis2013nz

[This article was first published on free range statistics - R, and kindly contributed to R-bloggers.]

This week I was in Auckland, New Zealand, to deliver the third and final lecture of the 2024 series of the Ihaka Lectures, named after Ross Ihaka, legendary denizen of the University of Auckland’s statistics department and one of the two co-founders of the statistical computing language R.

On the mostly-neglected presentations page of this blog, I have added links to the video of the talk (it was live-streamed), my slides, and the ‘storyline’ summary I used to help me structure the talk.

Here is perhaps the key image from the talk: a slide showing an all-purpose workflow for an analytical project. The workflow draws on a large and persistent data warehouse plus project-specific data, and has a deliberate processing stage that combines the two into an analysis-ready “project-specific database”. I’ve been using variants of this diagram for more than 10 years now, and it will be familiar to anyone from my days with New Zealand’s Ministry of Business, Innovation and Employment, or international management consultancy Nous Group.
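For readers who prefer code to diagrams, here is a minimal sketch of that processing stage. It is an illustration only, not the setup from the talk: it uses Python's built-in `sqlite3` module and a single in-memory database, whereas a real warehouse and project database would normally live in separate places. The table names (`warehouse_spend`, `project_regions`, `project_db`), the `build_project_db` function, and all of the data are hypothetical.

```python
import sqlite3


def build_project_db():
    """Combine warehouse data with project-specific data into one analysis-ready table."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # Stand-in for the large, persistent data warehouse (hypothetical table and values).
    cur.execute("CREATE TABLE warehouse_spend (region TEXT, year INTEGER, spend REAL)")
    cur.executemany(
        "INSERT INTO warehouse_spend VALUES (?, ?, ?)",
        [
            ("North", 2023, 120.0),
            ("North", 2024, 135.0),
            ("South", 2023, 80.0),
            ("South", 2024, 90.0),
        ],
    )

    # Stand-in for project-specific data, e.g. a lookup of which regions are in scope.
    cur.execute("CREATE TABLE project_regions (region TEXT, in_scope INTEGER)")
    cur.executemany(
        "INSERT INTO project_regions VALUES (?, ?)",
        [("North", 1), ("South", 0)],
    )

    # The deliberate processing stage: join the two sources once, up front,
    # and store the result as the table that all later analysis reads from.
    cur.execute(
        """
        CREATE TABLE project_db AS
        SELECT w.region, w.year, w.spend
        FROM warehouse_spend AS w
        JOIN project_regions AS p ON p.region = w.region
        WHERE p.in_scope = 1
        """
    )
    con.commit()
    return con


con = build_project_db()
rows = con.execute(
    "SELECT region, year, spend FROM project_db ORDER BY year"
).fetchall()
print(rows)  # [('North', 2023, 120.0), ('North', 2024, 135.0)]
```

The point of the sketch is that the combining step is explicit and rerunnable, so analysis code only ever touches `project_db` rather than reaching back into the warehouse.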

Overall, I emphasised the importance of R being part of a broader toolkit and a broader transformation – with Git and SQL the two non-negotiable must-have partners to successfully make R work in government.

I also talked a bit about how errors in analysis are universal, invisible, and catastrophic. If that doesn’t motivate people to start doing some decent quality control, I don’t know what will!

The ‘storyline’ is a great technique I was trained on in a course on writing for the New Zealand public sector. I always find it helps to structure reports and presentations if I take the time to plan them first. If you are interested in making your own summaries of this sort, you could use the RMarkdown source code of that storyline, which of course is available on GitHub (or I’d be a bit of a hypocrite, wouldn’t I?). It uses the flexdashboard template. Of course a storyline doesn’t need to be written in RMarkdown, but I find it a simple and disciplined way to write them without having to worry about formatting.

Big thanks to the University of Auckland Department of Statistics and all the good folks there for inviting me to give this talk and looking after me so nicely while in New Zealand.

