Interoperability was a key theme in open-source data languages in 2023. Ongoing innovations in Arrow (a language-agnostic in-memory standard for data storage), growing adoption of Quarto (the language-agnostic heir apparent to R Markdown), and even p...
Language interoperability and different ways of enabling “polyglot” workflows seem to have taken center stage in the data world recently:
- Apache Arrow promises a language-independent memory format for interoperability,
- RStudio its ...
Shiny apps are R’s answer to building interface-driven applications that help expose important data, metrics, algorithms, and more to end users. However, the more interesting work that your Shiny app allows users to do, the more likely users ar...
Last year, I tweeted about how to spread holiday cheer by letting your R Markdown documents snow. After all, what better way to put people in the holiday spirit than to add a random 5% probability that whatever part of a document they are trying to read wi...
Data quality monitoring is an essential part of any data analysis or business intelligence workflow. As such, an increasing number of promising tools have emerged as part of the Modern Data Stack to offer better orchestration, testing, and reporting....
Simple, self-contained, reproducible examples are a common part of good software documentation. However, in the spirit of brevity, these examples often do not demonstrate the most sustainable or flexible workflows for integrating software tools into large projects. In this post, I document a few mundane but useful patterns for querying ...
Note: this post is a written version of my rstudio::global 2020 talk on the same topic. Please see the link for the slides and video version. I do elaborate on a few points here that I cut from the talk; if you’ve already watched the talk and ju...
Declarative programming languages such as HTML, CSS, and SQL are popular because they allow users to focus more on the desired outcome than the exact computational steps required to achieve that outcome. This can increase efficiency and code readabili...
Back in September, I wrote about how controlled vocabularies can help form contracts between data producers and consumers. In short, I argued that aligning on an ontology of stub names for use in naming variables in a dataset can improve data documentati...
This week, I was pleased to become an official RStudio Certified Instructor after completing Greg Wilson’s training program, based on his excellent book Teaching Tech Together. Part of the certification process involved teaching a 10-minute les...
Software products use a range of strategies to make promises or contracts with their users. Mature code packages and APIs document expected inputs and outputs, check adherence with unit tests, and transparently report code coverage. Programs with graphical user interfaces form such contracts by labeling and illustrating interactive components to ...
Recently, I argued the case on Twitter that Shiny modules are not an advanced topic and can actually be a great way for novice Shiny developers to start building more complex applications.
My Shiny hot take is that modules are **not** an advanced topic. IMHO it's so much easier and ...
The importance of documentation is uncontroversial. For many data and analytical products, documentation is the user interface and key to promoting user success and future reuse. However, when project timelines get tight, too many data products are considered complete without appropriate documentation. Even when these resources initially exist, they too ...
When working with R Markdown’s HTML output type, it’s possible to add a custom style to your output by passing in a CSS style sheet to the YAML header like this:
output:
  html_document:
    css: "my-style-sheet.css"
To use CSS effectively, it’s critical to understand how to specify which selectors one wishes ...
Many tools and packages aim to eliminate the pain and uncertainty of technical project management. For example, git, make, Docker, renv, and drake are just a few existing tools that enable collaboration, manage software dependencies, and promote reproducibility. However, there is no analogous gold standard for managing the most time-consuming ...
Motivation
My initial post on RMarkdown Driven Development focuses on major concepts in the process of evolving a one-time, single-file analysis into a sustainable analytical tool. In the spirit of Etsy’s immutable documentation, I intentionally minimized references to specific tools or packages. After all, software is transient; principles are ...
One of the ways that practices of reproducible research can be brought into industry is through the development of custom R packages and data tools for one’s company / organization. Not only can these tools deliver large efficiency gains and standardization, they ideally infuse corporate culture with the shared passion ...
Last winter, I attended a holiday party at a “paint-and-sip” venue. For those unfamiliar, “paint-and-sip” is a semi-trendy cottage industry offering evenings of music, wine, and a guided painting activity. For example, my group painted sasquatch on a snowy winter’s eve:
As often happens, this completely unrelated thing set ...
Introduction
RMarkdown is an excellent platform for capturing narrative analysis and code to create reproducible reports, blogs, slides, books, and more. One benefit of RMarkdown is its ability to keep an analyst in the “flow” of their work and to capture their thought process along the way. However, thought processes ...