The Power Of Transitioning To A ‘-Verse’ Approach In R Package Development
The word “universe” can mean many different things. In modern-day entertainment, we are conditioned to think of it as related movies or media tied together by the tried-and-true formula of post-credits scenes and character crossovers.
Speaking of entertainment, read more about how we helped a Fortune 500 movie studio with their digital transformation process.
For R packages, a universe is a set of packages that share a common thread (for example, a common grammar and syntax) and can be used together towards an end goal. Notable examples include the Tidyverse set of packages or, if you’re more familiar with the pharmaceutical industry, the Pharmaverse.
Challenges In Creating A Package Universe
Adding a “-verse” to your set of packages isn’t as simple as attaching that suffix and calling it a day. Many packages are developed independently to serve a specific use case, and they won’t necessarily work smoothly when paired with a package that was developed in a completely different direction. If you already have a plethora of packages developed in-house for your teams and want them to become more maintainable, efficient, scalable and, most importantly, less costly in the long run, then the next step is to create a universe of packages of your own.
Not sure whether this is a good leap forward? Let’s take a look at a real-life success story of how creating a shared universe benefits an ever-growing project…
Real-Life Success Story: The ElkemVerse Effect
We worked with Elkem on creating their elkemVerse to streamline their data-application development processes. In the beginning, Elkem had only one large package that did everything from setting up database connections to generating plots and Shiny dashboards. This package was used in the development pipeline at their nine production sites around the globe.
With one tightly knit, dedicated development team, this might not be an issue. But when a development team begins to expand (through new hires or external contractors) to serve newer projects, a paradigm shift naturally occurs. A single monolithic package would only hinder future development, which is why splitting the original all-in-one package into a universe of packages made sense from both business and technical standpoints:
- to meet the evolving needs of their expanding user base
- to ensure continued efficiency and innovation in their operations.
Read more about the elkemVerse in our case study.
The elkemVerse is truly a universe of its own in some regards as it comprises not only customised and interlinked R packages, but also a set of workflow/pipeline templates that bring together a variety of tech stacks, all with the aim of meeting their needs now and well into the future.
Business Value Of A Package Universe
Creating a universe of packages offers several key business benefits:
- Enhanced Scalability: As your team and projects grow, a modular approach ensures that new packages can be added without disrupting existing functionality.
- Improved Efficiency: By standardizing processes and using a common set of tools, development time is reduced, leading to faster project completion.
- Cost Avoidance: Maintaining an organized and well-integrated set of packages reduces the long-term costs associated with technical debt and system maintenance.
- Future-Proofing: Ensuring compatibility with current and future technologies helps in staying ahead of industry trends and demands.
How To Begin Building Your Own Universe
What are the considerations for building your own universe? Let’s take a closer look at how to begin…
Streamlining Development
While it is best practice to write clean code and keep to efficient software development principles, in the real world these things often fall by the wayside, and considerable technical debt accrues over the course of an application’s life cycle. This in turn leads to slower feature additions and convoluted development processes that introduce inefficiencies all around.
Having a package universe actually encourages developing packages in a modular and structured way, leading to the creation of future packages with the same blueprint. In essence, each package has just one purpose, and is used anywhere and everywhere that such a functionality is needed.
To put it simply, starting a universe can be as easy as agreeing on a template that gives every project the same features, which include (but are not limited to):
- App structure
- Standardised data connection/manipulation
The simplest building block is utilising the tidyverse for what it does best (conventions, data manipulation, and connection features), then further building up from there with something like Rhino.
Having a common app structure in Rhino ensures that you have a standard way of creating apps that will have the capability to be both robust and production-ready.
We start with the tidyverse as a base to manipulate data in R (dplyr package) as well as accessing data from databases (dbplyr package), then integrate these into a common app structure (rhino). This way, everyone using this universe of packages shares the same standard of development, effectively streamlining all aspects of a project’s life cycle.
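As a minimal sketch of what a standardised data connection and manipulation layer could look like, the example below uses an in-memory SQLite database as a stand-in for a real one. The helper names `connect_demo_db()` and `summarise_output()`, the `production` table and its columns are all hypothetical and only serve the illustration; the DBI, RSQLite, dplyr and dbplyr packages are assumed to be installed.

```r
library(DBI)
library(dplyr)  # dbplyr must also be installed; dplyr uses it for database tables

# Hypothetical shared helper: every package in the universe would obtain its
# connection the same way. In-memory SQLite stands in for a real database.
connect_demo_db <- function() {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "production", data.frame(
    site = c("A", "A", "B", "B"),
    output_tonnes = c(10, 12, 7, 9)
  ))
  con
}

# Hypothetical shared summary: the dplyr verbs are translated to SQL by dbplyr
# and only the aggregated result is pulled into R by collect().
summarise_output <- function(con) {
  tbl(con, "production") %>%
    group_by(site) %>%
    summarise(total_tonnes = sum(output_tonnes, na.rm = TRUE)) %>%
    arrange(site) %>%
    collect()
}

con <- connect_demo_db()
result <- summarise_output(con)
DBI::dbDisconnect(con)
print(result)
```

Because the same dplyr verbs work on local data frames and on database tables, a helper like this could be reused unchanged by any app in the universe, including one built on the Rhino structure.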
Now that we have a starting point for our own “universe,” we can begin our exploration and expansion phase…
Enhanced Scalability and Integration
A universe of streamlined packages, in principle, alleviates dependency issues and ensures compatibility with current and future features. When we are confident in this structure provided by our universe, we can focus on the integration of features to scale up our application without worry.
Mileage may vary in the real world, but with packages and other dependencies taken care of by a universe’s rules and laws, scaling up becomes much easier to deal with.
We can see this in the larger Pharmaverse of packages, linking various organisations and the packages that they work on.
As an example of a recent development in the pharmaceutical space, Roche and Genentech are transitioning to an open-source backbone for clinical trials, leveraging R for data analysis, and aiming for a language-agnostic framework in the future. They are using pharmaverse packages like Admiral for ADaM dataset creation and Teal for building reproducible Shiny apps, and are integrating open-source formats and tools like parquet, git, and Docker.
We hosted a Shiny Gathering webinar with one of the maintainers of Admiral. Here’s a summary of that session (and the video replay).
The confidence that these pharmaverse packages provide allows them to focus purely on the integration and scalability aspects of the technologies that can lead them to their goals.
Now, let’s scale down a bit from there and take a look closer at home. What other benefits can having our own universe of packages bring us?
The Migration Conundrum
Migration of legacy systems is a necessary evil that entities will encounter at least once in their lifetime. It is often messy and tedious, and things can go awry quickly.
These are some of the reasons why companies like Apple choose to introduce breaking features with every other OS iteration, effectively eliminating the need to support legacy products.
Not all organisations have the privilege to do so, especially when considering some core aspects of the business that started off and currently depend on running legacy systems.
One of the perks of having a shared universe of packages is the ease of migration of previous applications to be compatible with current and future changes in packages. Once we have a streamlined development process and enhanced scalability provided by a package universe, we gain:
- reduced impact on operational stability during migrations
- enhanced security through automated tests
A framework like Rhino takes an opinionated approach to solving, or at least alleviating, these migration problems through Continuous Integration/Continuous Deployment (CI/CD) and automated testing. For this very reason, it is no surprise that Rhino is part of the larger Pharmaverse suite of packages, as well as an integral part of the aforementioned elkemVerse.
First time building a Rhino app? Check out our tutorial to help you get started.
How exactly does Rhino achieve this? Automated test features (out-of-the-box unit and end-to-end testing) give you peace of mind that all running apps are thoroughly tested, and report any discrepancies as soon as they are encountered.
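To make the unit-testing side concrete, here is a small testthat sketch of the kind of test that could guard a shared helper during a migration. The function `normalise_site_code()` and the site codes are hypothetical and not taken from any real package; in a Rhino project, tests like this would typically live under `tests/testthat/` and be run with `rhino::test_r()`.

```r
library(testthat)

# Hypothetical helper that might live in one package of the universe:
# it trims whitespace and upper-cases site codes so every app agrees on them.
normalise_site_code <- function(x) {
  toupper(trimws(x))
}

test_that("site codes are normalised consistently", {
  # Surrounding whitespace is removed and letters are upper-cased
  expect_equal(normalise_site_code(" no-01 "), "NO-01")
  # Empty input stays empty instead of raising an error
  expect_equal(normalise_site_code(character(0)), character(0))
})
```

If a migration changes how the helper behaves, a test like this fails immediately instead of the change surfacing later in a production app.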
Detailed logging features are also readily available to ensure that any point of failure is quickly identified and reported, reducing the debugging time to fix and move forward during the migration process.
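The idea of logging every step so that a point of failure is easy to locate can be sketched in base R. This is deliberately not Rhino’s own logging API: `log_line()` and `run_step()` are hypothetical helpers, and the step names are made up for the illustration.

```r
# Hypothetical helper: write a timestamped log line to the console.
log_line <- function(level, msg) {
  line <- sprintf("%s [%s] %s", format(Sys.time(), "%Y-%m-%d %H:%M:%S"), level, msg)
  message(line)
  invisible(line)
}

# Hypothetical helper: run one migration step, logging its start, its
# success, or the error that stopped it, without aborting the whole script.
run_step <- function(name, step) {
  log_line("INFO", paste("starting", name))
  tryCatch(
    {
      value <- step()
      log_line("INFO", paste("finished", name))
      list(ok = TRUE, value = value)
    },
    error = function(e) {
      log_line("ERROR", paste0(name, " failed: ", conditionMessage(e)))
      list(ok = FALSE, value = NULL)
    }
  )
}

ok_run <- run_step("load reference data", function() 42)
bad_run <- run_step("convert legacy table", function() stop("column 'site' missing"))
```

The failing step is reported by name together with its error message, which is the property that shortens debugging during a migration.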
Is the “Universe” Approach for You?
If you have a diverse range of packages that you need to maintain in order to scale up your processes, the “-verse” approach should be your next step. Even if you don’t, it is worth considering as a way to future-proof the processes in your organisation.
Having said that, sometimes things are not as obvious when considering these strategic decisions, and you might end up spending more than your expected share of time and resources just to find out that this approach is really not what you need…
Need a helping hand? Talk to us, and let’s figure out your next steps together!
Wondering what to expect when you work with us? We’ve detailed how we deliver impactful projects in this blog post.
This post first appeared on appsilon.com/blog/.