Working with Clinical Trial Data? There’s a Pharmaverse Package for That

Gift Kenneth


[This article was first published on pharmaverse blog, and kindly contributed to R-bloggers]. (You can report an issue about the content on this page here)


Working with clinical trial data is no small task. It needs to be precise, compliant, and efficient. Traditionally, this meant using proprietary tools and working within siloed systems, which often made the process more complicated and expensive than necessary. But we think there’s a better way.

The pharmaverse is an open-source ecosystem of R packages built specifically for clinical trials. These tools integrate seamlessly with the Tidyverse, making data management more flexible, efficient, and transparent.

Whether you’re collecting, validating, analyzing, or preparing data for regulatory submission, there’s a pharmaverse package designed to support your workflow and help you work smarter.

This post covers:

Key stages of clinical trials and the R packages that support them
Creating ADSL datasets and essential programming steps
Key players in pharmaverse and whether you need all packages
How pharmaverse compares to Tidyverse and how to learn it

By the end, you’ll have a clear understanding of how pharmaverse supports clinical trial operations and how to apply these tools in your work.

Key Stages of Clinical Reporting

Managing clinical trial data involves multiple stages, each with its own challenges. Pharmaverse provides a range of R packages that support different parts of the process, sometimes even offering multiple options for the same task. This flexibility allows organizations to choose the best tools for their specific needs rather than sticking to a one-size-fits-all approach.

A metadata-driven approach helps ensure that clinical trial data is consistently structured and aligned with regulatory standards. The typical process follows this sequence:

Metadata ➝ OAK ➝ Admiral ➝ Define.xml ➝ TLGs ➝ Submissions
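To make "metadata-driven" concrete, here is a minimal base-R sketch in which a small specification table decides which variables are kept, in what order, and with which labels. The `spec` table and the toy `raw` data are invented for illustration; packages such as {metacore} and {xportr} offer real and much richer versions of this idea.

```r
# Hypothetical specification: one row per variable expected in the output
spec <- data.frame(
  variable = c("STUDYID", "USUBJID", "AGE"),
  label    = c("Study Identifier", "Unique Subject Identifier", "Age"),
  stringsAsFactors = FALSE
)

# Toy input with an extra column (TEMP) and a different column order
raw <- data.frame(
  AGE     = c(63, 71),
  TEMP    = c("x", "y"),
  USUBJID = c("01-701-1015", "01-701-1023"),
  STUDYID = "CDISCPILOT01",
  stringsAsFactors = FALSE
)

# Keep only the specified variables, in the specified order
out <- raw[, spec$variable, drop = FALSE]

# Attach the label from the specification to each column
for (i in seq_len(nrow(spec))) {
  attr(out[[spec$variable[i]]], "label") <- spec$label[i]
}

names(out)
attr(out$USUBJID, "label")
```

Because the structure lives in `spec` rather than in the code, changing the specification changes the output without touching the program.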

Some examples of pharmaverse packages that support clinical reporting include:

{diffdf} – Tracking differences in datasets.
{metatools} – Metadata management and transformation.
{sdtm.oak} – The primary pharmaverse package for SDTM dataset creation.
{datacutr} – Performing data cuts.
{admiral} – Standardized data derivations.
{metacore} – Metadata-driven structures.
Table-making packages – {chevron} (which builds on {rtables}), {Tplyr}, {pharmaRTF}, {gtsummary}, {cards}, {tfrmt}, and {tidytlg}. More tools are listed on the TLGs page.
{xportr} – CDISC-compliant dataset export.
{pkglite} – Packing R package source code into text files for submission.
{metacore} and {metatools} – For standardized metadata structures and validation.
{logrx} – For logging R scripts.

Pharmaverse packages are built on top of Tidyverse tools and integrate seamlessly with packages like {dplyr} for data manipulation and {ggplot2} for visualization.

Note: This post highlights some key pharmaverse packages relevant to clinical reporting. For a full and up-to-date list, visit the Pharmaverse website. If there’s a package we missed that should be included, let us know, and we’d be happy to update this post.

By using these tools, organizations can optimize their data pipeline, ensuring clinical data is well-structured and ready for regulatory submission with ease.
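As one small illustration of the packages listed above, the sketch below uses {diffdf} to compare a "production" dataset with an independently programmed "QC" copy, a common double-programming check. The `prod` and `qc` data frames are invented for this example, and `suppress_warnings = TRUE` is used only to keep the output quiet.

```r
library(diffdf)

# Toy production dataset
prod <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  AGE     = c(63, 71, 58),
  stringsAsFactors = FALSE
)

# Toy QC dataset with one deliberate discrepancy (AGE for S2)
qc <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  AGE     = c(63, 70, 58),
  stringsAsFactors = FALSE
)

# Compare the two datasets, matching rows on the subject identifier
res <- diffdf(base = prod, compare = qc, keys = "USUBJID", suppress_warnings = TRUE)

# TRUE when any difference was found
diffdf_has_issues(res)

# Human-readable report of the differences
print(res)
```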

Example: Creating ADSL

Building an ADSL dataset involves several key steps, from reading in data to deriving treatment variables and population flags. While these steps apply regardless of the tools used, pharmaverse packages like {admiral} simplify the process with functions designed for CDISC-compliant datasets.

This example is based on the ADSL template, which provides a structured approach to creating an ADSL dataset.

Step 1: Read in Data

To begin, load the SDTM source datasets. A full ADSL can draw on domains such as DM, EX, DS, AE, and LB; this walkthrough needs only DM, EX, and DS. The {pharmaversesdtm} package provides sample CDISC SDTM datasets:

library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
library(stringr)

# Load sample data
data("dm", package = "pharmaversesdtm")
data("ex", package = "pharmaversesdtm")
data("ds", package = "pharmaversesdtm")

ADSL is typically built from the DM dataset, removing unnecessary columns and adding treatment variables in one step:

adsl <- dm %>%
  select(-DOMAIN) %>%
  mutate(
    TRT01P = ARM,
    TRT01A = ACTARM
  )
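Before deriving anything further, it can be worth confirming that the starting point really has one record per subject. The check below is a base-R sketch on a made-up `adsl_toy` data frame, not part of the ADSL template; the same two `stopifnot()` calls could be pointed at the real `adsl`.

```r
# Toy stand-in for the DM-based starting point (values are illustrative)
adsl_toy <- data.frame(
  STUDYID = "CDISCPILOT01",
  USUBJID = c("01-701-1015", "01-701-1023", "01-701-1028"),
  ARM     = c("Placebo", "Placebo", "Xanomeline High Dose"),
  ACTARM  = c("Placebo", "Placebo", "Xanomeline High Dose"),
  stringsAsFactors = FALSE
)
adsl_toy$TRT01P <- adsl_toy$ARM
adsl_toy$TRT01A <- adsl_toy$ACTARM

# ADSL should hold exactly one record per subject
n_dup <- sum(duplicated(adsl_toy[, c("STUDYID", "USUBJID")]))
stopifnot(n_dup == 0)

# Planned and actual treatment should be populated for every subject
stopifnot(!anyNA(adsl_toy$TRT01P), !anyNA(adsl_toy$TRT01A))
```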

Step 2: Derive Treatment Variables

Using {admiral}, we convert the character start and end dates in the EX dataset (EXSTDTC, EXENDTC) into date variables (EXSTDT, EXENDT):

ex_ext <- ex %>%
  filter(!is.na(USUBJID)) %>%
  derive_vars_dt(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dt(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN"
  )
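To show roughly what this conversion does, here is a base-R sketch that turns ISO 8601 character values into dates. It is a simplification under stated assumptions: it handles complete dates only and leaves partial or missing values as `NA`, which we understand to be the behaviour of `derive_vars_dt()` when no imputation is requested. The `dtc` vector is made up.

```r
# Toy --DTC values: complete date, date with time, partial date, missing
dtc <- c("2014-01-02", "2014-01-02T11:40", "2014-01", NA)

# A value is usable when it starts with a full YYYY-MM-DD date
is_complete <- !is.na(dtc) & grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", dtc)

# Start from all-missing dates, then fill in the complete ones
dt <- rep(as.Date(NA), length(dtc))
dt[is_complete] <- as.Date(substr(dtc[is_complete], 1, 10))

dt
```

{admiral} covers far more than this sketch, including optional imputation of partial dates, so the real function remains the better choice in production code.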

Then merge these dates into ADSL:

adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & !is.na(EXSTDT),
    new_vars = exprs(TRTSDT = EXSTDT),
    order = exprs(EXSTDT, EXSEQ),
    mode = "first",
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & !is.na(EXENDT),
    new_vars = exprs(TRTEDT = EXENDT),
    order = exprs(EXENDT, EXSEQ),
    mode = "last",
    by_vars = exprs(STUDYID, USUBJID)
  )
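For readers more familiar with {dplyr}, the sketch below approximates what the two merges do: keep valid dosing records, take the first (or last) record per subject in the stated order, and left-join the result so that subjects without dosing records keep `NA`. It runs on invented toy data and is an approximation for illustration, not how {admiral} is implemented.

```r
library(dplyr, warn.conflicts = FALSE)

adsl_toy <- tibble(USUBJID = c("S1", "S2", "S3"))

# Toy exposure records, deliberately not sorted; S3 has none
ex_toy <- tibble(
  USUBJID = c("S1", "S2", "S1"),
  EXSEQ   = c(2, 1, 1),
  EXDOSE  = c(81, 0, 54),
  EXTRT   = c("XANOMELINE", "PLACEBO", "XANOMELINE"),
  EXSTDT  = as.Date(c("2014-01-16", "2013-03-01", "2014-01-02")),
  EXENDT  = as.Date(c("2014-01-30", "2013-03-14", "2014-01-15"))
)

# Same idea as filter_add: real doses, or zero doses of placebo
dosed <- ex_toy %>%
  filter(EXDOSE > 0 | (EXDOSE == 0 & grepl("PLACEBO", EXTRT)))

# mode = "first": earliest start date per subject
first_dose <- dosed %>%
  filter(!is.na(EXSTDT)) %>%
  arrange(USUBJID, EXSTDT, EXSEQ) %>%
  group_by(USUBJID) %>%
  slice_head(n = 1) %>%
  ungroup() %>%
  select(USUBJID, TRTSDT = EXSTDT)

# mode = "last": latest end date per subject
last_dose <- dosed %>%
  filter(!is.na(EXENDT)) %>%
  arrange(USUBJID, EXENDT, EXSEQ) %>%
  group_by(USUBJID) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  select(USUBJID, TRTEDT = EXENDT)

adsl_toy <- adsl_toy %>%
  left_join(first_dose, by = "USUBJID") %>%
  left_join(last_dose, by = "USUBJID")

adsl_toy
```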

Step 3: Derive End of Study (EOS) Status

The disposition dataset (DS) is used to determine when a patient exited the study:

ds_ext <- ds %>%
  filter(!is.na(DSSTDTC)) %>%
  derive_vars_dt(
    dtc = DSSTDTC,
    new_vars_prefix = "DSST"
  )

adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ds_ext,
    by_vars = exprs(STUDYID, USUBJID),
    new_vars = exprs(EOSDT = DSSTDT),
    filter_add = DSCAT == "DISPOSITION EVENT" & DSDECOD != "SCREEN FAILURE"
  )
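Because this merge specifies no `order` or `mode`, it relies on the filter leaving at most one DS record per subject; our assumption is that {admiral} reports duplicates rather than silently picking one. The base-R sketch below, on a made-up `ds_toy`, shows one way to check that assumption before merging.

```r
# Toy disposition records (values are illustrative)
ds_toy <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S3", "S3"),
  DSCAT   = c("PROTOCOL MILESTONE", "DISPOSITION EVENT", "DISPOSITION EVENT",
              "DISPOSITION EVENT", "OTHER EVENT"),
  DSDECOD = c("RANDOMIZED", "COMPLETED", "SCREEN FAILURE",
              "ADVERSE EVENT", "FINAL LAB VISIT"),
  stringsAsFactors = FALSE
)

# Apply the same condition used in filter_add
eos <- ds_toy[ds_toy$DSCAT == "DISPOSITION EVENT" &
                ds_toy$DSDECOD != "SCREEN FAILURE", ]

# Count the remaining records per subject
per_subject <- table(eos$USUBJID)

# TRUE would mean some subject has more than one end-of-study record
has_duplicates <- any(per_subject > 1)
has_duplicates
```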

Step 4: Assign Population Flags

For the safety population flag (SAFFL), we check whether the patient received at least one dose of treatment or placebo:

adsl <- adsl %>%
  derive_var_merged_exist_flag(
    dataset_add = ex,
    by_vars = exprs(STUDYID, USUBJID),
    new_var = SAFFL,
    condition = EXDOSE > 0 | str_detect(EXTRT, "PLACEBO")
  )
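The logic behind the flag can be sketched in base R: collect the subjects with at least one qualifying exposure record, then mark them "Y" and leave everyone else missing. The toy data below is invented, and leaving non-matching subjects as `NA` mirrors the output shown later in this post rather than any documented guarantee.

```r
adsl_toy <- data.frame(USUBJID = c("S1", "S2", "S3"), stringsAsFactors = FALSE)

# Toy exposure records; S3 (for example, a screen failure) has none
ex_toy <- data.frame(
  USUBJID = c("S1", "S1", "S2"),
  EXDOSE  = c(54, 81, 0),
  EXTRT   = c("XANOMELINE", "XANOMELINE", "PLACEBO"),
  stringsAsFactors = FALSE
)

# Subjects with at least one real dose or a placebo record
dosed <- unique(ex_toy$USUBJID[ex_toy$EXDOSE > 0 | grepl("PLACEBO", ex_toy$EXTRT)])

# "Y" for dosed subjects, missing otherwise
adsl_toy$SAFFL <- ifelse(adsl_toy$USUBJID %in% dosed, "Y", NA_character_)

adsl_toy
```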

Step 5: Generate and Save Results

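Finally, we save the dataset as a CSV file and view a few of its key columns:

# Save to a CSV file
write.csv(adsl, "adsl_output.csv", row.names = FALSE)

# Show the identifier, treatment, and flag columns derived above
adsl %>%
  select(USUBJID, TRT01P, TRT01A, TRTSDT, TRTEDT, SAFFL)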

USUBJID	TRT01P	TRT01A	TRTSDT	TRTEDT	SAFFL
01-701-1015	Placebo	Placebo	2014-01-02	2014-07-02	Y
01-701-1023	Placebo	Placebo	2012-08-05	2012-09-01	Y
01-701-1028	Xanomeline High Dose	Xanomeline High Dose	2013-07-19	2014-01-14	Y
01-701-1033	Xanomeline Low Dose	Xanomeline Low Dose	2014-03-18	2014-03-31	Y
01-701-1034	Xanomeline High Dose	Xanomeline High Dose	2014-07-01	2014-12-30	Y
01-701-1047	Placebo	Placebo	2013-02-12	2013-03-09	Y
01-701-1057	Screen Failure	Screen Failure	NA	NA	NA
01-701-1097	Xanomeline Low Dose	Xanomeline Low Dose	2014-01-01	2014-07-09	Y
01-701-1111	Xanomeline Low Dose	Xanomeline Low Dose	2012-09-07	2012-09-16	Y
01-701-1115	Xanomeline Low Dose	Xanomeline Low Dose	2012-11-30	2013-01-23	Y
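One caveat with CSV output is that column types are not stored in the file, so dates come back as plain text unless the reader is told otherwise. The base-R sketch below demonstrates this on a made-up two-row dataset written to a temporary file; the column names follow the example above, but the data is illustrative only.

```r
# Toy ADSL extract with a date column and a missing value
adsl_toy <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1057"),
  TRTSDT  = as.Date(c("2014-01-02", NA)),
  SAFFL   = c("Y", NA),
  stringsAsFactors = FALSE
)

# Write to a temporary file so nothing is left behind in the project
path <- tempfile(fileext = ".csv")
write.csv(adsl_toy, path, row.names = FALSE)

# Read back naively: the date column is now character
naive <- read.csv(path, stringsAsFactors = FALSE)
class(naive$TRTSDT)

# Read back with explicit column classes to restore the Date type
adsl_back <- read.csv(
  path,
  colClasses = c(USUBJID = "character", TRTSDT = "Date", SAFFL = "character")
)
class(adsl_back$TRTSDT)
```

For regulatory deliverables, a transport format written with a package such as {xportr} is the more usual choice; CSV is convenient mainly for quick inspection.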

More Details on ADSL Creation

This is just a high-level example; the full process includes deriving death variables, grouping populations, and applying labels. For a deeper dive, check out the ADSL Implementation Guide.

Who Are the Key Players in Pharmaverse, and Do You Need to Use All Packages?

Key Players in pharmaverse

Pharmaverse Council and Community – A collaborative group of developers, industry experts, and contributors maintaining and expanding the ecosystem.
Open-Source Contributors – Individuals and organizations developing and refining pharmaverse packages.
Pharmaverse is part of PHUSE – PHUSE plays an active role in supporting and advancing the pharmaverse initiative.
The pharmaverse community collaborates with organizations like the FDA, EMA, R Consortium, and CDISC to align with industry standards and best practices for clinical data reporting.

Do You Need to Use All Pharmaverse Packages?

No. Organizations can select only the packages that fit their needs.
Many packages are modular and independent, allowing selective integration.
Pharmaverse hosts multiple packages with similar aims, giving users the flexibility to choose what works best for them rather than prescribing a single approach.
Pharmaverse complements Tidyverse, allowing organizations to continue using familiar R workflows.

How Pharmaverse Differs from Tidyverse & How to Learn It Effectively

Differences Between pharmaverse and Tidyverse

Tidyverse provides general-purpose data science tools for tasks such as data wrangling and visualization.
Pharmaverse builds on Tidyverse functions and adds compliance, validation, and reporting features for pharma-specific clinical data structuring, reporting, and regulatory submissions.

Getting Started with the Pharmaverse

Pharmaverse provides an open-source ecosystem for clinical reporting, extending Tidyverse with validation, compliance, and regulatory submission capabilities. By following a structured approach from raw data to ADaMs, organizations can enhance efficiency while maintaining data integrity.

You can start with Pharmaverse Examples – A curated set of documentation and tutorials.
Attend Pharma Industry Webinars and Conferences – Stay updated on new developments through PHUSE events and webinars, R/Pharma conferences and events, CDISC events, and Shiny Gatherings x Pharmaverse webinars.
Engage with the Open-Source Community – Contribute to package improvements or discussions. You can join the pharmaverse community to get started.
Explore packages on the pharmaverse website.
Try implementing an ADSL dataset by following the ADSL Implementation Guide.
Refer to this grid for guidance on using Tidyverse or pharmaverse to complete tasks in the submission process.

Resources

This blog post was based on this presentation by Sunil Gupta: R and pharmaverse: The New Frontier for Today’s Statistical Programmers
R-Guru Resource Hub for Rapid R Learning
Explore more posts in the pharmaverse blog
Subscribe to the pharmaverse newsletter

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{kenneth2025,
  author = {Kenneth, Gift and Gupta, Sunil and {APPSILON}},
  title = {Working with {Clinical} {Trial} {Data?} {There’s} a
    {Pharmaverse} {Package} for {That}},
  date = {2025-02-28},
  url = {https://pharmaverse.github.io/blog/posts/2025-02-28_theres_a_pharmaverse_package_for_that/managing-clinical-trial-data.html},
  langid = {en}
}

For attribution, please cite this work as:

Kenneth, Gift, Sunil Gupta, and APPSILON. 2025. “Working with Clinical Trial Data? There’s a Pharmaverse Package for That.” February 28, 2025. https://pharmaverse.github.io/blog/posts/2025-02-28_theres_a_pharmaverse_package_for_that/managing-clinical-trial-data.html.
