Working with Clinical Trial Data? There’s a Pharmaverse Package for That

This article was first published on the pharmaverse blog and kindly contributed to R-bloggers.

Working with clinical trial data is no small task. It needs to be precise, compliant, and efficient. Traditionally, this meant using proprietary tools and working within siloed systems, which often made the process more complicated and expensive than necessary. But we think there’s a better way.

The pharmaverse is an open-source ecosystem of R packages built specifically for clinical trials. These tools integrate seamlessly with the Tidyverse, making data management more flexible, efficient, and transparent.

Whether you’re collecting, validating, analyzing, or preparing data for regulatory submission, there’s a pharmaverse package designed to support your workflow and help you work smarter.

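This post covers:

- Key stages of clinical reporting
- An example of creating ADSL
- Who the key players in pharmaverse are, and whether you need to use all packages
- How pharmaverse differs from Tidyverse, and how to learn it effectively
- Getting started with the pharmaverse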

By the end, you’ll have a clear understanding of how pharmaverse supports clinical trial operations and how to apply these tools in your work.

Key Stages of Clinical Reporting

Managing clinical trial data involves multiple stages, each with its own challenges. Pharmaverse provides a range of R packages that support different parts of the process, sometimes even offering multiple options for the same task. This flexibility allows organizations to choose the best tools for their specific needs rather than sticking to a one-size-fits-all approach.

A metadata-driven approach helps ensure that clinical trial data is consistently structured and aligned with regulatory standards. The typical process follows this sequence:

Metadata → OAK → Admiral → Define.xml → TLGs → Submissions
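To make the metadata-driven idea concrete, here is a minimal sketch in base R. It assumes a hypothetical specification table, `spec`, and a toy dataset, `adsl_toy`; neither comes from a pharmaverse package, and dedicated pharmaverse tooling handles this far more thoroughly. The point is only that the specification, not hand-written code, decides which variables are kept, in what order, and with which labels.

```r
# Hypothetical specification: the variables a dataset should contain, in order,
# with their labels. In practice this would come from a study specification.
spec <- data.frame(
  variable = c("STUDYID", "USUBJID", "TRT01P"),
  label = c(
    "Study Identifier",
    "Unique Subject Identifier",
    "Planned Treatment for Period 01"
  ),
  stringsAsFactors = FALSE
)

# Toy dataset with columns out of order and one extra working column (TEMP)
adsl_toy <- data.frame(
  TRT01P = c("Placebo", "Xanomeline Low Dose"),
  USUBJID = c("01-701-1015", "01-701-1033"),
  TEMP = c(1, 2),
  STUDYID = "CDISCPILOT01",
  stringsAsFactors = FALSE
)

# Fail early if a variable required by the specification is missing
missing_vars <- setdiff(spec$variable, names(adsl_toy))
stopifnot(length(missing_vars) == 0)

# Keep only the specified variables, in specification order
adsl_toy <- adsl_toy[, spec$variable]

# Attach the labels from the specification as column attributes
for (i in seq_len(nrow(spec))) {
  attr(adsl_toy[[spec$variable[i]]], "label") <- spec$label[i]
}

names(adsl_toy)
```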

Some examples of pharmaverse packages that support clinical reporting include:

Pharmaverse packages are built on top of Tidyverse tools and integrate seamlessly with packages like {dplyr} for data manipulation and {ggplot2} for visualization.

Note: This post highlights some key pharmaverse packages relevant to clinical reporting. For a full and up-to-date list, visit the Pharmaverse website. If there’s a package we missed that should be included, let us know, and we’d be happy to update this post.

By using these tools, organizations can optimize their data pipeline, ensuring clinical data is well-structured and ready for regulatory submission with ease.

Example: Creating ADSL

Building an ADSL dataset involves several key steps, from reading in data to deriving treatment variables and population flags. While these steps apply regardless of the tools used, pharmaverse packages like {admiral} simplify the process with functions designed for CDISC-compliant datasets.

This example is based on the ADSL template, which provides a structured approach to creating an ADSL dataset.

Step 1: Read in Data

To begin, load the SDTM source datasets. This example uses DM, EX, and DS; a full ADSL would typically also draw on other domains such as AE and LB. The {pharmaversesdtm} package provides sample CDISC SDTM datasets:

library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
library(stringr)

# Load sample data
data("dm", package = "pharmaversesdtm")
data("ex", package = "pharmaversesdtm")
data("ds", package = "pharmaversesdtm")

ADSL is typically built from the DM dataset, removing unnecessary columns and adding treatment variables in one step:

adsl <- dm %>%
  select(-DOMAIN) %>%
  mutate(
    TRT01P = ARM,
    TRT01A = ACTARM
  )
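Because ADSL holds one record per subject, it can be worth confirming that the DM-based starting point has no duplicated subjects before deriving anything else. The following is a minimal sketch using only {dplyr}; `dm_toy` is a hypothetical stand-in for DM, and the check is a suggestion rather than part of the ADSL template.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy stand-in for DM; the values are hypothetical
dm_toy <- tibble::tribble(
  ~STUDYID,       ~USUBJID,      ~ARM,      ~ACTARM,
  "CDISCPILOT01", "01-701-1015", "Placebo", "Placebo",
  "CDISCPILOT01", "01-701-1023", "Placebo", "Placebo"
)

# Subjects appearing more than once would break the one-row-per-subject rule
dup_subjects <- dm_toy %>%
  count(STUDYID, USUBJID) %>%
  filter(n > 1)

# Stop with an error if any duplicates were found
stopifnot(nrow(dup_subjects) == 0)
```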

Step 2: Derive Treatment Variables

Using {admiral}, we convert the character date-time variables EXSTDTC and EXENDTC in the EX dataset into date variables (EXSTDT and EXENDT):

ex_ext <- ex %>%
  filter(!is.na(USUBJID)) %>%
  derive_vars_dt(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dt(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN"
  )
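To see what `derive_vars_dt()` does with incomplete dates, here is a small sketch on a toy dataset (`ex_toy`, with hypothetical values). It assumes a reasonably recent {admiral} version in which the `highest_imputation` and `date_imputation` arguments are available. By default no imputation is performed, so a partial date yields a missing value; imputation has to be requested explicitly.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

# Toy exposure records: one complete date-time and one partial date (day missing)
ex_toy <- tibble::tribble(
  ~USUBJID, ~EXSTDTC,
  "S1",     "2014-01-02T08:30",
  "S2",     "2014-03"
)

# Default behaviour: no imputation, so the partial date is left missing
ex_default <- ex_toy %>%
  derive_vars_dt(dtc = EXSTDTC, new_vars_prefix = "EXST")

# Allow the day to be imputed, using the first day of the month.
# An imputation flag variable (EXSTDTF) is added alongside EXSTDT.
ex_imputed <- ex_toy %>%
  derive_vars_dt(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST",
    highest_imputation = "D",
    date_imputation = "first"
  )

ex_default
ex_imputed
```

Whether imputation is appropriate, and which rule to use, is a study-level decision that would normally be documented in the analysis plan.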

Then merge these dates into ADSL, taking the first valid-dose start date as TRTSDT and the last valid-dose end date as TRTEDT:

adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & !is.na(EXSTDT),
    new_vars = exprs(TRTSDT = EXSTDT),
    order = exprs(EXSTDT, EXSEQ),
    mode = "first",
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & !is.na(EXENDT),
    new_vars = exprs(TRTEDT = EXENDT),
    order = exprs(EXENDT, EXSEQ),
    mode = "last",
    by_vars = exprs(STUDYID, USUBJID)
  )
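The interaction of `filter_add`, `order`, and `mode` is easier to follow on a few rows. The sketch below uses toy data (`adsl_toy` and `ex_toy`, with hypothetical values) and the same call pattern as above: records are first restricted to valid doses, then sorted by date and sequence within each subject, and the first record supplies TRTSDT.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(stringr)

adsl_toy <- tibble::tribble(
  ~STUDYID, ~USUBJID,
  "ST1",    "S1",
  "ST1",    "S2"
)

# S1 has a zero-dose record for an active drug, which the filter excludes;
# S2 is on placebo, where a zero dose counts as treatment
ex_toy <- tibble::tribble(
  ~STUDYID, ~USUBJID, ~EXSEQ, ~EXTRT,    ~EXDOSE, ~EXSTDTC,
  "ST1",    "S1",     1,      "DRUG A",  0,       "2014-01-01",
  "ST1",    "S1",     2,      "DRUG A",  50,      "2014-01-08",
  "ST1",    "S1",     3,      "DRUG A",  50,      "2014-01-15",
  "ST1",    "S2",     1,      "PLACEBO", 0,       "2014-02-01"
) %>%
  mutate(EXSTDT = as.Date(EXSTDTC))

adsl_toy <- adsl_toy %>%
  derive_vars_merged(
    dataset_add = ex_toy,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO"))) & !is.na(EXSTDT),
    new_vars = exprs(TRTSDT = EXSTDT),
    order = exprs(EXSTDT, EXSEQ),
    mode = "first",
    by_vars = exprs(STUDYID, USUBJID)
  )

adsl_toy
```

In this toy data, S1's treatment start is taken from the second exposure record, because the first one is a zero dose of an active drug and is filtered out.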

Step 3: Derive End of Study (EOS) Status

The disposition dataset (DS) is used to determine when a patient exited the study. Here we derive the end-of-study date (EOSDT) from the disposition event record:

ds_ext <- ds %>%
  filter(!is.na(DSSTDTC)) %>%
  derive_vars_dt(
    dtc = DSSTDTC,
    new_vars_prefix = "DSST"
  )

adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ds_ext,
    by_vars = exprs(STUDYID, USUBJID),
    new_vars = exprs(EOSDT = DSSTDT),
    filter_add = DSCAT == "DISPOSITION EVENT" & DSDECOD != "SCREEN FAILURE"
  )
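The code above derives only the end-of-study date. A status variable such as EOSSTT can be derived in a similar way. The sketch below uses toy data (hypothetical values) and a helper, `format_eosstt()`, that is defined here for illustration; the mapping of disposition terms to status values is an assumption and should follow your study's conventions. It also assumes an {admiral} version in which `derive_vars_merged()` accepts expressions in `new_vars` and supports the `missing_values` argument.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

# Illustrative helper (defined here, not exported by a package):
# map the disposition term to an end-of-study status
format_eosstt <- function(x) {
  case_when(
    x == "COMPLETED" ~ "COMPLETED",
    x == "SCREEN FAILURE" ~ NA_character_,
    TRUE ~ "DISCONTINUED"
  )
}

adsl_toy <- tibble::tribble(
  ~STUDYID, ~USUBJID,
  "ST1",    "S1",
  "ST1",    "S2",
  "ST1",    "S3"
)

# S3 has no disposition event yet, only a protocol milestone
ds_toy <- tibble::tribble(
  ~STUDYID, ~USUBJID, ~DSCAT,               ~DSDECOD,
  "ST1",    "S1",     "DISPOSITION EVENT",  "COMPLETED",
  "ST1",    "S2",     "DISPOSITION EVENT",  "ADVERSE EVENT",
  "ST1",    "S3",     "PROTOCOL MILESTONE", "RANDOMIZED"
)

adsl_toy <- adsl_toy %>%
  derive_vars_merged(
    dataset_add = ds_toy,
    by_vars = exprs(STUDYID, USUBJID),
    filter_add = DSCAT == "DISPOSITION EVENT",
    new_vars = exprs(EOSSTT = format_eosstt(DSDECOD)),
    # Subjects without a matching disposition event are treated as ongoing
    missing_values = exprs(EOSSTT = "ONGOING")
  )

adsl_toy
```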

Step 4: Assign Population Flags

For the safety population flag (SAFFL), we check whether the patient received at least one dose of study treatment, with placebo counting as treatment:

adsl <- adsl %>%
  derive_var_merged_exist_flag(
    dataset_add = ex,
    by_vars = exprs(STUDYID, USUBJID),
    new_var = SAFFL,
    condition = EXDOSE > 0 | str_detect(EXTRT, "PLACEBO")
  )
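By default, `derive_var_merged_exist_flag()` sets the flag to "Y" when the condition is met and leaves it missing otherwise. If your conventions call for an explicit "N", the `false_value` and `missing_value` arguments can be set. The sketch below shows both variants on toy data (hypothetical values); which convention applies is a study-level decision, not something this example prescribes.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(stringr)

adsl_toy <- tibble::tribble(
  ~STUDYID, ~USUBJID,
  "ST1",    "S1",
  "ST1",    "S2",
  "ST1",    "S3"
)

# S1 was dosed, S2 only has a zero dose of an active drug,
# and S3 has no exposure records at all
ex_toy <- tibble::tribble(
  ~STUDYID, ~USUBJID, ~EXTRT,   ~EXDOSE,
  "ST1",    "S1",     "DRUG A", 50,
  "ST1",    "S2",     "DRUG A", 0
)

# Default: "Y" when the condition is met, NA otherwise
adsl_default <- adsl_toy %>%
  derive_var_merged_exist_flag(
    dataset_add = ex_toy,
    by_vars = exprs(STUDYID, USUBJID),
    new_var = SAFFL,
    condition = EXDOSE > 0 | str_detect(EXTRT, "PLACEBO")
  )

# Variant: explicit "N" for subjects who fail the condition (false_value)
# and for subjects with no exposure records (missing_value)
adsl_yn <- adsl_toy %>%
  derive_var_merged_exist_flag(
    dataset_add = ex_toy,
    by_vars = exprs(STUDYID, USUBJID),
    new_var = SAFFL,
    condition = EXDOSE > 0 | str_detect(EXTRT, "PLACEBO"),
    false_value = "N",
    missing_value = "N"
  )

adsl_default
adsl_yn
```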

Step 5: Generate and Save Results

Finally, we save the dataset as a CSV file and view a selection of its columns:

# Save to a CSV file
write.csv(adsl, "adsl_output.csv", row.names = FALSE)

# View selected columns for the first 10 subjects
adsl %>%
  select(USUBJID, TRT01P, TRT01A, TRTSDT, TRTEDT, SAFFL) %>%
  head(10)

USUBJID      TRT01P                TRT01A                TRTSDT      TRTEDT      SAFFL
01-701-1015  Placebo               Placebo               2014-01-02  2014-07-02  Y
01-701-1023  Placebo               Placebo               2012-08-05  2012-09-01  Y
01-701-1028  Xanomeline High Dose  Xanomeline High Dose  2013-07-19  2014-01-14  Y
01-701-1033  Xanomeline Low Dose   Xanomeline Low Dose   2014-03-18  2014-03-31  Y
01-701-1034  Xanomeline High Dose  Xanomeline High Dose  2014-07-01  2014-12-30  Y
01-701-1047  Placebo               Placebo               2013-02-12  2013-03-09  Y
01-701-1057  Screen Failure        Screen Failure        NA          NA          NA
01-701-1097  Xanomeline Low Dose   Xanomeline Low Dose   2014-01-01  2014-07-09  Y
01-701-1111  Xanomeline Low Dose   Xanomeline Low Dose   2012-09-07  2012-09-16  Y
01-701-1115  Xanomeline Low Dose   Xanomeline Low Dose   2012-11-30  2013-01-23  Y
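One thing to keep in mind with CSV output is that it stores plain text only, so column types such as dates are not preserved. The base R sketch below uses a toy data frame (hypothetical values) and a temporary file to show that a date column comes back as character and has to be converted again after reading.

```r
# Toy ADSL-like data with a date column and missing values
adsl_toy <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1057"),
  TRTSDT = as.Date(c("2014-01-02", NA)),
  SAFFL = c("Y", NA),
  stringsAsFactors = FALSE
)

# Write to a temporary CSV file and read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(adsl_toy, csv_path, row.names = FALSE)
adsl_back <- read.csv(csv_path, stringsAsFactors = FALSE)

# The date column is now character, not Date
class(adsl_back$TRTSDT)

# Convert it back explicitly
adsl_back$TRTSDT <- as.Date(adsl_back$TRTSDT)
class(adsl_back$TRTSDT)
```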

More Details on ADSL Creation

This is just a high-level example; the full process includes deriving death variables, grouping populations, and applying labels. For a deeper dive, check out the ADSL Implementation Guide.

Who Are the Key Players in Pharmaverse, and Do You Need to Use All Packages?

Key Players in pharmaverse

Do You Need to Use All Pharmaverse Packages?

How Pharmaverse Differs from Tidyverse & How to Learn It Effectively

Differences Between pharmaverse and Tidyverse

Getting Started with the Pharmaverse

Pharmaverse provides an open-source ecosystem for clinical reporting, extending Tidyverse with validation, compliance, and regulatory submission capabilities. By following a structured approach from raw data to ADaMs, organizations can enhance efficiency while maintaining data integrity.

Resources

Reuse

CC BY 4.0

Citation

BibTeX citation:
@online{kenneth2025,
  author = {Kenneth, Gift and Gupta, Sunil and {APPSILON}},
  title = {Working with {Clinical} {Trial} {Data?} {There’s} a
    {Pharmaverse} {Package} for {That}},
  date = {2025-02-28},
  url = {https://pharmaverse.github.io/blog/posts/2025-02-28_theres_a_pharmaverse_package_for_that/managing-clinical-trial-data.html},
  langid = {en}
}
For attribution, please cite this work as:
Kenneth, Gift, Sunil Gupta, and APPSILON. 2025. “Working with Clinical Trial Data? There’s a Pharmaverse Package for That.” February 28, 2025. https://pharmaverse.github.io/blog/posts/2025-02-28_theres_a_pharmaverse_package_for_that/managing-clinical-trial-data.html.