R Programming and Pharmaceutical Data Analysis (Packages for Clinical Trial Data)
In the clinical trials reporting industry, there is a persistent but incorrect assumption that SAS must be used because regulatory agencies “require” it. Regulatory agencies generally do not mandate specific software for clinical trials reporting. They focus primarily on the accuracy, integrity, and compliance of the reported data.
Recently, other options, such as the open-source R language, have gained attention across the life sciences, despite resistance rooted in this misconception. R has a long history of use in academia for statistical research and is already used in the pharmaceutical industry. However, its adoption in regulatory submissions has so far been limited.
In recent years, developers from the pharmaceutical industry have embraced R and collaboratively developed open-source packages for clinical trial data analysis and reporting. This post gives an overview of a few of these R packages, including their key features and how they can be used, along with resources for learning more about them.
Open source and proprietary works
Please note, this is not legal advice! Explore Posit’s responses to related questions on:
- Commercial use of R and Shiny
- AGPL, GPL, and the use of Shiny applications in proprietary environments
One of the common misconceptions about using open-source packages in software development is that it requires companies to publicly disclose their own code and proprietary workflows. This misconception often arises from the fact that many popular open-source packages, such as R and its various libraries, are licensed under the GNU General Public License (GPL) or other similar licenses that require the distribution of source code.
However, it is important to note that using open-source packages does not necessarily require companies to disclose their own code or proprietary workflows. The GPL and other similar licenses only require the distribution of source code if the software that uses the open-source packages is also distributed.
For example, if a company is developing an internal application that uses R and various open-source libraries, and that application is only used within the company, then there is no requirement to distribute the source code for that application. The company can keep its code and workflows confidential, even if it uses open-source packages.
On the other hand, if a company develops a commercial application that incorporates open-source packages and then distributes that application to customers, then it may be required to distribute the source code for the application, including any modifications made to the open-source packages. However, this requirement can often be satisfied by providing access to the source code through a written offer, rather than including it with the distributed software.
There is also an alternative interpretation. R code that uses external libraries is not compiled against them; it loads them into memory only when the app is executed. The library your code names therefore does not have to be a specific package from CRAN; it can be any library on your local drive, under a different license, that exposes the same interface. Under this view, your code does not rely on any specific GPL library, and your app is a separate component that uses other packages without being linked into one product with them, which would remove the requirement to open source your code.
Understanding Clinical Data Standards
CDISC (Clinical Data Interchange Standards Consortium) is a global, non-profit organization that develops and promotes data standards to support the acquisition, exchange, submission, and archive of clinical research data and metadata.
CDISC Foundational Standards provide a comprehensive set of data standards that improve the quality, efficiency, and cost-effectiveness of clinical research. These standards cover various aspects of clinical research, from study design to data collection, management, analysis, and reporting.
The standards include:
- Protocol Representation Model (PRM)
- Standard for Exchange of Nonclinical Data (SEND)
- Clinical Data Acquisition Standards Harmonization (CDASH)
- Study Data Tabulation Model (SDTM)
- Study Data Tabulation Model Implementation Guide (SDTMIG)
- Analysis Data Model (ADaM).
These standards help ensure that clinical trial data is organized and analyzed consistently and accurately, enhancing the efficiency and quality of clinical research. In addition, the QRS (Questionnaires, Ratings, and Scales) supplements provide a standardized way to assess clinical concepts or task-based observations.
SDTM and ADaM are among the standards required by both the FDA (US) and PMDA (Japan) for new drug applications, while SEND is required by the FDA for nonclinical studies. CDISC standards improve transparency and traceability, making it easier for regulators and others to review the data.
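To make the relationship between a tabulation standard (SDTM) and an analysis standard (ADaM) more concrete, here is a minimal base R sketch. It assumes a toy demographics (DM) domain with invented values and derives a few subject-level (ADSL-style) variables from it. The age cut-off and the choice of derived variables are illustrative assumptions, not a complete or validated implementation of either standard.

```r
# Toy SDTM-style demographics (DM) domain; all values are invented.
dm <- data.frame(
  STUDYID  = "STUDY01",
  USUBJID  = c("STUDY01-001", "STUDY01-002", "STUDY01-003"),
  AGE      = c(34L, 71L, 58L),
  SEX      = c("F", "M", "F"),
  ARM      = c("Placebo", "Drug A", "Drug A"),
  RFXSTDTC = c("2023-01-10", "2023-01-12", "2023-02-01"),
  stringsAsFactors = FALSE
)

# Derive an ADaM-style subject-level dataset (ADSL) from DM.
adsl <- dm
adsl$TRT01P <- adsl$ARM                               # planned treatment
adsl$TRTSDT <- as.Date(adsl$RFXSTDTC)                 # ISO 8601 text -> Date
adsl$AGEGR1 <- ifelse(adsl$AGE < 65, "<65", ">=65")   # illustrative cut-off

print(adsl[, c("USUBJID", "TRT01P", "TRTSDT", "AGEGR1")])
```

Each derived column can be traced back to a single source column in DM, which is the kind of traceability the standards are meant to support.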
Insights Engineering
{teal}: Interactive Exploratory Data Analysis with Shiny Web-Applications
{teal} is a framework for building interactive exploratory data analysis applications with Shiny. A {teal} application requires its data to be specified; supported inputs include CDISC data, independent datasets, related datasets, and MultiAssayExperiment objects.
The framework also provides modules for performing analysis, such as outlier exploration and data visualization. {teal} modules are built within the framework and can be found in packages like {teal.modules.general}, {teal.modules.clinical} and {teal.modules.hermes}.
The functionality of the framework is derived from packages like {teal.data}, {teal.widgets}, {teal.slice}, {teal.code}, {teal.transform}, {teal.logger} and {teal.reporter}. There is also a package called {teal.osprey} that collects community-contributed teal modules. Users can refer to these packages for more information on how to use different parts of the {teal} framework.
{hermes}
{hermes} is a tool that helps with preprocessing, analysis, and reporting of RNA-seq data. It can import RNA-seq count data and annotate gene information from a central database like BioMart. It also adds quality control flags to genes and samples, filters the data set, and normalizes counts.
{hermes} works with data structures from Bioconductor packages, which makes it interoperable with them. It can also quickly generate descriptive plots, perform principal components analysis, and produce a QC report based on a template. Additionally, it can perform differential expression analysis.
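The quality control, filtering, and normalization steps described above can be sketched in base R. This is not the {hermes} API: the count matrix is invented, counts-per-million (CPM) stands in for whichever normalization a real analysis would use, and the low-expression threshold is an arbitrary value chosen for this toy data.

```r
# Invented count matrix: rows are genes, columns are samples.
counts <- matrix(
  c(500, 0, 1500, 10,    # sample1
    300, 1, 1200, 5,     # sample2
    700, 0, 1300, 0),    # sample3
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Normalize each sample to counts per million (CPM).
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Quality control flag: mark genes whose mean CPM falls below a threshold.
# The threshold of 1000 is a hypothetical value for this toy example only.
low_expression <- rowMeans(cpm) < 1000

# Keep only the genes that pass the flag.
filtered <- cpm[!low_expression, , drop = FALSE]
print(round(filtered))
```

Keeping the flag as a separate logical vector, rather than dropping rows immediately, mirrors the idea of first annotating genes with QC flags and then filtering on them.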
{tern}
The {tern} package offers analysis functions for generating tables and graphs commonly used in clinical trial reporting. It covers data visualizations such as forest plots, line plots, and Kaplan-Meier plots, as well as statistical model fits like logistic regression and Cox regression.
Additionally, {tern} allows for the creation of summary tables containing information about unique patients, exposure across patients, and changes from baseline for parameters. Furthermore, {tern} outputs can be added to {teal} applications for interactive exploration of data through modules available in the {teal.modules.clinical} package.
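As an illustration of what a change-from-baseline summary involves, here is a minimal base R sketch. It does not use {tern}; the data are invented, and the column names (AVISIT, AVAL, BASE, CHG) follow common ADaM naming as an assumption about how the input would be laid out.

```r
# Invented long-format data: one row per subject and visit.
adlb <- data.frame(
  USUBJID = rep(c("001", "002", "003", "004"), each = 2),
  ARM     = rep(c("Placebo", "Placebo", "Drug A", "Drug A"), each = 2),
  AVISIT  = rep(c("BASELINE", "WEEK 4"), times = 4),
  AVAL    = c(10, 11, 12, 12, 10, 7, 14, 9),
  stringsAsFactors = FALSE
)

# Attach each subject's baseline value, then compute change from baseline.
base <- adlb[adlb$AVISIT == "BASELINE", c("USUBJID", "AVAL")]
names(base)[2] <- "BASE"
adlb <- merge(adlb, base, by = "USUBJID")
adlb$CHG <- adlb$AVAL - adlb$BASE

# Summarise change at the post-baseline visit by treatment arm.
post <- adlb[adlb$AVISIT == "WEEK 4", ]
chg_summary <- data.frame(
  ARM  = sort(unique(post$ARM)),
  n    = as.vector(tapply(post$CHG, post$ARM, length)),
  mean = as.vector(tapply(post$CHG, post$ARM, mean)),
  sd   = as.vector(tapply(post$CHG, post$ARM, sd))
)
print(chg_summary)
```

A reporting package adds the layout, formatting, and statistical options on top of this kind of calculation, so that tables are produced consistently across studies.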
Reference Based Multiple Imputation {rbmi}
The {rbmi} package is designed for imputing missing data in clinical trials with continuous multivariate normal longitudinal outcomes. It supports imputation under a missing at random (MAR) assumption, reference-based imputation methods, and delta adjustments for sensitivity analyses such as tipping point analyses.
The package offers both Bayesian and approximate Bayesian multiple imputation, which are combined with Rubin’s rules for inference, as well as frequentist conditional mean imputation with jackknife or bootstrap resampling.
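Rubin’s rules combine the results from several imputed datasets into a single estimate and standard error. The sketch below implements the standard pooling formulas in base R; it is not the {rbmi} interface, the function name `pool_rubin` is hypothetical, and the per-imputation estimates and standard errors are invented.

```r
# Pool point estimates and their variances across m imputed datasets
# using Rubin's rules. `pool_rubin` is a hypothetical helper name.
pool_rubin <- function(estimates, variances) {
  stopifnot(length(estimates) == length(variances), length(estimates) > 1)
  m <- length(estimates)
  q_bar <- mean(estimates)                  # pooled point estimate
  u_bar <- mean(variances)                  # within-imputation variance
  b <- var(estimates)                       # between-imputation variance
  total_var <- u_bar + (1 + 1 / m) * b      # total variance
  # Degrees of freedom for the reference t distribution.
  df <- (m - 1) * (1 + u_bar / ((1 + 1 / m) * b))^2
  list(estimate = q_bar, se = sqrt(total_var), df = df)
}

# Invented treatment-effect estimates from five imputed datasets.
est <- c(-2.1, -1.8, -2.4, -2.0, -1.7)
se  <- c(0.50, 0.52, 0.49, 0.51, 0.50)

pooled <- pool_rubin(est, se^2)
print(pooled)
```

The pooled standard error is larger than the average per-imputation standard error because it also accounts for the variability between imputations, which reflects the uncertainty due to the missing data.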
Pharmaverse
Pharmaverse is a network of pharmaceutical industry professionals working collaboratively to create a curated and opinionated subset of open-source software packages and codebases based on the R language.
The objective is to deliver a complete clinical data pipeline from data collection to regulatory submission that is more efficient and sustainable through shared development and maintenance efforts, with a focus on reducing duplication of efforts and gaining increased harmonization across the industry. The initiative aims to attract the next generation of software developers and data scientists to the industry and provide increased transparency.
The scope of pharmaverse is the journey from Case Report Form (CRF) through to submission for clinical trial analysis reporting via R packages, with three categories of R packages recommended:
- External to pharma (transcends specifically pharma needs).
- Pharma-specific independent of pharmaverse (created for use in pharma, but not necessarily following the pharmaverse charter and recommendations).
- Pharma-specific under pharmaverse (created for use in pharma, following the pharmaverse charter and recommendations).
The aim is not to agree on cross-industry implementations of CDISC standards, but rather to act as a starting point for code reuse that is standard-agnostic and future-proof. The design and architecture of the packages allow companies to adapt them to proprietary internal workflows. For example, the {admiral} package has the open extensions {admiralvaccine} and {admiralophtha}, but there is also an {admiralroche} that is internal to Roche and not shared (it carries a proprietary license).
Pharmaverse packages will be easily locatable and accessible via a single site, with clear use cases for clinical trial reporting and sharing of differences and unique merits to help companies or users choose which packages or tools to adopt.
Pharmaverse End-to-End Clinical Reporting Packages
Let us explore some of the End-to-End Clinical Reporting Packages from Pharmaverse. The following compilation comprises ‘some’ of the open-source R packages that are relevant to end-to-end clinical reporting in the pharmaceutical industry. The pharmaverse council aims to organize and curate these packages into a well-defined stack in due course.
Conclusion
The life sciences industry is expanding rapidly, and open-source initiatives have given innovation a significant boost. With the growth of package development in programming languages like R, conducting clinical trials and monitoring and analyzing their data have become more streamlined and precise.
Collaborative projects in the life sciences domain, spearheaded by pharmaceutical companies and experts, have propelled advancements in research, enabling more rapid and precise discoveries. These open-source projects, which leverage cutting-edge technologies and techniques, are expected to pave the way for accelerated progress in the life sciences industry, leading to better healthcare outcomes for people worldwide.
The post appeared first on appsilon.com/blog/.