Repost: R 4.5.0 and Bioconductor 3.21

This article was first published on Getting Genetics Done and kindly contributed to R-bloggers.

Reposted from the original at https://blog.stephenturner.us/p/r-450-bioconductor-321.

Faster package installation, import only the functions you want with use(), built-in Palmer penguins data, grep values shortcut, and lots of new bioinformatics packages in Bioconductor


R 4.5.0 was released last week, and Bioconductor 3.21 came a few days later. You can read the R release notes here and the Bioconductor 3.21 announcement here. Here I’ll highlight a few things that are worth upgrading for.

R 4.5.0

You can read the R release notes here or just use news().

Faster install.packages()

Any time the minor version changes (e.g., 4.4.x to 4.5.x), you’ll need to reinstall all your packages. That will now be much faster thanks to under-the-hood changes to install.packages(), which now downloads packages in parallel:

install.packages() and download.packages() download packages simultaneously using libcurl, significantly reducing download times when installing or downloading multiple packages.

I don’t have any benchmarks to show but I can tell you reinstalling all my go-to packages was much faster than with previous upgrades and reinstalls.

I keep my own “verse” package on GitHub: github.com/stephenturner/Tverse. This “package” is just an empty package with a DESCRIPTION listing all the packages I use most frequently as dependencies (tidyverse, janitor, pkgdown, usethis, breakerofchains, knitr, here, etc.). I’ll first install.packages("devtools"), then devtools::install_github("stephenturner/Tverse"), and all the packages I use most frequently are installed because this dummy “verse” package depends on them. You never want to load this library, but it’s an easy way to reinstall all the R packages you use frequently on any new machine or with an R upgrade.
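To make the idea concrete, here is a minimal sketch of what such a DESCRIPTION could look like, written out from R with write.dcf(). The package name myverse, the field values, and the choice of the Imports field are assumptions for illustration; they are not copied from the Tverse repository, and a real package may need more than this one file.

```r
# Hypothetical "verse" package: name, fields, and dependency list are made up
pkg_dir <- file.path(tempdir(), "myverse")
dir.create(pkg_dir, showWarnings = FALSE)

desc <- data.frame(
  Package     = "myverse",
  Title       = "Personal Collection of Frequently Used Packages",
  Version     = "0.0.1",
  Description = "An empty package whose only job is to pull in dependencies.",
  # Assumption: dependencies are listed under Imports
  Imports     = "dplyr, ggplot2, janitor, knitr, here",
  stringsAsFactors = FALSE
)

# write.dcf() produces the "Field: value" layout that DESCRIPTION files use
desc_file <- file.path(pkg_dir, "DESCRIPTION")
write.dcf(desc, file = desc_file)

# Inspect the result
cat(readLines(desc_file), sep = "\n")
```

Once a directory like this lives in a GitHub repository, installing it with devtools::install_github() pulls in everything listed as a dependency.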

Built-in penguins data

The built-in iris dataset leaves something to be desired. It’s overused and not very engaging for learners or audiences who’ve seen it repeatedly. It’s small (150 rows), clean (no missing data), and presents none of the challenges of real data, such as outliers or multiple categorical variables. And, if you look at the help for ?iris, you’ll see it was published by Ronald Fisher in the Annals of Eugenics, not something I’d care to cite.

Back in 2020, Allison Horst published the palmerpenguins package — a data package meant to provide a great dataset for data exploration and visualization as an alternative to iris. Previously you had to install the package to use the data, but now the penguins data is built in, as is the less clean penguins_raw data that can be used for teaching data cleaning and manipulation.

library(ggplot2)
ggplot(penguins, aes(bill_len, bill_dep)) + 
  geom_point(aes(col=species, size=body_mass, shape=island)) + 
  theme_classic()

And if you’ve written tutorials using the palmerpenguins package, know that the column names differ between the palmerpenguins and base R versions (for example, bill_length_mm is now bill_len and body_mass_g is now body_mass). You might check out the new basepenguins package, which gives you functions to convert your code over to the base version.
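If you only need to rename columns in a data frame rather than convert whole scripts, a few lines of base R are enough. This is a rough sketch: to_base_names() is a hypothetical helper written for this example, not part of basepenguins, and the two-row data frame is a toy stand-in for the palmerpenguins data.

```r
# Map palmerpenguins column names to their base R (datasets::penguins) equivalents
name_map <- c(
  bill_length_mm    = "bill_len",
  bill_depth_mm     = "bill_dep",
  flipper_length_mm = "flipper_len",
  body_mass_g       = "body_mass"
)

# Hypothetical helper: rename any matching columns, leave the rest untouched
to_base_names <- function(df) {
  hits <- names(df) %in% names(name_map)
  names(df)[hits] <- unname(name_map[names(df)[hits]])
  df
}

# Toy data frame using the palmerpenguins naming scheme (illustrative values)
old_style <- data.frame(
  species           = c("Adelie", "Gentoo"),
  bill_length_mm    = c(39.1, 46.1),
  bill_depth_mm     = c(18.7, 13.2),
  flipper_length_mm = c(181L, 211L),
  body_mass_g       = c(3750L, 4500L)
)

new_style <- to_base_names(old_style)
names(new_style)
```

Columns such as species, island, sex, and year have the same names in both versions, so the helper leaves them alone.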

Import only what you want with use()

If you’re coming from Python, you’re used to importing only what you need from installed packages. E.g.:

from numpy import array, mean
from pathlib import Path

With R, if you want to call a function from a package without loading the entire namespace, you can always use the double-colon, e.g. dplyr::select(). With R 4.5.0 you can also call use() to import only the functions you want from any particular package.

use("dplyr", c("filter", "count"))
penguins |>
  filter(island=="Torgersen" & year==2007) |> 
  count(sex)
#>      sex n
#> 1 female 8
#> 2   male 7
#> 3   <NA> 5

There’s a gotcha here though. As the documentation states, use() is a simple wrapper around library() which always uses attach.required=FALSE. Only the functions you list are attached, and because the package then counts as already attached, calling use() again or even library() won’t add anything else. So the first time you call use(), you need to list all of the functions you might want to import.

In reality I think I’ll still use the package::function() syntax when I don’t want to load the full library, but it’s nice to have this option. See also the box and conflicted packages.

## This works
use("dplyr", c("filter", "count"))
penguins |>
  filter(island=="Torgersen" & year==2007) |> 
  count(sex)

## This fails!
library(dplyr)
penguins |>
  mutate(ratio=bill_len/bill_dep)
#> Error in mutate(penguins, ratio = bill_len/bill_dep) : 
#>   could not find function "mutate"
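One way around this, as far as I can tell, is to detach the partially attached package before calling library() again. The sketch below uses the tools package that ships with R instead of dplyr, so it runs without extra installs. It assumes a fresh session in which tools is not already attached and an R version that provides use().

```r
# 'tools' ships with R but is not attached by default
use("tools", "file_ext")
file_ext("reads.fastq.gz")   # the one function we asked for is available

# Other exports of 'tools' are not on the search path
found_after_use <- exists("toTitleCase", mode = "function")

# library() sees the package as already attached and changes nothing
library(tools)
found_after_library <- exists("toTitleCase", mode = "function")

# Detaching first lets the next library() call attach every export
detach("package:tools")
library(tools)
found_after_detach <- exists("toTitleCase", mode = "function")

c(found_after_use, found_after_library, found_after_detach)
```

The same pattern with detach("package:dplyr") followed by library(dplyr) should make mutate() available in the failing example.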

grepv(): shortcut for grep(..., value=TRUE)

A very small quality-of-life improvement. If you use grep(), you get the indices of the elements that match your pattern. To get the actual matched values, you can add value=TRUE, or just use the new grepv() function instead.

x <- c("apple", "banana", "whatsapp")

grep("app", x)
#> [1] 1 3

grepv("app", x)
#> [1] "apple"    "whatsapp"
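If a script has to run on older R versions too, a small fallback keeps grepv() usable everywhere. This is only a sketch: the shim below is my own stand-in, defined just when base R does not already provide grepv(), and it simply forwards to grep(..., value=TRUE).

```r
# Fallback for older R: define grepv() only if it does not already exist
if (!exists("grepv", mode = "function")) {
  grepv <- function(pattern, x, ...) grep(pattern, x, value = TRUE, ...)
}

x <- c("apple", "banana", "whatsapp")

# Matching values, same as grep("app", x, value = TRUE)
matches <- grepv("app", x)

# Other grep() arguments pass straight through, e.g. invert for non-matches
non_matches <- grepv("app", x, invert = TRUE)

matches
non_matches
```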

Bioconductor 3.21

You can read the full Bioconductor 3.21 announcement here. You can install it with:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.21")

There are 72 new software packages in this release of Bioconductor, bringing the total to 2341 software packages, 432 experiment data packages, 928 annotation packages, 30 workflows and 5 books. This release introduces a wide array of new tools and packages across genomics, transcriptomics, proteomics, metabolomics, and spatial omics analysis. Notable additions include new frameworks for spatial transcriptomics (like clustSIGNAL, SEraster, and CARDspa), enhanced utilities for analyzing single-cell data (SplineDV, mist, dandelionR), and cutting-edge methods for integrating and visualizing multi-omics datasets (RFLOMICS, pathMED, MetaboDynamics). The release also features robust statistical and machine learning approaches, such as LimROTS, CPSM, and XAItest, for improved inference and predictive modeling. Several tools improve visualization, accessibility, and reproducibility, including GUI-based apps (geyser, miaDash, shinyDSP) and packages focused on optimizing performance or interoperability (e.g., RbowtieCuda, ReducedExperiment, Rigraphlib).

You may also want to skim through the release notes to look through the updates to existing packages you already use.
