Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
By Nick Eagles
As part of recent LIBD work with spatial gene expression, I was recommended the tool Space Ranger, which provides software pipelines that take Visium spatial RNA-seq samples through the steps we ultimately need to explore gene expression together with spatial information. In this blog post, I’ll explain how to start using Space Ranger at JHPCE, focusing heavily on the set-up details specific to this cluster.
What is Space Ranger?
In practice, producing spatial information about gene expression for a multiple-sample experiment, given just microscope images and Visium RNA-seq output, takes a fairly large number of computational steps. To start, we’d want our data in FASTQ format; then we’d have to worry about aligning reads to a reference genome, producing gene counts, normalizing data, and so on. Thankfully, Space Ranger bundles these steps into three simple utilities. We won’t focus much on how to use these individual utilities or on the various features of Space Ranger, which are documented in detail here; rather, this blog post describes how to get Space Ranger up and running on the JHPCE cluster.
Using the spaceranger module at JHPCE
We make regular use of Lmod environment modules at JHPCE as a way of loading and running software without worrying about differences in user set-up, manually modifying your PATH, or other nasty considerations. While some sets of modules are available system-wide (to any user), others are not accessible unless you specifically “use” them. To make LIBD-specific modules like spaceranger available, you must “use” that set of modules explicitly:
module use /jhpce/shared/jhpce/modulefiles/libd
If you want to avoid typing this every time you use an LIBD module, consider the .bashrc trick described here.
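One way to apply that trick is sketched below, assuming it simply amounts to adding the “module use” line to your ~/.bashrc. The function name add_libd_module_use is hypothetical; it appends the line only if that exact line is not already present, so running it more than once is harmless.

```bash
#!/bin/bash
# Hypothetical helper: append the LIBD "module use" line to a shell startup
# file (default: ~/.bashrc), but only if that exact line is not already there.
add_libd_module_use() {
    local rc_file="${1:-$HOME/.bashrc}"
    local line='module use /jhpce/shared/jhpce/modulefiles/libd'
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex).
    # If the file does not exist yet, grep fails and the line is appended.
    if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
        echo "$line" >> "$rc_file"
    fi
}
```

After defining it, run add_libd_module_use once, then either start a new login shell or run source ~/.bashrc so the change takes effect.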
Next, let’s load the spaceranger module itself.
module load spaceranger
Note: the above command loads the default version of the spaceranger module. You can see which versions are available with:
module avail spaceranger

# Example output may look like this:
##-------------------------- /jhpce/shared/jhpce/modulefiles/libd ---------------------------
##   spaceranger/1.1.0
##

# You may also load a specific version of the module:
module load spaceranger/1.1.0
First script
Next, let’s test the Space Ranger software on the example data it provides. We will write a bash script that loads the spaceranger module as above and calls the executable. We could easily have used qrsh to log into a compute node and run these few lines interactively, but I recommend writing a bash script that we will submit with qsub, for a few reasons:
- A script documents the code you have run, allowing others to see and reproduce your work.
- When we qsub the script, we include arguments specifying memory and other hardware resources, which you would otherwise have to remember or estimate each time you run this or similar code interactively.
- Using qsub allows long-running code to continue without you having to keep your session open and connected to the network. This example won’t take long to run, but Space Ranger on real experiments likely will.
Let’s start by writing the “skeleton” of our script, including only the basic required code before worrying about memory, logging, or other complications. Note that this will create a directory called “tiny”, containing the example outputs, in the current working directory. I’m opening a new file called spaceranger_test.sh, and its contents should look something like this:
# Make LIBD modules available, and load the "spaceranger" module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

# Test Space Ranger on already-installed example data
spaceranger testrun --id=tiny
If you qsub this script as-is, it will produce two log files in your home directory, containing verbose and somewhat cryptic errors. We’d prefer a single, clearly named log file written to the same directory as our bash script, and of course we want to fix the source of the Space Ranger error. In this case, we simply need to provide more memory to fix the main error.
Below, we flesh out spaceranger_test.sh with arguments to qsub that improve logging and provide sufficient memory. These arguments go on lines beginning with #$.
# Specify memory and other details below. In order:
#   "-cwd": write the log file to the current working directory
#   "-o" and "-e": combine 'STDOUT' and 'STDERR' messages into the same log file
#   "-l mem_free=20G,h_vmem=20G": request 20G of free memory (mem_free),
#       with a hard limit of 20G (h_vmem) that may not be exceeded
#$ -cwd
#$ -o spaceranger_test.txt
#$ -e spaceranger_test.txt
#$ -l mem_free=20G,h_vmem=20G

# Make LIBD modules available, and load the "spaceranger" module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

# Test Space Ranger on already-installed example data
spaceranger testrun --id=tiny
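Since the #$ lines double as ordinary bash comments, it can be handy to list exactly which options qsub will pick up from a script before submitting it. The helper below, show_sge_directives, is a hypothetical name; it is a minimal sketch using grep and sed that prints only the lines starting with #$, with that prefix removed.

```bash
#!/bin/bash
# Hypothetical helper: print the embedded qsub options ("#$" lines) of a job
# script, with the "#$" prefix stripped, so you can check them before submitting.
show_sge_directives() {
    local script="$1"
    # Keep only lines that start with "#$", then drop the prefix and any spaces after it
    grep '^#\$' "$script" | sed 's/^#\$ *//'
}
```

For example, show_sge_directives spaceranger_test.sh should list -cwd, the -o and -e log file options, and the -l memory request, one per line; the explanatory comment lines are skipped because they start with “# ” rather than “#$”.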
Now, we can actually submit the script and wait for the job to complete.
qsub spaceranger_test.sh
If you open spaceranger_test.txt after the job completes, you should see that the test was successful. However, there is a worrying warning suggesting that Space Ranger is not aware of how much memory it should actually use:
Martian Runtime - v4.0.0
2020-10-19 15:48:59 [jobmngr] WARNING: configured to use 453GB of local memory, but only 331.3GB is currently available.
2020-10-19 15:48:59 [jobmngr] WARNING: The current virtual address space size limit is too low.
    Limiting virtual address space size interferes with the operation of many common libraries and programs, and is not recommended.
    Contact your system administrator to remove this limit.
Rather than using 20GB of memory, Space Ranger believes it has a whopping 453GB to work with, though only ~331GB is actually free. In the next section, we will communicate memory and even CPU constraints to Space Ranger through arguments to the spaceranger command.
Exploring memory and parallelization options
Below, we will construct another bash script to submit with qsub, demonstrating how to properly specify memory and the number of CPUs for a hypothetical dataset. Suppose we have an experiment with multiple FASTQ files and a microscope slide image. We would like to call the spaceranger count command on this input data, using parallelization for speed. Let’s use 5 CPU cores and a total of 60GB of memory. Following the documentation here, we can create a template script, which we’ll call SR_count_example.sh, appropriate for running at JHPCE:
# Specify memory and other details. Note that 'mem_free' and 'h_vmem' specify
# per-core memory (12G * 5 cores = 60GB total, as we want), as indicated here:
# https://jhpce.jhu.edu/knowledge-base/how-to/#multicore
#$ -cwd
#$ -o SR_count_example.txt
#$ -e SR_count_example.txt
#$ -l mem_free=12G,h_vmem=12G
#$ -pe local 5

# Make LIBD modules available, and load the "spaceranger" module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

# The main Space Ranger command. A comment cannot follow a line-continuation
# "\" (it would end the command early), so the resource arguments are explained here:
#   --jobmode=local: we will use one "node" of the cluster, which has many cores available
#   --localcores=5:  we requested 5 cores at the top
#   --localmem=54:   60GB * 0.9 = 54GB; using 90% of total memory requested is recommended
# 'spaceranger count' also expects a reference via --transcriptome and, depending
# on the version, slide information; check 'spaceranger count --help'.
spaceranger count \
    --id=<SOME RUN ID HERE> \
    --fastqs <LIST OF FASTQ PATHS HERE> \
    --image <IMAGE PATH HERE> \
    --jobmode=local \
    --localcores=5 \
    --localmem=54
In practice, you’d fill in an --id, the FASTQ paths (--fastqs), and the microscope image (--image) in the above script for your experiment. Then simply submit the script as a job!
qsub SR_count_example.sh
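Rather than hard-coding --localcores=5 and --localmem=54, you could derive them from the job’s own request, so the numbers stay in sync if you later edit the #$ lines. The sketch below assumes Grid Engine’s NSLOTS environment variable holds the number of slots granted by “-pe local”, and uses a hypothetical MEM_PER_CORE_GB variable that you set yourself to match the per-core mem_free/h_vmem value; it applies the same 90% rule used above, rounding down to whole GB.

```bash
#!/bin/bash
# Sketch: build Space Ranger's resource arguments from the SGE request.
# Assumptions (not from the original post):
#   - NSLOTS is set by Grid Engine for "-pe local N" jobs (defaults to 1 here)
#   - MEM_PER_CORE_GB (hypothetical name) is set by you to the per-core
#     mem_free/h_vmem value in GB; the function errors out if it is unset
sr_resource_args() {
    local cores="${NSLOTS:-1}"
    local mem_per_core_gb="${MEM_PER_CORE_GB:?set MEM_PER_CORE_GB to the per-core memory in GB}"
    # Total memory requested for the job, in GB
    local total_gb=$(( cores * mem_per_core_gb ))
    # Give Space Ranger about 90% of the total, rounded down (integer arithmetic)
    local local_mem=$(( total_gb * 9 / 10 ))
    echo "--jobmode=local --localcores=${cores} --localmem=${local_mem}"
}
```

Inside SR_count_example.sh, you might set MEM_PER_CORE_GB=12 and replace the --jobmode, --localcores, and --localmem arguments with $(sr_resource_args), left unquoted so bash splits it into three separate arguments. With 5 slots and 12GB per core, this reproduces --localcores=5 and --localmem=54.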
Note: you might also be interested in sgejobs, which we explored in a LIBD rstats club session. You can use it to create SGE bash scripts.
Acknowledgments
This blog post was made possible thanks to:
- blogdown [3]
- knitcitations [1]
- sessioninfo [2]
References
[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.10. 2019. URL: https://CRAN.R-project.org/package=knitcitations.
[2] G. Csárdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. URL: https://CRAN.R-project.org/package=sessioninfo.
[3] Y. Xie, A. P. Hill, and A. Thomas. blogdown: Creating Websites with R Markdown. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: https://github.com/rstudio/blogdown.
Reproducibility
## ─ Session info ──────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.0.2 (2020-06-22)
##  os       macOS Catalina 10.15.7
##  system   x86_64, darwin17.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2020-10-21
##
## ─ Packages ──────────────────────────────────────────────────────────────────
##  package       * version  date       lib source
##  assertthat      0.2.1    2019-03-21 [1] CRAN (R 4.0.0)
##  bibtex          0.4.2.3  2020-09-19 [1] CRAN (R 4.0.2)
##  BiocManager     1.30.10  2019-11-16 [1] CRAN (R 4.0.0)
##  BiocStyle     * 2.17.1   2020-09-24 [1] Bioconductor
##  blogdown        0.21.19  2020-10-21 [1] Github (rstudio/blogdown@1a7ad52)
##  bookdown        0.21     2020-10-13 [1] CRAN (R 4.0.2)
##  cli             2.1.0    2020-10-12 [1] CRAN (R 4.0.2)
##  colorout      * 1.2-2    2020-05-18 [1] Github (jalvesaq/colorout@726d681)
##  crayon          1.3.4    2017-09-16 [1] CRAN (R 4.0.0)
##  digest          0.6.26   2020-10-17 [1] CRAN (R 4.0.2)
##  evaluate        0.14     2019-05-28 [1] CRAN (R 4.0.0)
##  fansi           0.4.1    2020-01-08 [1] CRAN (R 4.0.0)
##  generics        0.0.2    2018-11-29 [1] CRAN (R 4.0.0)
##  glue            1.4.2    2020-08-27 [1] CRAN (R 4.0.2)
##  htmltools       0.5.0    2020-06-16 [1] CRAN (R 4.0.2)
##  httr            1.4.2    2020-07-20 [1] CRAN (R 4.0.2)
##  jsonlite        1.7.1    2020-09-07 [1] CRAN (R 4.0.2)
##  knitcitations * 1.0.10   2019-09-15 [1] CRAN (R 4.0.0)
##  knitr           1.30     2020-09-22 [1] CRAN (R 4.0.2)
##  lubridate       1.7.9    2020-06-08 [1] CRAN (R 4.0.2)
##  magrittr        1.5      2014-11-22 [1] CRAN (R 4.0.0)
##  plyr            1.8.6    2020-03-03 [1] CRAN (R 4.0.0)
##  R6              2.4.1    2019-11-12 [1] CRAN (R 4.0.0)
##  Rcpp            1.0.5    2020-07-06 [1] CRAN (R 4.0.2)
##  RefManageR      1.2.12   2019-04-03 [1] CRAN (R 4.0.0)
##  rlang           0.4.8    2020-10-08 [1] CRAN (R 4.0.2)
##  rmarkdown       2.5      2020-10-21 [1] CRAN (R 4.0.2)
##  sessioninfo   * 1.1.1    2018-11-05 [1] CRAN (R 4.0.0)
##  stringi         1.5.3    2020-09-09 [1] CRAN (R 4.0.2)
##  stringr         1.4.0    2019-02-10 [1] CRAN (R 4.0.0)
##  withr           2.3.0    2020-09-22 [1] CRAN (R 4.0.2)
##  xfun            0.18     2020-09-29 [1] CRAN (R 4.0.2)
##  xml2            1.3.2    2020-04-23 [1] CRAN (R 4.0.0)
##  yaml            2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
##
## [1] /Library/Frameworks/R.framework/Versions/4.0branch/Resources/library