Batch Processing vs. Interactive Sessions
We introduced batch processing three weeks ago. Since then, many people have asked about the differences between batch processing and interactive sessions, and about the benefits of each. Let's start with the definitions:
Batch Processing / Batch Jobs:
Batch processing is the execution of a single task or a series of programs in a computing environment without manual intervention. All data and commands are specified in advance through scripts or command-line parameters, so the job runs to completion without human interaction. It is called “batch processing” or a “batch job” because the input data are collected into batches of files, which the program then processes batch by batch.
In many cases batch jobs are submitted to a job scheduler and run on the first available compute node(s).
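The definition above can be sketched in a few lines of shell. This is only an illustration: the directory layout, the file names, and the use of awk as a stand-in for the real analysis are all assumptions, not part of any particular batch system.

```shell
#!/bin/sh
# Sketch: process a "batch" of input files with no prompts.
# The directory layout and file names are made up for illustration.
workdir=$(mktemp -d)
mkdir "$workdir/in" "$workdir/out"

# Collect the input data into a batch of files.
printf '1\n2\n3\n' > "$workdir/in/a.txt"
printf '10\n20\n' > "$workdir/in/b.txt"

# Every input is fixed in advance, so the loop runs to completion unattended.
for f in "$workdir"/in/*.txt; do
    name=$(basename "$f" .txt)
    # Sum the numbers in each file; awk stands in for the real analysis.
    awk '{ s += $1 } END { print s }' "$f" > "$workdir/out/$name.sum"
done

# Show the results: one sum per input file.
cat "$workdir/out/a.sum" "$workdir/out/b.sum"
```

Nothing in the loop asks the user a question, which is what makes it suitable for submission to a job scheduler.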
Interactive Session:
Interactive sessions prompt the user for input, either data or commands. Typically, software running in a computing environment accepts input from a human and responds to it. This is the simplest way to work on any system: you log on, run whatever commands you need, whether on the command line or in a graphical environment, and log out when you have finished.
Comparison:
- Interactive sessions are usually used to become familiar with the computing environment and to test code before attempting a long production run as a batch job.
- Short tasks, tasks that require frequent user interaction, and graphically intensive tasks are best done interactively.
- Batch jobs are best for longer-running processes, parallel processes, or large numbers of short jobs run simultaneously.
- If something can be left running for a significant amount of time without any interaction, it is almost certainly a good candidate for a batch job.
- Running processes as batch jobs also lets you log out of the system and of the PC you are connecting from while the job keeps running. Depending on how the job scheduler is configured, you can get an email as soon as your batch job is done.
- Batch processing avoids idling the computing resources with minute-by-minute manual intervention and supervision.
- Batch processing supports reproducibility: all data and commands are fixed in a script, so the same run can be repeated later.
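The point about logging out can be illustrated without a job scheduler. The sketch below uses the standard `nohup` utility, which is an assumption on our part and only one of several ways to detach a job; `long_job.sh` is a hypothetical stand-in for a real analysis script.

```shell
#!/bin/sh
# Sketch: start a job in the background so that it survives logout.
# long_job.sh is a hypothetical stand-in for a real analysis script.
jobdir=$(mktemp -d)
cat > "$jobdir/long_job.sh" <<'EOF'
#!/bin/sh
sleep 1            # pretend to compute for a while
echo "job finished"
EOF

# nohup makes the job ignore the hangup signal sent at logout,
# '&' puts it in the background, and all output goes to a log file.
nohup sh "$jobdir/long_job.sh" < /dev/null > "$jobdir/job.log" 2>&1 &
job_pid=$!

# In real use you would log out here. For this demo we wait for the job.
wait "$job_pid"
cat "$jobdir/job.log"
```

On a cluster, the scheduler's own submit command would normally take the place of `nohup`.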
Example:
Much of the time R is used in interactive sessions: a user sits in front of a computer and types instructions into the R command line. Each instruction is executed, the result is displayed on screen, and R then waits for the next command.
But you can use R or Python in batch mode, too! The user prepares a sequence of commands in advance as a script file (e.g. input.R) and passes it to the interpreter, which runs the commands without waiting for human intervention. For the R language, many people prefer this command:
R --vanilla < input.R > result.txt
But two other commands are designed specifically for batch processing and, for example, handle error messages correctly:
R CMD BATCH test.R result.txt
(more details at http://stat.ethz.ch/R-manual/R-patched/library/utils/html/BATCH.html)
Rscript test.R > result.txt
(more details at http://stat.ethz.ch/R-manual/R-patched/library/utils/html/Rscript.html)
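One practical detail with plain `>` redirection is that it captures only standard output, so warnings and error messages written to standard error do not end up in result.txt. The sketch below shows the difference using a hypothetical shell function, `fake_script`, as a stand-in for an R script so that it runs without R installed.

```shell
#!/bin/sh
# Sketch: what '>' does and does not capture.
# fake_script is a hypothetical stand-in for a batch script:
# it writes a result to stdout and a warning to stderr.
fake_script() {
    echo "result: 42"
    echo "Warning: something looks odd" >&2
}

outdir=$(mktemp -d)

# '>' redirects only stdout. The warning is sent to a second file here
# so that the split between the two streams is visible.
fake_script > "$outdir/stdout_only.txt" 2> "$outdir/stderr.txt"

# Adding '2>&1' sends stderr to the same file as stdout.
fake_script > "$outdir/combined.txt" 2>&1

# Compare the line counts of the two output files.
wc -l < "$outdir/stdout_only.txt"
wc -l < "$outdir/combined.txt"
```

The first file holds one line and the combined file holds two. With R, the same idea would be `Rscript test.R > result.txt 2>&1`. According to its help page, `R CMD BATCH` already sends both streams to the output file.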
And if you want to use your script file in your interactive session you can use the command
source("input.R", echo=TRUE)
in your R console.
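To make the example concrete, here is a minimal sketch of what such a script file might contain and how it could be run in batch mode. The contents of input.R are invented for illustration, and the run step is skipped if `Rscript` is not installed.

```shell
#!/bin/sh
# Sketch: write a small R script and run it in batch mode if R is installed.
# The script contents are invented for illustration.
demo=$(mktemp -d)

cat > "$demo/input.R" <<'EOF'
# input.R: every input is fixed in the script, so no prompts are needed
x <- c(2, 4, 6, 8)
cat("mean:", mean(x), "\n")
cat("sd:", sd(x), "\n")
EOF

if command -v Rscript >/dev/null 2>&1; then
    # Batch run: results go to a file instead of the console.
    Rscript "$demo/input.R" > "$demo/result.txt"
    cat "$demo/result.txt"
else
    echo "Rscript not found; input.R was written to $demo"
fi
```

The same input.R can be run in batch mode during a long production run and loaded with `source()` while you are still testing interactively.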
If you have never used batch mode before, you should try it. It can improve your analyses in several ways.