One of R’s biggest pitfalls is that it eats up memory without letting it go. This can be a huge problem if you are running really big jobs, have a lot of tasks to run, or have multiple users on your local machine or R server. When I run huge jobs on my Mac, I can pretty much forget about doing anything else, like watching a movie or RAM-intensive gaming. At my work, Kwelia, I run a few servers, a couple of them dedicated solely to R jobs with multiple users, and I really don’t want to increase the size of the server just for the few times that memory is exhausted by multiple large jobs or by all users being on at the same time. To solve this problem, I borrowed a tool, crontab, from the Linux world (we use an Ubuntu server, but it works on my Mac as well) to schedule my R scripts to run at off hours (between 2am and 8am), and the result is that I can almost cut the size of the server in half.
Installing crontab is easy in a Linux environment (I used this tutorial and this video), and the process should be similar on Mac and Windows. Note that the cron daemon itself usually comes preinstalled on Ubuntu and macOS; the package below adds a scheduling front end on top of it. From the command line, enter the following to install:
sudo apt-get install gnome-schedule
Then, to create a new task for any user on the system, enter the following if you are the root user or an admin:
sudo crontab -e
or as a specific user:
crontab -u yourusername -e
You must then choose your preferred text editor. I chose nano, but vim works just as well. This will open a file that looks like this:
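(The original post showed a screenshot here. On Ubuntu, the freshly opened crontab file begins with a commented header along these lines; the exact wording varies by distribution:)

```shell
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task.
#
# m h  dom mon dow   command
```

Your own job lines go below that header, one line per scheduled task.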
The cron job is laid out in this format: minute (0-59), hour (0-23, 0 = midnight), day of month (1-31), month (1-12), weekday (0-6, 0 = Sunday), command. To run an R script, just put “Rscript” followed by the file path in the command field. An example:
0 0 * * * Rscript Dropbox/rstudio/dbcode/loop/loop.R
This runs the loop.R file at midnight (the zero minute of the zero hour) every day of every week of every month, because the stars mean “all”. I have run endless repeat loops in previous posts, but R consumes the memory and never frees it. Running cron jobs, however, is like opening and closing R every time, so the memory is freed (probably not totally) after each job is done.
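For reference, here are a few more schedule patterns. The paths and script names below are made up, so substitute your own; redirecting output to a log file, as in the last entry, is optional but makes failures much easier to spot, since cron runs silently in the background:

```shell
# Run nightly.R at 2:30am every day
30 2 * * * Rscript /home/you/jobs/nightly.R

# Run weekly.R at 8:00am every Monday
0 8 * * 1 Rscript /home/you/jobs/weekly.R

# Run twice_daily.R at 8pm and 8am, appending output to a log
0 20,8 * * * Rscript /home/you/jobs/twice_daily.R >> /home/you/logs/twice.log 2>&1
```

Since cron jobs do not inherit your interactive shell’s environment, absolute paths to both the script and the log file are the safest bet.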
As an example, I ran the same job in a repeat loop every twelve hours (the left side of the black vertical line in the graph below), and then as a cron job called at 8pm and 8am (the right side). Here’s the memory usage as seen through munin:
I don’t have to worry nearly as much about my server overloading now, and I could actually downsize the server.
QED