My First Few Days with RStudio
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As most readers are probably aware, the free IDE for R, called RStudio, was recently released for general use, and it immediately made huge waves within the R community. IDE stands for Integrated Development Environment. IDEs typically provide a rich set of tools for developing in some target language. For standard programming languages like C++ (Visual Studio) and Java (Eclipse or NetBeans), IDEs contain:
- an editor tailored to the target language. The editor typically has tab/auto-complete for variable names, functions and class methods and properties and also features syntax highlighting.
- a multiple document interface (MDI) where there may be several documents opened in different tabs.
- a window that interacts with the compiler, or a panel containing the language's console, à la MATLAB and even vanilla R's GUI.
- a debugger
- a file browser and language reference.
RStudio fits this model very well and makes modifications where appropriate. RStudio provides many features that are lacking in the standard R GUI, and improves on features that do not work properly in the Windows R GUI. Over the past few days, I have done all of my R analysis within RStudio, briefly with the Desktop version and mostly with the Server version. I will mostly discuss the Server version since that is what I have been using. As far as I know, it is identical to the Desktop version, so you are not missing anything by using either one.
RStudio Server
First, installation is miraculously easy. I only had a few very minor glitches to deal with. Armed with sudo access to a machine on a research cluster at work, I was able to simply download the RPM and install it using the instructions provided on the web site. Then, all I had to do was fire up a browser and go to
http://servername.com:8787
and I was asked for my login credentials. But I couldn't get in. This server authenticates using LDAP, but the fix was simple: I replaced the contents of /etc/pam.d/rstudio with the contents of /etc/pam.d/login and was then able to log in. Next came an “unknown error.” It turned out that the installed version of R was too old (2.8). I just ran yum upgrade R, and RStudio logged me in with no problems. What showed up on my browser screen was beautiful! It looked identical to the desktop version of RStudio.
Once logged in, I somehow have access to ALL of my files on the remote server. I can load my data (typically produced by Hadoop) already residing on the server, and I can save output, graphs, data and even the R session itself on the remote server! All by just clicking buttons. No commands to remember, no screwed-up PDF files, and most importantly… no scp-ing files back and forth from the server just to create a plot (X11 worked well, but had limitations)!
Things I Love about RStudio
I will go panel by panel, but even so I will surely miss some cool features. I also will not discuss features that are already present in the Mac OS X R GUI and are repeated and beautified in RStudio:
The R command prompt still looks the same. At first, my reaction was “Damn, what am I supposed to do?” But when the GUI finished loading, the familiar R command prompt appeared in all its 1970s-ish glory. I immediately started typing commands and watched the fields in the other panes populate and change to reflect what I was doing. It left me with an “oh, I see” feeling.
Saves R sessions correctly, and when I return to RStudio, ALL of my work is there! I could never get the save session/image function to work in R GUI; I gave up several years ago. In RStudio it works properly, but you don't even need it because… when you leave RStudio and then return, everything is there! The workspace (variables, functions, data, etc.), the scripts you were working on, the plots, even the last dang help screen you looked at!
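For anyone who wants the workspace part of this behavior outside RStudio, base R's save.image() and load() do the job. The sketch below is illustrative: the object names and the temporary file path are made up, and it only covers objects in the global environment, not open scripts, plots or help pages (R's own default file for save.image() is .RData in the working directory).

```r
# Objects created during an analysis (illustrative values)
model_data <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 8.1, 9.8))
fit <- lm(y ~ x, data = model_data)

# Save every object in the global environment to a file.
# A temporary path is used here; save.image() defaults to ".RData".
session_file <- file.path(tempdir(), "session.RData")
save.image(file = session_file)

# Simulate quitting by removing the analysis objects ...
rm(model_data, fit)

# ... and restore them in a later session.
load(session_file)
print(exists("fit"))  # [1] TRUE
```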
The Stop Execution button in the console actually works. When executing a long-running computation in R GUI (that's the first mistake), it is sometimes necessary to cancel it, either because I made an error or because the computation is killing my system's performance. In R GUI, particularly on Mac OS X, the Stop Execution button did absolutely nothing, because there was typically a spinning beach ball preventing me from clicking it. Hitting ESC did not work either. In RStudio, clicking Stop actually seems to break out of the madness.
Clicking on a data frame object in the workspace pane causes it to be displayed in a nice tabular format. It can also be printed to a local printer or opened in a new window.
Clicking on a numerical value allows the user to change it by opening an in-place edit box. Clicking on other objects like lists, vectors and functions opens an edit window displaying the definition that created it.
Files panel. There is nothing really exciting to see here, except that by clicking the Upload button, I can upload files directly to the remote server just by selecting the file, without having to SCP!
The “Source on Save” option is interesting. If enabled, RStudio will run (source) the script each time the script is saved. Honestly, I do not find this feature all that useful except in the middle of debugging, and it is dangerous otherwise. Suppose that after a long 10-fold cross-validation computation there is an error we want to fix. We fix the error and save the script. Do we really want to run the whole computation again? In a compiled language we would have to rebuild anyway, so running on save would make sense; since R is not compiled, re-running the entire script on every save is of limited use.
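If Source on Save is left on anyway, one common workaround (my assumption, not an RStudio feature) is to guard the expensive step so that it only runs when its result is not already in the workspace. In this sketch cv_result and n_runs are hypothetical names, and the "expensive" step is a cheap stand-in for real cross-validation; the script is written to a temporary file only to keep the example self-contained.

```r
# Write a small script to a temporary file, standing in for the saved script.
script <- tempfile(fileext = ".R")
writeLines(c(
  "# Expensive step: recompute only if the result is not already in the workspace.",
  "if (!exists('cv_result')) {",
  "  n_runs <- n_runs + 1  # count how often the expensive branch runs",
  "  cv_result <- mean(replicate(10, mean(rnorm(100))))",
  "}",
  "# Cheap step that is being edited and fixed.",
  "summary_line <- sprintf('CV estimate: %.3f', cv_result)"
), script)

n_runs <- 0
source(script)  # first save: computes cv_result
source(script)  # second save: reuses cv_result
print(n_runs)   # [1] 1
```

The trade-off is that a stale cv_result is silently reused, so it has to be removed with rm(cv_result) whenever the expensive step itself changes.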
The “magic wand” icon contains what I suspect will be a growing collection of coding tools. Currently, the user can comment and uncomment a block of lines at once. This is particularly useful since, for some reason, R has no multiline comment syntax. The user can also select a series of lines and wrap a function around them. This feature could be dangerous for those not familiar with coding, but it provides a very nice way to put a bunch of code into a function as an afterthought.
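Outside RStudio, a common (if clunky) substitute for a block comment is to wrap the lines in if (FALSE) { ... }: they still have to parse, but they never run. The sketch below shows that idiom and what wrapping a few lines in a function by hand looks like; the function name scale_vector is hypothetical.

```r
# No block-comment syntax in R: lines inside if (FALSE) must parse but never run.
if (FALSE) {
  stop("this line is never executed")
}

# Lines as they might look before being wrapped:
#   x <- c(4, 8, 15, 16, 23, 42)
#   x_scaled <- (x - mean(x)) / sd(x)
# The same logic wrapped in a function after the fact:
scale_vector <- function(x) {
  x_scaled <- (x - mean(x)) / sd(x)
  x_scaled
}

z <- scale_vector(c(4, 8, 15, 16, 23, 42))
print(z)
```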
Plot panel. By far my favorite part of RStudio is the plot panel! All plots are saved in this panel, and the user can move back and forth among plots that were already constructed. The Export button allows exporting a plot at user-defined dimensions and saving it to the local machine as a PNG, or even copying it to the local machine's clipboard! Of course, the PDF button produces a PDF file of the plot that can be saved on the local machine. If the plots become too much, we can click “Clear All” and start again with a clean slate.
But is it possible to create plots of a larger size? I am sure it is, but I did not spend much time looking.
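From the console, at least, plot size is not tied to the panel: base R's graphics devices take explicit dimensions. A minimal sketch writing a larger PDF (the file name and sizes are illustrative). png() accepts width, height and res arguments in the same way if a raster file is wanted, provided the server's R build has a working PNG device. I have not checked whether the Export dialog itself has a size limit.

```r
# Open a PDF device with an explicit size (in inches), draw, then close it.
out_file <- file.path(tempdir(), "large_plot.pdf")
pdf(out_file, width = 14, height = 10)
plot(cars, main = "Stopping distance vs. speed",
     xlab = "Speed (mph)", ylab = "Distance (ft)")
invisible(dev.off())  # closing the device writes the file
```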
LaTeX and Sweave documents. From the File menu the user can create new documents, including LaTeX and Sweave documents. Unfortunately, I cannot experiment more with these features because something is amiss in my configuration. For students and researchers, having Sweave and LaTeX integrated with RStudio is a huge, huge, huge advantage. No longer must we copy and paste among different programs. To make the integration complete, users should be able to easily include BibTeX, Asymptote, TikZ, gnuplot, or whatever else they use.
At any point if the user interface shows stale data, there is a Reload button to help you out by refreshing the entire RStudio interface.
Things that Need Improvement
I do not really have any complaints about RStudio; quite the opposite, actually. That said, there are some things that do not seem to work. I should note, however, that I have not spent much (well, any) time debugging them. The developers are probably already working on some of them. Some are probably problems in my configuration, and others are probably settings that I need to tweak.
No auto-completion of parentheses or quotation marks. This is a bummer, but not a deal breaker. On the other hand, as you type closing marks, RStudio highlights the matching mark.
The dataset view needs work. Columns can't be resized. Other natural functionalities that seem to be missing are column renaming (a call to names), sorting or ordering rows by a column, and data manipulation (I didn't say that). These missing features are a tad disappointing, but the view is a hell of a lot better than displaying the data in the terminal.
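Until the viewer grows these features, the missing operations are short one-liners at the console. A minimal sketch on a made-up data frame (the column names are hypothetical):

```r
df <- data.frame(id = c(3, 1, 2), score = c(0.7, 0.9, 0.4))

# Rename a column: the call to names() the viewer cannot make for us
names(df)[names(df) == "score"] <- "accuracy"

# Order rows by a column, highest accuracy first
df_sorted <- df[order(df$accuracy, decreasing = TRUE), ]
print(df_sorted)
```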
Installing packages from the Packages panel does not work with our server's configuration.
LaTeX cannot be found. Upon attempting to create a new LaTeX or Sweave document, I got a friendly notice (instead of a bizarre error message) saying that LaTeX is not installed. The problem is that it is installed, and there does not seem to be anywhere in the GUI to configure its location. Additionally, some LaTeX templates would be useful.
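One quick check from the R console is whether the R session itself can see pdflatex. This is a diagnostic sketch only: it assumes, without my having confirmed it, that the problem is the PATH seen by the server-side R session differing from the PATH in an interactive login shell.

```r
# Sys.which() returns "" when the program is not on this session's PATH.
latex_path <- Sys.which("pdflatex")

if (nzchar(latex_path)) {
  message("pdflatex found at: ", latex_path)
} else {
  message("pdflatex is not on this R session's PATH")
}

# The directories this session searches; compare with `echo $PATH` in a login shell.
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
print(path_dirs)
```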
In Conclusion…
My Workflow Before and After RStudio
After RStudio
The developers of this open source project seem to have gotten it right on the first try. How the hell is that possible? So, has anyone else switched from the big R to the big blue ball?