Sync Your Rprofile Across Multiple R Installations

[This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Your Rprofile is a script that R executes every time you launch an R session. You can use it to automatically load packages, set your working directory, set options, define useful functions, and set up database connections, and run any other code you want every time you start R.

If you’re using R in Linux, it’s a hidden file in your home directory called ~/.Rprofile, and if you’re on Windows, it’s usually in the program files directory: C:\Program Files\R\R-2.12.2\library\base\R\Rprofile. I sync my Rprofile across several machines and operating systems by creating a separate script called called syncprofile.R and storing this in my Dropbox. Then, on each machine, I edit the real Rprofile to source the syncprofile.R script that resides in my Dropbox.

One of the disadvantages of doing this, however, is that all the functions you define and variables you create are sourced into the global environment (.GlobalEnv). This can clutter your workspace, and if you want to start clean using rm(list=ls(all=TRUE)), you’ll have to re-source your syncprofile.R script every time.

It’s easy to get around this problem. Rather than simply appending source(/path/to/dropbox/syncprofile.R) to the end of your actual Rprofile, first create a new environment, source that script into that new environment, and attach that new environment. So you’ll add this to the end of your real Rprofile on each machine/installation:

my.env <- new.env()
sys.source(“C:/Users/st/Dropbox/R/Rprofile.r”, my.env)
attach(my.env)

All the functions and variables you’ve defined are now available but they no longer clutter up the global environment.

If you have code that you only want to run on specific machines, you can still put that into each installation’s Rprofile rather than the syncprofile.R script that you sync using Dropbox. Here’s what my syncprofile.R script looks like – feel free to take whatever looks useful to you.

# To source this file into an environment to avoid cluttering the global workspace, put this in Rprofile:
# my.env <- new.env(); sys.source("C:/PathTo/THIS_FILE.r", my.env); attach(my.env)
#-----------------------------------------------------------------------
# Load packages, set options and cwd, set up database connection
#-----------------------------------------------------------------------
## Load packages
# require(ggplot2) #plotting
# require(plyr) #data manipulation
# require(reshape) #data manipulation
# require(sqldf) #manipulate data frame with SQL
## Sets the working directory to C:/R
setwd("~/R")
## Don't show those silly significanct stars
options(show.signif.stars=FALSE)
## Do you want to automatically convert strings to factor variables in a data.frame?
options(stringsAsFactors=TRUE)
## Hard code the US repository for CRAN so it doesn't ask me every time.
r <- getOption("repos")
r["CRAN"] <- "http://cran.us.r-project.org"
options(repos = r)
rm(r)
## Some SQLite stuff I don't use any more because I switched to MySQL
# require(RSQLite)
# channel <- dbConnect(SQLite(), "C:/cygwin/home/sturner/dbs/sdt.sqlite")
# query <- function(...) dbGetQuery(channel,...)
## Set up ODBC connection for MySQL localhost, and make it easy to query a database with query() function.
require(RODBC) # The rprofile script will fail here if you don't have RODBC installed.
channel <- odbcConnect("localhost")
query <- function(...) sqlQuery(channel, ...)
#-----------------------------------------------------------------------
# Functions
#-----------------------------------------------------------------------
## Transpose a numeric data frame with ID in first column
tdf <- function(d) {
row.names(d) <- d[[1]]
d[[1]] <- NULL
d <- as.data.frame(t(d))
d$id <- row.names(d)
d <- cbind(d[ncol(d)], d[-ncol(d)])
row.names(d) <- NULL
d
}
## Convert selected columns of a data frame to factor variables
factorcols <- function(d, ...) lapply(d, function(x) factor(x, ...))
## Returns a logical vector TRUE for elements of X not in Y
"%nin%" <- function(x, y) !(x %in% y)
## Returns names(df) in single column, numbered matrix format.
n <- function(df) matrix(names(df))
## Single character shortcuts for summary() and head().
s <- base::summary
h <- utils::head
## ht==headtail, i.e., show the first and last 10 items of an object
ht <- function(d) rbind(head(d,10),tail(d,10))
## Show the first 5 rows and first 5 columns of a data frame or matrix
hh <- function(d) d[1:5,1:5]
## Read data on clipboard.
read.cb <- function(...) {
ismac <- Sys.info()[1]=="Darwin"
if (!ismac) read.table(file="clipboard", ...)
else read.table(pipe("pbpaste"), ...)
}
## Open current directory on mac
macopen <- function(...) system("open .")
## name("test.png") results in "C:/R/2010-04-20-test.png" if running this in C:/R on April 20 2010.
name <- function(filename="filename") paste(getwd(),"/",Sys.Date(),"-",filename,sep="")
## Takes a dataframe and a column name, and moves that column to the front of the DF.
moveColFront <- function(d=dataframe, colname="colname") {
index <- match(colname, names(d))
cbind(d[index],d[-index])
}
## Permutes a column in a data.frame, sets seed optionally
permute <- function (dataframe, columnToPermute="column", seed=NULL) {
if (!is.null(seed)) set.seed(seed)
colindex <- which(names(dataframe)==columnToPermute)
permutedcol <- dataframe[ ,colindex][sample(1:nrow(dataframe))]
dataframe[colindex] <- permutedcol
return(dataframe)
}
## Summarize missing data in a data frame. Return a list (lpropmiss) or data frame (propmiss)
lpropmiss <- function(dataframe) lapply(dataframe,function(x) data.frame(nmiss=sum(is.na(x)), n=length(x), propmiss=sum(is.na(x))/length(x)))
propmiss <- function(dataframe) {
m <- sapply(dataframe, function(x) {
data.frame(
nmiss=sum(is.na(x)),
n=length(x),
propmiss=sum(is.na(x))/length(x)
)
})
d <- data.frame(t(m))
d <- sapply(d, unlist)
d <- as.data.frame(d)
d$variable <- row.names(d)
row.names(d) <- NULL
d <- cbind(d[ncol(d)],d[-ncol(d)])
return(d[order(d$propmiss), ])
}
## Make a pretty QQ plot of p-values
qq = function(pvector, ...) {
if (!is.numeric(pvector)) stop("D'oh! P value vector is not numeric.")
pvector <- pvector[!is.na(pvector) & pvector<1 & pvector>0]
o = -log10(sort(pvector,decreasing=F))
#e = -log10( 1:length(o)/length(o) )
e = -log10( ppoints(length(pvector) ))
plot(e,o,pch=19,cex=1, xlab=expression(Expected~~-log[10](italic(p))), ylab=expression(Observed~~-log[10](italic(p))), xlim=c(0,max(e)), ylim=c(0,max(o)), ...)
abline(0,1,col="red")
}
## Draw a histogram with normal overlay (From http://www.statmethods.net/graphs/density.html)
histnormal <- function(d, main=NULL, xlab=NULL, breaks="FD", ...) {
if (any(is.na(d))) warning(paste(sum(is.na(d)), "missing values")); d <- na.omit(d)
h <- hist(d, plot=FALSE, breaks=breaks, ...)
x <- seq(min(d), max(d), length=40)
y <- dnorm(x, mean=mean(d), sd=sd(d))
y <- y*diff(h$mids[1:2])*length(d)
hist(d, col="gray50", main=main, xlab=xlab, ylim=c(0,max(y)), breaks=breaks,...)
lines(x,y, col="blue", lwd=2)
rug(x)
}
## Draw a histogram with density overlay
histdensity <- function(x, main=NULL, breaks="FD", ...) {
if (any(is.na(x))) warning(paste(sum(is.na(x)), "missing values")); x <- na.omit(x)
hist(x, col="gray50", probability=TRUE, breaks=breaks, main=main, ...)
lines(density(x, na.rm = TRUE), col = "blue", lwd=2)
rug(x)
}
## Plot scatterplot with trendline and confidence interval (From http://tinyurl.com/3bvrth7)
scatterci <- function(x, y, ...) {
plot(x, y, ...)
mylm <- lm(y~x)
abline(mylm, col="blue")
x=sort(x)
prd<-predict(mylm,newdata=data.frame(x=x),interval = c("confidence"), level = 0.95)
lines(x,prd[,2],col="blue",lty=3)
lines(x,prd[,3],col="blue",lty=3)
}
## Get the proportion variation explained. See this website for more details: http://goo.gl/jte8X
rsq <- function(predicted, actual) 1-sum((actual-predicted)^2)/sum((actual-mean(actual))^2)
## Correlation matrix with p-values. See http://goo.gl/nahmV for documentation of this function
cor.prob <- function(X, dfr = nrow(X) - 2) {
R <- cor(X)
above <- row(R) < col(R)
r2 <- R[above]^2
Fstat <- r2 * dfr / (1 - r2)
R[above] <- 1 - pf(Fstat, 1, dfr)
R[row(R)==col(R)]<-NA
R
}
## This function accepts a GLM object and does a LR chi-square test on the fit.
lrt <- function (modelobject) {
lrtest.chi2 <- model$null.deviance - model$deviance # Difference in deviance between model with intercept only and full model. This is the likelihood ratio test statistic (-2(log(L))).
lrtest.df <- model$df.null - model$df.residual # Difference in DF. Make sure this equals the number of predictors in the model!
fitpval <- 1-pchisq(lrtest.chi2,lrtest.df)
cat("Likelihood ratio test on model fit:\n\n")
data.frame(lrtest.chi2=lrtest.chi2,lrtest.df=lrtest.df,fitpval=fitpval) #Output gives you the chisquare, df, and p-value.
}
## This function does the same thing as lrtest in the Design package, but doesn't do as much checking.
## Remember, the lrt has to test the same model (model fit on same observations)
## Also the drop1(fullmodel,test="Chisq") does something similar.
lrt2 <- function (full,reduced) {
if (reduced$deviance<=full$deviance) stop ("Reduced model not worse than full.")
if (reduced$df.residual<=full$df.residual) stop ("Reduced model doesn't have more degrees of freedom.")
lrtest.chi2 <- reduced$deviance-full$deviance
lrtest.df <- reduced$df.residual - full$df.residual
fitpval <- 1-pchisq(lrtest.chi2,lrtest.df)
cat("Likelihood ratio test on two models:\n\n")
data.frame(lrtest.chi2=lrtest.chi2,lrtest.df=lrtest.df,fitpval=fitpval)
}
## This gets the overall anova p-value out of a linear model object
lmp <- function (modelobject) {
if (class(modelobject) != "lm") stop("Not an object of class 'lm' ")
f <- summary(modelobject)$fstatistic
p <- pf(f[1],f[2],f[3],lower.tail=F)
attributes(p) <- NULL
return(p)
}
## Function for arranging ggplots. use png(); arrange(p1, p2, ncol=1); dev.off() to save.
vp.layout <- function(x, y) viewport(layout.pos.row=x, layout.pos.col=y)
arrange_ggplot2 <- function(..., nrow=NULL, ncol=NULL, as.table=FALSE) {
dots <- list(...)
n <- length(dots)
if(is.null(nrow) & is.null(ncol)) { nrow = floor(n/2) ; ncol = ceiling(n/nrow)}
if(is.null(nrow)) { nrow = ceiling(n/ncol)}
if(is.null(ncol)) { ncol = ceiling(n/nrow)}
## NOTE see n2mfrow in grDevices for possible alternative
grid.newpage()
pushViewport(viewport(layout=grid.layout(nrow,ncol) ) )
ii.p <- 1
for(ii.row in seq(1, nrow)){
ii.table.row <- ii.row
if(as.table) {ii.table.row <- nrow - ii.table.row + 1}
for(ii.col in seq(1, ncol)){
ii.table <- ii.p
if(ii.p > n) break
print(dots[[ii.table]], vp=vp.layout(ii.table.row, ii.col))
ii.p <- ii.p + 1
}
}
}
## Imputes the median value of a vector, matrix, or data frame.
## Stolen from na.roughfix function in the randomForest package.
na.roughfix <- function (object=NULL, ...) {
if (class(object) == "data.frame") {
isfac <- sapply(object, is.factor)
isnum <- sapply(object, is.numeric)
if (any(!(isfac | isnum)))
stop("dfMedianImpute only works for numeric or factor")
roughfix <- function(x) {
if (any(is.na(x))) {
if (is.factor(x)) {
freq <- table(x)
x[is.na(x)] <- names(freq)[which.max(freq)]
}
else {
x[is.na(x)] <- median(x, na.rm = TRUE)
}
}
x
}
object[] <- lapply(object, roughfix)
return(object)
}
else if(is.atomic(object)) {
d <- dim(object)
if (length(d) > 2)
stop("vectorMedianImpute can't handle objects with more than two dimensions")
if (all(!is.na(object)))
return(object)
if (!is.numeric(object))
stop("vectorMedianImpute can only deal with numeric data.")
if (length(d) == 2) {
hasNA <- which(apply(object, 2, function(x) any(is.na(x))))
for (j in hasNA) object[is.na(object[, j]), j] <- median(object[,
j], na.rm = TRUE)
}
else {
object[is.na(object)] <- median(object, na.rm = TRUE)
}
return(object)
}
else stop("Object is not a data frame or atomic vector")
}
## Makes a better scatterplot matrix.
## Stolen from the PerformanceAnalytics package: http://cran.r-project.org/web/packages/PerformanceAnalytics/index.html
## Also see http://moderntoolmaking.blogspot.com/2011/08/graphically-analyzing-variable.html
## To color code points based on levels of a factor, use these args:
## pairs.perfan(d, bg=c("red","blue")[d$factor], pch=21)
betterpairs <- function (R, histogram = TRUE, ...)
{
x=as.matrix(R) # in PerformanceAnalytics: x = checkData(R, method = "matrix")
if (mode(x)!="numeric") stop("Must pass in only numeric values")
panel.cor <- function(x, y, digits = 2, prefix = "", use = "pairwise.complete.obs", cex.cor, ...) {
usr <- par("usr")
on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y, use = use))
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste(prefix, txt, sep = "")
if (missing(cex.cor)) cex <- 0.8/strwidth(txt)
test <- cor.test(x, y)
Signif <- symnum(test$p.value, corr = FALSE, na = FALSE, cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1), symbols = c("***", "**", "*", ".", " "))
text(0.5, 0.5, txt, cex = cex * r)
text(0.8, 0.8, Signif, cex = cex, col = 2)
}
f <- function(t) dnorm(t, mean = mean(x), sd = sd(x))
# Useful function for histogram showing density overlay and rug
hist.panel = function(x, ...) {
par(new = TRUE)
hist(x, col = "light gray", probability = TRUE, axes = FALSE, main = "", breaks = "FD")
lines(density(x, na.rm = TRUE), col = "red", lwd = 1)
rug(x)
}
if (histogram) pairs(x, gap = 0, lower.panel = panel.smooth, upper.panel = panel.cor, diag.panel = hist.panel, ...)
else pairs(x, gap = 0, lower.panel = panel.smooth, upper.panel = panel.cor, ...)
}
# Did you make it this far?
message("\n******************************\nSuccessfully loaded Rprofile.r\n******************************")
view raw Rprofile.R hosted with ❤ by GitHub

To leave a comment for the author, please follow the link and comment on their blog: Getting Genetics Done.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)