Sync Your Rprofile Across Multiple R Installations
[This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Your Rprofile is a script that R executes every time you launch an R session. You can use it to automatically load packages, set your working directory, set options, define useful functions, and set up database connections, and run any other code you want every time you start R.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
If you’re using R in Linux, it’s a hidden file in your home directory called ~/.Rprofile, and if you’re on Windows, it’s usually in the program files directory: C:\Program Files\R\R-2.12.2\library\base\R\Rprofile. I sync my Rprofile across several machines and operating systems by creating a separate script called called syncprofile.R and storing this in my Dropbox. Then, on each machine, I edit the real Rprofile to source the syncprofile.R script that resides in my Dropbox.
One of the disadvantages of doing this, however, is that all the functions you define and variables you create are sourced into the global environment (.GlobalEnv). This can clutter your workspace, and if you want to start clean using rm(list=ls(all=TRUE)), you’ll have to re-source your syncprofile.R script every time.
It’s easy to get around this problem. Rather than simply appending source(/path/to/dropbox/syncprofile.R) to the end of your actual Rprofile, first create a new environment, source that script into that new environment, and attach that new environment. So you’ll add this to the end of your real Rprofile on each machine/installation:
my.env <- new.env()
sys.source(“C:/Users/st/Dropbox/R/Rprofile.r”, my.env)
attach(my.env)
All the functions and variables you’ve defined are now available but they no longer clutter up the global environment.
If you have code that you only want to run on specific machines, you can still put that into each installation’s Rprofile rather than the syncprofile.R script that you sync using Dropbox. Here’s what my syncprofile.R script looks like – feel free to take whatever looks useful to you.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# To source this file into an environment to avoid cluttering the global workspace, put this in Rprofile: | |
# my.env <- new.env(); sys.source("C:/PathTo/THIS_FILE.r", my.env); attach(my.env) | |
#----------------------------------------------------------------------- | |
# Load packages, set options and cwd, set up database connection | |
#----------------------------------------------------------------------- | |
## Load packages | |
# require(ggplot2) #plotting | |
# require(plyr) #data manipulation | |
# require(reshape) #data manipulation | |
# require(sqldf) #manipulate data frame with SQL | |
## Sets the working directory to C:/R | |
setwd("~/R") | |
## Don't show those silly significanct stars | |
options(show.signif.stars=FALSE) | |
## Do you want to automatically convert strings to factor variables in a data.frame? | |
options(stringsAsFactors=TRUE) | |
## Hard code the US repository for CRAN so it doesn't ask me every time. | |
r <- getOption("repos") | |
r["CRAN"] <- "http://cran.us.r-project.org" | |
options(repos = r) | |
rm(r) | |
## Some SQLite stuff I don't use any more because I switched to MySQL | |
# require(RSQLite) | |
# channel <- dbConnect(SQLite(), "C:/cygwin/home/sturner/dbs/sdt.sqlite") | |
# query <- function(...) dbGetQuery(channel,...) | |
## Set up ODBC connection for MySQL localhost, and make it easy to query a database with query() function. | |
require(RODBC) # The rprofile script will fail here if you don't have RODBC installed. | |
channel <- odbcConnect("localhost") | |
query <- function(...) sqlQuery(channel, ...) | |
#----------------------------------------------------------------------- | |
# Functions | |
#----------------------------------------------------------------------- | |
## Transpose a numeric data frame with ID in first column | |
tdf <- function(d) { | |
row.names(d) <- d[[1]] | |
d[[1]] <- NULL | |
d <- as.data.frame(t(d)) | |
d$id <- row.names(d) | |
d <- cbind(d[ncol(d)], d[-ncol(d)]) | |
row.names(d) <- NULL | |
d | |
} | |
## Convert selected columns of a data frame to factor variables | |
factorcols <- function(d, ...) lapply(d, function(x) factor(x, ...)) | |
## Returns a logical vector TRUE for elements of X not in Y | |
"%nin%" <- function(x, y) !(x %in% y) | |
## Returns names(df) in single column, numbered matrix format. | |
n <- function(df) matrix(names(df)) | |
## Single character shortcuts for summary() and head(). | |
s <- base::summary | |
h <- utils::head | |
## ht==headtail, i.e., show the first and last 10 items of an object | |
ht <- function(d) rbind(head(d,10),tail(d,10)) | |
## Show the first 5 rows and first 5 columns of a data frame or matrix | |
hh <- function(d) d[1:5,1:5] | |
## Read data on clipboard. | |
read.cb <- function(...) { | |
ismac <- Sys.info()[1]=="Darwin" | |
if (!ismac) read.table(file="clipboard", ...) | |
else read.table(pipe("pbpaste"), ...) | |
} | |
## Open current directory on mac | |
macopen <- function(...) system("open .") | |
## name("test.png") results in "C:/R/2010-04-20-test.png" if running this in C:/R on April 20 2010. | |
name <- function(filename="filename") paste(getwd(),"/",Sys.Date(),"-",filename,sep="") | |
## Takes a dataframe and a column name, and moves that column to the front of the DF. | |
moveColFront <- function(d=dataframe, colname="colname") { | |
index <- match(colname, names(d)) | |
cbind(d[index],d[-index]) | |
} | |
## Permutes a column in a data.frame, sets seed optionally | |
permute <- function (dataframe, columnToPermute="column", seed=NULL) { | |
if (!is.null(seed)) set.seed(seed) | |
colindex <- which(names(dataframe)==columnToPermute) | |
permutedcol <- dataframe[ ,colindex][sample(1:nrow(dataframe))] | |
dataframe[colindex] <- permutedcol | |
return(dataframe) | |
} | |
## Summarize missing data in a data frame. Return a list (lpropmiss) or data frame (propmiss) | |
lpropmiss <- function(dataframe) lapply(dataframe,function(x) data.frame(nmiss=sum(is.na(x)), n=length(x), propmiss=sum(is.na(x))/length(x))) | |
propmiss <- function(dataframe) { | |
m <- sapply(dataframe, function(x) { | |
data.frame( | |
nmiss=sum(is.na(x)), | |
n=length(x), | |
propmiss=sum(is.na(x))/length(x) | |
) | |
}) | |
d <- data.frame(t(m)) | |
d <- sapply(d, unlist) | |
d <- as.data.frame(d) | |
d$variable <- row.names(d) | |
row.names(d) <- NULL | |
d <- cbind(d[ncol(d)],d[-ncol(d)]) | |
return(d[order(d$propmiss), ]) | |
} | |
## Make a pretty QQ plot of p-values | |
qq = function(pvector, ...) { | |
if (!is.numeric(pvector)) stop("D'oh! P value vector is not numeric.") | |
pvector <- pvector[!is.na(pvector) & pvector<1 & pvector>0] | |
o = -log10(sort(pvector,decreasing=F)) | |
#e = -log10( 1:length(o)/length(o) ) | |
e = -log10( ppoints(length(pvector) )) | |
plot(e,o,pch=19,cex=1, xlab=expression(Expected~~-log[10](italic(p))), ylab=expression(Observed~~-log[10](italic(p))), xlim=c(0,max(e)), ylim=c(0,max(o)), ...) | |
abline(0,1,col="red") | |
} | |
## Draw a histogram with normal overlay (From http://www.statmethods.net/graphs/density.html) | |
histnormal <- function(d, main=NULL, xlab=NULL, breaks="FD", ...) { | |
if (any(is.na(d))) warning(paste(sum(is.na(d)), "missing values")); d <- na.omit(d) | |
h <- hist(d, plot=FALSE, breaks=breaks, ...) | |
x <- seq(min(d), max(d), length=40) | |
y <- dnorm(x, mean=mean(d), sd=sd(d)) | |
y <- y*diff(h$mids[1:2])*length(d) | |
hist(d, col="gray50", main=main, xlab=xlab, ylim=c(0,max(y)), breaks=breaks,...) | |
lines(x,y, col="blue", lwd=2) | |
rug(x) | |
} | |
## Draw a histogram with density overlay | |
histdensity <- function(x, main=NULL, breaks="FD", ...) { | |
if (any(is.na(x))) warning(paste(sum(is.na(x)), "missing values")); x <- na.omit(x) | |
hist(x, col="gray50", probability=TRUE, breaks=breaks, main=main, ...) | |
lines(density(x, na.rm = TRUE), col = "blue", lwd=2) | |
rug(x) | |
} | |
## Plot scatterplot with trendline and confidence interval (From http://tinyurl.com/3bvrth7) | |
scatterci <- function(x, y, ...) { | |
plot(x, y, ...) | |
mylm <- lm(y~x) | |
abline(mylm, col="blue") | |
x=sort(x) | |
prd<-predict(mylm,newdata=data.frame(x=x),interval = c("confidence"), level = 0.95) | |
lines(x,prd[,2],col="blue",lty=3) | |
lines(x,prd[,3],col="blue",lty=3) | |
} | |
## Get the proportion variation explained. See this website for more details: http://goo.gl/jte8X | |
rsq <- function(predicted, actual) 1-sum((actual-predicted)^2)/sum((actual-mean(actual))^2) | |
## Correlation matrix with p-values. See http://goo.gl/nahmV for documentation of this function | |
cor.prob <- function(X, dfr = nrow(X) - 2) { | |
R <- cor(X) | |
above <- row(R) < col(R) | |
r2 <- R[above]^2 | |
Fstat <- r2 * dfr / (1 - r2) | |
R[above] <- 1 - pf(Fstat, 1, dfr) | |
R[row(R)==col(R)]<-NA | |
R | |
} | |
## This function accepts a GLM object and does a LR chi-square test on the fit. | |
lrt <- function (modelobject) { | |
lrtest.chi2 <- model$null.deviance - model$deviance # Difference in deviance between model with intercept only and full model. This is the likelihood ratio test statistic (-2(log(L))). | |
lrtest.df <- model$df.null - model$df.residual # Difference in DF. Make sure this equals the number of predictors in the model! | |
fitpval <- 1-pchisq(lrtest.chi2,lrtest.df) | |
cat("Likelihood ratio test on model fit:\n\n") | |
data.frame(lrtest.chi2=lrtest.chi2,lrtest.df=lrtest.df,fitpval=fitpval) #Output gives you the chisquare, df, and p-value. | |
} | |
## This function does the same thing as lrtest in the Design package, but doesn't do as much checking. | |
## Remember, the lrt has to test the same model (model fit on same observations) | |
## Also the drop1(fullmodel,test="Chisq") does something similar. | |
lrt2 <- function (full,reduced) { | |
if (reduced$deviance<=full$deviance) stop ("Reduced model not worse than full.") | |
if (reduced$df.residual<=full$df.residual) stop ("Reduced model doesn't have more degrees of freedom.") | |
lrtest.chi2 <- reduced$deviance-full$deviance | |
lrtest.df <- reduced$df.residual - full$df.residual | |
fitpval <- 1-pchisq(lrtest.chi2,lrtest.df) | |
cat("Likelihood ratio test on two models:\n\n") | |
data.frame(lrtest.chi2=lrtest.chi2,lrtest.df=lrtest.df,fitpval=fitpval) | |
} | |
## This gets the overall anova p-value out of a linear model object | |
lmp <- function (modelobject) { | |
if (class(modelobject) != "lm") stop("Not an object of class 'lm' ") | |
f <- summary(modelobject)$fstatistic | |
p <- pf(f[1],f[2],f[3],lower.tail=F) | |
attributes(p) <- NULL | |
return(p) | |
} | |
## Function for arranging ggplots. use png(); arrange(p1, p2, ncol=1); dev.off() to save. | |
vp.layout <- function(x, y) viewport(layout.pos.row=x, layout.pos.col=y) | |
arrange_ggplot2 <- function(..., nrow=NULL, ncol=NULL, as.table=FALSE) { | |
dots <- list(...) | |
n <- length(dots) | |
if(is.null(nrow) & is.null(ncol)) { nrow = floor(n/2) ; ncol = ceiling(n/nrow)} | |
if(is.null(nrow)) { nrow = ceiling(n/ncol)} | |
if(is.null(ncol)) { ncol = ceiling(n/nrow)} | |
## NOTE see n2mfrow in grDevices for possible alternative | |
grid.newpage() | |
pushViewport(viewport(layout=grid.layout(nrow,ncol) ) ) | |
ii.p <- 1 | |
for(ii.row in seq(1, nrow)){ | |
ii.table.row <- ii.row | |
if(as.table) {ii.table.row <- nrow - ii.table.row + 1} | |
for(ii.col in seq(1, ncol)){ | |
ii.table <- ii.p | |
if(ii.p > n) break | |
print(dots[[ii.table]], vp=vp.layout(ii.table.row, ii.col)) | |
ii.p <- ii.p + 1 | |
} | |
} | |
} | |
## Imputes the median value of a vector, matrix, or data frame. | |
## Stolen from na.roughfix function in the randomForest package. | |
na.roughfix <- function (object=NULL, ...) { | |
if (class(object) == "data.frame") { | |
isfac <- sapply(object, is.factor) | |
isnum <- sapply(object, is.numeric) | |
if (any(!(isfac | isnum))) | |
stop("dfMedianImpute only works for numeric or factor") | |
roughfix <- function(x) { | |
if (any(is.na(x))) { | |
if (is.factor(x)) { | |
freq <- table(x) | |
x[is.na(x)] <- names(freq)[which.max(freq)] | |
} | |
else { | |
x[is.na(x)] <- median(x, na.rm = TRUE) | |
} | |
} | |
x | |
} | |
object[] <- lapply(object, roughfix) | |
return(object) | |
} | |
else if(is.atomic(object)) { | |
d <- dim(object) | |
if (length(d) > 2) | |
stop("vectorMedianImpute can't handle objects with more than two dimensions") | |
if (all(!is.na(object))) | |
return(object) | |
if (!is.numeric(object)) | |
stop("vectorMedianImpute can only deal with numeric data.") | |
if (length(d) == 2) { | |
hasNA <- which(apply(object, 2, function(x) any(is.na(x)))) | |
for (j in hasNA) object[is.na(object[, j]), j] <- median(object[, | |
j], na.rm = TRUE) | |
} | |
else { | |
object[is.na(object)] <- median(object, na.rm = TRUE) | |
} | |
return(object) | |
} | |
else stop("Object is not a data frame or atomic vector") | |
} | |
## Makes a better scatterplot matrix. | |
## Stolen from the PerformanceAnalytics package: http://cran.r-project.org/web/packages/PerformanceAnalytics/index.html | |
## Also see http://moderntoolmaking.blogspot.com/2011/08/graphically-analyzing-variable.html | |
## To color code points based on levels of a factor, use these args: | |
## pairs.perfan(d, bg=c("red","blue")[d$factor], pch=21) | |
betterpairs <- function (R, histogram = TRUE, ...) | |
{ | |
x=as.matrix(R) # in PerformanceAnalytics: x = checkData(R, method = "matrix") | |
if (mode(x)!="numeric") stop("Must pass in only numeric values") | |
panel.cor <- function(x, y, digits = 2, prefix = "", use = "pairwise.complete.obs", cex.cor, ...) { | |
usr <- par("usr") | |
on.exit(par(usr)) | |
par(usr = c(0, 1, 0, 1)) | |
r <- abs(cor(x, y, use = use)) | |
txt <- format(c(r, 0.123456789), digits = digits)[1] | |
txt <- paste(prefix, txt, sep = "") | |
if (missing(cex.cor)) cex <- 0.8/strwidth(txt) | |
test <- cor.test(x, y) | |
Signif <- symnum(test$p.value, corr = FALSE, na = FALSE, cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1), symbols = c("***", "**", "*", ".", " ")) | |
text(0.5, 0.5, txt, cex = cex * r) | |
text(0.8, 0.8, Signif, cex = cex, col = 2) | |
} | |
f <- function(t) dnorm(t, mean = mean(x), sd = sd(x)) | |
# Useful function for histogram showing density overlay and rug | |
hist.panel = function(x, ...) { | |
par(new = TRUE) | |
hist(x, col = "light gray", probability = TRUE, axes = FALSE, main = "", breaks = "FD") | |
lines(density(x, na.rm = TRUE), col = "red", lwd = 1) | |
rug(x) | |
} | |
if (histogram) pairs(x, gap = 0, lower.panel = panel.smooth, upper.panel = panel.cor, diag.panel = hist.panel, ...) | |
else pairs(x, gap = 0, lower.panel = panel.smooth, upper.panel = panel.cor, ...) | |
} | |
# Did you make it this far? | |
message("\n******************************\nSuccessfully loaded Rprofile.r\n******************************") |
To leave a comment for the author, please follow the link and comment on their blog: Getting Genetics Done.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.