[This article was first published on R snippets, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This week I was running computations that transform input files into output files. The problem was that the process had to be repeated: whenever new input files were generated or old ones were updated, I needed to calculate new output files. The transformation was time-consuming, so I wanted to run the calculations only when required.
My initial code was:
in.files <- list.files(pattern = glob2rx("*.in"))
out.files <- gsub("in$", "out", in.files)
skip <- file.info(in.files)$mtime < file.info(out.files)$mtime
for (i in seq(along.with = in.files)) {
  if (skip[i]) {
    # skip
  } else {
    # generate out.files[i] using in.files[i]
  }
}
It should skip files whose output file has a later modification time than the input file. However, I quickly learned that it fails when an output file is missing, because file.info returns NA for missing files and if cannot handle an NA condition. The fix I found was to use the isTRUE function, which returns TRUE only if its argument is exactly TRUE and FALSE otherwise.
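The failure mode can be reproduced in isolation. The sketch below is only an illustration: the names in.file and out.file are made up and point into a temporary directory. It shows that the comparison yields NA when the output file does not exist, and that isTRUE turns that NA into FALSE.

```r
# Hypothetical file names in a temporary directory, for illustration only
in.file <- tempfile(fileext = ".in")
out.file <- sub("in$", "out", in.file)
writeLines("input", in.file)

# The output file does not exist yet, so file.info reports NA for its mtime
out.mtime <- file.info(out.file)$mtime
print(is.na(out.mtime))

# Comparing against NA gives NA, which if () cannot use as a condition
skip <- file.info(in.file)$mtime < out.mtime
print(is.na(skip))

# isTRUE maps NA to FALSE, so the file is treated as "do not skip"
print(isTRUE(skip))
```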
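The only shortcoming of isTRUE is that it returns FALSE if its argument is TRUE but carries attributes such as names. This was the behavior in R versions before 3.5.0, where isTRUE(x) was defined as identical(TRUE, x); from R 3.5.0 on, isTRUE ignores attributes and returns TRUE in such cases. The older behavior can be seen in the following code: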
> x <- TRUE
> attr(x, "color") <- "red"
> names(x) <- "first"
> x
first
 TRUE
attr(,"color")
[1] "red"
> isTRUE(x)
[1] FALSE
> x[1]
first
 TRUE
> isTRUE(x[1])
[1] FALSE
> x[[1]]
[1] TRUE
> isTRUE(x[[1]])
[1] TRUE
The conclusion is that the safe form of the loop in the above example uses [[ indexing:
for (i in seq(along.with = in.files)) {
  if (isTRUE(skip[[i]])) {
    # skip
  } else {
    # generate out.files[i] using in.files[i]
  }
}
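For a self-contained run of this loop, here is a minimal sketch under some assumptions: update.outputs is a hypothetical helper name, the real transformation is replaced by a plain file.copy, and the inputs are backdated with Sys.setFileTime so that freshly written outputs are strictly newer.

```r
# Hypothetical helper; file.copy() stands in for the real transformation
update.outputs <- function(dir) {
  in.files <- list.files(dir, pattern = glob2rx("*.in"), full.names = TRUE)
  out.files <- sub("in$", "out", in.files)
  skip <- file.info(in.files)$mtime < file.info(out.files)$mtime
  generated <- character(0)
  for (i in seq_along(in.files)) {
    if (!isTRUE(skip[[i]])) {
      # output is missing or not newer than the input: (re)generate it
      file.copy(in.files[i], out.files[i], overwrite = TRUE)
      generated <- c(generated, basename(out.files[i]))
    }
  }
  generated
}

# Made-up input files in a temporary directory
work.dir <- tempfile("demo")
dir.create(work.dir)
inputs <- file.path(work.dir, c("a.in", "b.in"))
for (f in inputs) {
  writeLines("data", f)
  Sys.setFileTime(f, Sys.time() - 60)  # backdate the input by one minute
}

first.run <- update.outputs(work.dir)   # no outputs yet, so both are generated
second.run <- update.outputs(work.dir)  # outputs are newer, so nothing is done
print(first.run)
print(second.run)
```

Returning the names of the regenerated files is a design choice made here for easy checking; the loop itself is the one from the post.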
This brought me to the problem of vector subsetting using logical arguments. See the following example:
> x <- 1:5
> y <- c(FALSE, TRUE, NA, TRUE, FALSE)
> names(y) <- 1:5
> x[y]
[1]  2 NA  4
> x[sapply(y, isTRUE)]
[1] 2 4
When vector y contains NA, the corresponding element of the result is NA. Sometimes this behavior is desirable, but there are cases when it is not. For example, we cannot tell whether an NA in x[y] arose because x contained NA at a position where y was TRUE, or simply because y was NA.
However, as the example shows, using sapply(y, isTRUE) solves the problem. Fortunately, sapply passes each element of y to isTRUE with its name dropped, so we get the correct result without removing the names beforehand.
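For comparison, two vectorised alternatives give the same result on this example. They are not from the original post and are shown only as a sketch: which(y) keeps the positions where y is TRUE, and !is.na(y) & y builds a mask in which NA becomes FALSE.

```r
x <- 1:5
y <- c(FALSE, TRUE, NA, TRUE, FALSE)
names(y) <- 1:5

by.sapply <- x[sapply(y, isTRUE)]  # the approach from the post
by.which <- x[which(y)]            # which() drops NA and FALSE positions
by.mask <- x[!is.na(y) & y]        # explicit mask with NA turned into FALSE

print(by.sapply)
print(by.which)
print(by.mask)
```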