Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I was recently hunting for a function that will strip the extension from a file, changing foo.png to foo, and so forth. I was knitting a report, and wanted to replace the file extension of the input with the extension of the output file. (knitr handles this automatically in most cases but I had some custom logic in there that meant I had to work things manually.)
Finding file extensions is such a common task that I figured that someone must have written a function to solve the problem already. A quick search using findFn("file extension") from the sos package revealed a few thousand hits. There’s a lot of noise in there, but I found a few promising candidates.
There’s removeExt in the limma package (you can find it on Bioconductor), strip_extension in Kmisc, remove_file_extension which has identical copies in both spatial.tools and gdalUtils, and extension in the raster package.
To save you the time and effort, I’ve tried them all, and unfortunately they all suck.
At a bare minimum, a file extension stripper needs to be vectorized, deal with different file extensions within that vector, deal with multiple levels of extension (for things like “tar.gz” files), and with filenames with dots in the name other than the extension, and with missing values, and with directories. OK, that’s quite a few things but I’m picky.
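To make a few of those requirements concrete, here is a minimal base-R sketch. The name strip_ext_sketch is hypothetical and is not taken from any of the packages mentioned, and the regular expression assumes that the only double extensions worth handling are ".tar" plus one more suffix. It does not standardise directories, and it would wrongly reduce a hidden file such as ".bashrc" to an empty string.

```r
# Hypothetical helper for illustration; not part of any package named in this post.
strip_ext_sketch <- function(x) {
  # Assumption: a double extension is ".tar" plus one more suffix (e.g. ".tar.gz");
  # anything else counts as a single trailing ".ext" made of letters and digits.
  ext_pattern <- "(\\.tar)?\\.[[:alnum:]]+$"
  # sub() is vectorized and passes missing values through as NA.
  sub(ext_pattern, "", x)
}

paths <- c("foo.png", "bar.tar.gz", "baz", "quux. quuux.tbz2", NA)
stripped <- strip_ext_sketch(paths)
# elements of stripped: "foo", "bar", "baz", "quux. quuux", NA

# A dot in a directory name is left alone because the pattern is anchored
# to the end of the string.
strip_ext_sketch("somedir.d/foo")
```

Anchoring the pattern to the end of the string is what keeps dots elsewhere in the path from being treated as the start of an extension.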
Since all the existing options failed, I’ve made my own function. In fact, I went overboard and created a package of path manipulation utilities, the pathological package. It isn’t on CRAN yet, but you can install it via:

    library(devtools)
    install_github("richierocks/pathological")
It’s been a while since I’ve used MATLAB, but I have fond recollections of its fileparts function that splits a path up into the directory, filename and extension.
The pathological equivalent is decompose_path, which returns a character matrix with three columns: dirname, filename and extension.
    library(pathological)
    x <- c(
      "somedir/foo.tgz",         # single extension
      "another dir\\bar.tar.gz", # double extension
      "baz",                     # no extension
      "quux. quuux.tbz2",        # single ext, dots in filename
      R.home(),                  # a dir
      "~",                       # another dir
      "~/quuuux.tar.xz",         # a file in a dir
      "",                        # empty
      ".",                       # current dir
      "..",                      # parent dir
      NA_character_              # missing
    )
    (decomposed <- decompose_path(x))
    ##                         dirname                      filename      extension
    ## somedir/foo.tgz         "d:/workspace/somedir"       "foo"         "tgz"
    ## another dir\\bar.tar.gz "d:/workspace/another dir"   "bar"         "tar.gz"
    ## baz                     "d:/workspace"               "baz"         ""
    ## quux. quuux.tbz2        "d:/workspace"               "quux. quuux" "tbz2"
    ## C:/PROGRA~1/R/R-31~1.0  "C:/Program Files/R/R-3.1.0" ""            ""
    ## ~                       "C:/Users/richie/Documents"  ""            ""
    ## ~/quuuux.tar.xz         "C:/Users/richie/Documents"  "quuuux"      "tar.xz"
    ##                         ""                           ""            ""
    ## .                       "d:/workspace"               ""            ""
    ## ..                      "d:/"                        ""            ""
    ## <NA>                    NA                           NA            NA
    ## attr(,"class")
    ## [1] "decomposed_path" "matrix"
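For readers who want to see the idea without installing anything, the three-way split can be roughly approximated in base R. This is only a sketch: split_path_sketch is a hypothetical name, it is not how pathological is implemented, it assumes forward-slash separators, and unlike the output above it leaves directories relative rather than standardising them.

```r
# Hypothetical helper for illustration; not the pathological implementation.
split_path_sketch <- function(x) {
  base <- basename(x)
  # Assumption: ".tar" plus one more suffix is the only double extension handled.
  ext_pattern <- "(\\.tar)?\\.[[:alnum:]]+$"
  stem <- sub(ext_pattern, "", base)
  # Whatever was removed from the end, minus its leading dot, is the extension.
  ext <- sub("^\\.", "", substring(base, nchar(stem) + 1))
  cbind(dirname = dirname(x), filename = stem, extension = ext)
}

parts <- split_path_sketch(c("somedir/foo.tgz", "another dir/bar.tar.gz", "baz"))
parts[, "extension"]  # elements: "tgz", "tar.gz", ""
```

Returning a matrix with named columns mirrors the shape of the output shown above, so the pieces can be picked out by column name.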
There are some shortcut functions to get at different parts of the filename:
    get_extension(x)
    ##         somedir/foo.tgz another dir\\bar.tar.gz                     baz
    ##                   "tgz"                "tar.gz"                      ""
    ##        quux. quuux.tbz2  C:/PROGRA~1/R/R-31~1.0                       ~
    ##                  "tbz2"                      ""                      ""
    ##         ~/quuuux.tar.xz                                               .
    ##                "tar.xz"                      ""                      ""
    ##                      ..                    <NA>
    ##                      ""                      NA

    strip_extension(x)
    ##  [1] "d:/workspace/somedir/foo"         "d:/workspace/another dir/bar"
    ##  [3] "d:/workspace/baz"                 "d:/workspace/quux. quuux"
    ##  [5] "C:/Program Files/R/R-3.1.0"       "C:/Users/richie/Documents"
    ##  [7] "C:/Users/richie/Documents/quuuux" "/"
    ##  [9] "d:/workspace"                     "d:/"
    ## [11] NA

    strip_extension(x, include_dir = FALSE)
    ##         somedir/foo.tgz another dir\\bar.tar.gz                     baz
    ##                   "foo"                   "bar"                   "baz"
    ##        quux. quuux.tbz2  C:/PROGRA~1/R/R-31~1.0                       ~
    ##           "quux. quuux"                      ""                      ""
    ##         ~/quuuux.tar.xz                                               .
    ##                "quuuux"                      ""                      ""
    ##                      ..                    <NA>
    ##                      ""                      NA
You can also get your original file location (in a standardised form) using recompose_path:

    recompose_path(decomposed)
    ##  [1] "d:/workspace/somedir/foo.tgz"
    ##  [2] "d:/workspace/another dir/bar.tar.gz"
    ##  [3] "d:/workspace/baz"
    ##  [4] "d:/workspace/quux. quuux.tbz2"
    ##  [5] "C:/Program Files/R/R-3.1.0"
    ##  [6] "C:/Users/richie/Documents"
    ##  [7] "C:/Users/richie/Documents/quuuux.tar.xz"
    ##  [8] "/"
    ##  [9] "d:/workspace"
    ## [10] "d:/"
    ## [11] NA
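Going back the other way is mostly a matter of pasting the pieces together again. The sketch below assumes a character matrix with dirname, filename and extension columns like the one above. recompose_sketch is a hypothetical name, not the package's function, and it ignores the special cases (empty paths, bare directories, missing values) that recompose_path evidently handles.

```r
# Hypothetical helper for illustration; not the pathological implementation.
recompose_sketch <- function(m) {
  ext <- m[, "extension"]
  # Only put a dot back when there is an extension to attach.
  name <- paste0(m[, "filename"], ifelse(nzchar(ext), ".", ""), ext)
  file.path(m[, "dirname"], name)
}

m <- cbind(
  dirname   = c("somedir", "."),
  filename  = c("foo", "baz"),
  extension = c("tgz", "")
)
recomposed <- recompose_sketch(m)
# elements of recomposed: "somedir/foo.tgz", "./baz"
```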
The package also contains a few other path utilities. The standardisation I mentioned comes from standardise_path (standardize_path also available for Americans), and there’s a dir_copy function for copying directories.
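As a rough idea of what copying a directory involves, here is one plausible base-R approach. It is a sketch only: dir_copy_sketch is a hypothetical name, and pathological's dir_copy may well behave differently (for example in how it treats files that already exist in the target).

```r
# Hypothetical base-R stand-in for illustration; pathological's dir_copy may differ.
dir_copy_sketch <- function(from, to) {
  # Create the target directory (and any missing parents) if needed.
  dir.create(to, recursive = TRUE, showWarnings = FALSE)
  # List everything directly inside the source, including hidden files.
  contents <- list.files(from, all.files = TRUE, no.. = TRUE, full.names = TRUE)
  # recursive = TRUE lets file.copy() descend into subdirectories.
  all(file.copy(contents, to, recursive = TRUE))
}

# Usage on throwaway directories under the session's temporary directory.
src <- tempfile("dir_copy_src")
dst <- tempfile("dir_copy_dst")
dir.create(file.path(src, "sub"), recursive = TRUE)
writeLines("a", file.path(src, "a.txt"))
writeLines("b", file.path(src, "sub", "b.txt"))
copied_ok <- dir_copy_sketch(src, dst)
```

The function returns a single TRUE or FALSE, so a caller can check whether every top-level item was copied.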
It’s brand new, so after I’ve complained about other people’s code, I’m sure karma will ensure that you’ll find a bug or two, but I hope you find it useful.
Tagged: directories, files, packages, pathological, paths, r