Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A while back Steve McIntyre was looking for a way to handle .Z files in R.
Ron Broberg over at the whiteboard had an approach that Steve adopted both for untarring and for uncompressing .Z files. While the approach is slick, it’s somewhat of a hack. Nothing wrong with that, but I wanted something a bit more elegant.
Long ago a reader, Nicholas, created an R package called “uncompress” to handle the .Z file issue, but Steve was not able to get it to work and neither was I. Luckily Nicholas made his contact info available, and I was able to get him a bug report with a file (ghcnv2.Z) and the code I used to download the file and unzip it. The error was relatively minor and related to end-of-file padding. Nicholas fixed the “bug” and today I had success with downloading and unzipping .Z files. So now in Moshtemp when you download the ghcnv2.Z file I will automagically unzip it for you.
Next I decided to look at the untar problem. Steve Mc had “untarred” files by copying a version of untar down to his system and then feeding that exe a command from inside R. That’s unnecessary, as R has an “untar” command. So, below, we can see how to download “tar” files from NOAA, untar them, and then uncompress them. Any questions on “uncompress”, just write. It’s on CRAN.
library(uncompress) # provides uncompress() for .Z data
ftp <- "ftp://ftp.ncdc.noaa.gov/pub/data/icoads"
Imma <- "IMMA"
start <- 1914
end <- 1917 # test with small subset
years <- start:end
Tar_Dir <- "IcoadsTar"
Zfile_Dir <- "IcoadsZ"
Icoads_Dir <- "IcoadsData"
dir.create(Tar_Dir)
dir.create(Zfile_Dir)
dir.create(Icoads_Dir)
# one tar per year: IMMA.1914.tar, IMMA.1915.tar, ...
fnames <- paste(Imma,".",years,".","tar",sep="")
# fnames is ALSO fetchable with RCurl.. when I learn it
getIcoadsTar <- function(site=ftp,files=fnames,tDir=Tar_Dir,zDir=Zfile_Dir){
  # download each tar into tDir and unpack its contents into zDir
  for(i in seq_along(files)){
    fullname <- file.path(site,files[i])
    destinationfile <- file.path(tDir,files[i])
    # mode="wb" keeps the binary tar intact on Windows
    download.file(fullname,destfile=destinationfile,mode="wb")
    untar(destinationfile,exdir=file.path(getwd(),zDir))
  }
}
unZipIcoads <- function(zDir=Zfile_Dir,dataDir=Icoads_Dir){
  # uncompress every .Z file in zDir into a .dat file in dataDir
  # "\\.Z$" matches a literal ".Z" at the end of the name only
  files <- list.files(path=file.path(getwd(),zDir),full.names=TRUE,pattern="\\.Z$")
  destnames <- sub("\\.Z$",".dat",basename(files))
  for(i in seq_along(files)){
    handle <- file(files[i], "rb")
    # read the whole file, whatever its size
    data <- readBin(handle, "raw", n=file.info(files[i])$size)
    close(handle)
    uncomp_data <- uncompress(data)
    handle <- file(file.path(dataDir,destnames[i]), "wb")
    writeBin(uncomp_data, handle)
    close(handle)
  }
}
The first function, getIcoadsTar, downloads and untars the files. When that completes, you call unZipIcoads to unzip them all.
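If you want to see R’s own untar() at work without hitting the NOAA server, here is a minimal sketch. It assumes nothing beyond base R: it builds a tiny archive from a made-up file (sample.txt is just a placeholder name) and extracts it again. Passing tar = "internal" asks R to use its built-in implementation rather than any external tar executable.

```r
# Build a small tar archive in a temporary directory, then list and
# extract it with base R only; no external tar program is called.
work <- tempfile("untar_demo")
dir.create(work)
src <- file.path(work, "src")
dir.create(src)
writeLines(c("line one", "line two"), file.path(src, "sample.txt"))

# Create the archive from inside src so the stored path is just "sample.txt"
tarfile <- file.path(work, "sample.tar")
old <- setwd(src)
tar(tarfile, files = "sample.txt", tar = "internal")
setwd(old)

# List the archive contents, then extract into a separate directory
listed <- untar(tarfile, list = TRUE, tar = "internal")
out <- file.path(work, "out")
dir.create(out)
untar(tarfile, exdir = out, tar = "internal")
extracted <- readLines(file.path(out, "sample.txt"))
```

The exdir argument plays the same role here as in the download function: it decides where the archive’s contents land.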
Have a nice weekend
UPDATE: a cleaner version that cleans up .Z files as you progress:
library(uncompress) # provides uncompress() for .Z data
ftp <- "ftp://ftp.ncdc.noaa.gov/pub/data/icoads"
Imma <- "IMMA"
start <- 1914
end <- 1915 # test with small subset
years <- start:end
Tar_Dir <- "IcoadsTar"
Zfile_Dir <- "IcoadsZ"
Icoads_Dir <- "IcoadsData"
dir.create(Tar_Dir)
dir.create(Zfile_Dir)
dir.create(Icoads_Dir)
fnames <- paste(Imma,".",years,".","tar",sep="")
download.unTarUnzipIcoads <- function(site=ftp,tars=fnames,tDir=Tar_Dir,zDir=Zfile_Dir,dDir=Icoads_Dir){
  for(i in seq_along(tars)){
    fullname <- file.path(site,tars[i])
    destinationfile <- file.path(tDir,tars[i])
    # mode="wb" keeps the binary tar intact on Windows
    download.file(fullname,destfile=destinationfile,mode="wb")
    untar(destinationfile,exdir=file.path(getwd(),zDir))
    # uncompress the .Z files this tar produced
    files <- list.files(path=file.path(getwd(),zDir),full.names=TRUE,pattern="\\.Z$")
    destnames <- sub("\\.Z$",".dat",basename(files))
    for(j in seq_along(files)){
      handle <- file(files[j], "rb")
      # read the whole file, whatever its size
      data <- readBin(handle, "raw", n=file.info(files[j])$size)
      close(handle)
      uncomp_data <- uncompress(data)
      handle <- file(file.path(dDir,destnames[j]), "wb")
      writeBin(uncomp_data, handle)
      close(handle)
    }
    # remove the .Z files before fetching the next tar
    unlink(files)
  }
}
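A note on the pattern argument: list.files() and gsub() treat it as a regular expression, so an unescaped dot matches any character, not just a literal “.”. A small sketch with made-up file names (hypothetical, not taken from the ICOADS archive) shows how an escaped, anchored pattern differs from a loose one:

```r
# Hypothetical file names, only to illustrate the matching behaviour
names_in_dir <- c("obs_a.Z", "obs_b.Z", "FUZZY.txt", "notes.Zip")

# Loose: "." matches any character, so "UZ" in FUZZY.txt counts as a hit
loose <- grepl(".Z", names_in_dir)

# Strict: a literal dot followed by Z at the very end of the name
strict <- grepl("\\.Z$", names_in_dir)

# Swap only the trailing .Z for .dat on the files that really are .Z
dest <- sub("\\.Z$", ".dat", names_in_dir[strict])
```

With the loose pattern all four names match; with the strict one only the two real .Z files do.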
And if you just want a standalone version to unzip .Z files:
unZipdotZ <- function(Zfile,destfile,remove=TRUE){
  # this function is called for the side effect of uncompressing a .Z file
  # Zfile is a path to the Zfile
  # destfile is the uncompressed file to be written
  # no protection against overwriting
  # remove: if TRUE, delete the Z file afterwards
  # stop() builds the message itself; cat() would print it and return NULL
  if(!file.exists(Zfile)) stop(Zfile," does not exist")
  handle <- file(Zfile, "rb")
  # read the whole file, whatever its size
  data <- readBin(handle, "raw", n=file.info(Zfile)$size)
  close(handle)
  uncomp_data <- uncompress(data)
  handle <- file(destfile, "wb")
  writeBin(uncomp_data, handle)
  close(handle)
  if(remove) unlink(Zfile)
}
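Two small helpers that could sit in front of a function like this, offered as a sketch rather than as part of the “uncompress” package: isDotZ and readAllBytes are names I made up. The first checks for the two-byte signature (0x1f 0x9d) that files written by Unix compress start with; the second reads a file by its actual size, since a fixed count passed to readBin is an upper limit and a bigger file would be cut short.

```r
# Hypothetical helper: TRUE if the file starts with the compress (.Z)
# signature bytes 0x1f 0x9d
isDotZ <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  magic <- readBin(con, "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x9d)))
}

# Hypothetical helper: read every byte of a file, sized from file.info()
readAllBytes <- function(path) {
  readBin(path, "raw", n = file.info(path)$size)
}

# Demo on throwaway files: one with a fake .Z header, one plain text
fakeZ <- tempfile(fileext = ".Z")
writeBin(as.raw(c(0x1f, 0x9d, 0x90, 0x01, 0x02, 0x03)), fakeZ)
plain <- tempfile(fileext = ".txt")
writeLines("just text", plain)

z_ok     <- isDotZ(fakeZ)        # header matches
plain_ok <- isDotZ(plain)        # header does not match
bytes    <- readAllBytes(fakeZ)  # all six bytes come back
```

The fake file only carries the header, so it is good for testing the check but is not something uncompress() could actually decode.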