Extracting datasets from excel files in a zipped folder
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The title of the post is a bit long, but that’s the problem I was facing this morning: importing dataset from files, online. I mean, it was not a “problem” (since I can always download, and extract manually the files), more a challenge (I should be able to do it in R, directly). The files are located on ressources-actuarielles.net, in a zip file. Those are mortality tables used in French speaking African countries, and I guess that one problem came from special characters, such as “é” or “è”… When you open the zip file, you see a folder
and in that folder, several files that I would like to import
My first code was quite standard, unfortunately, it did not work
> library(xlsx) > loc.url <- "http://ressources-a...CIMA.zip" > temp <- tempfile() > download.file(loc.url,temp) trying URL 'http://ressources-a...CIMA.zip' Content type 'application/x-zip' length 503786 bytes (491 Kb) opened URL downloaded 491 Kb > list.files <- unzip(temp,list=TRUE) > data <- read.xlsx(unz(temp, + list.files$Name[1]),sheetIndex=1) Error in path.expand(file) : invalid 'path' argument
Indeed, there was a problem with the path,
> list.files$Name[1] [1] "Tables de mortalit‚ CIMA/CIMA F.xlsx"
Hopefully, as usual, @3wen came to rescue me, and suggested the following,
> Sys.setlocale("LC_ALL", "C") [1] "C" > loc.url <- "http://ressources-a...CIMA.zip" > td <- tempdir() > tf <- tempfile(tmpdir=td, fileext=".zip") > download.file(loc.url, tf) trying URL 'http://ressources-a...CIMA.zip' Content type 'application/x-zip' length 503786 bytes (491 Kb) opened URL ============================================= downloaded 491 Kb > fname <- unzip(tf, list=TRUE)$Name[1] > unzip(tf, files=fname, exdir=td, + overwrite=TRUE) > fpath <- file.path(td, fname)
That was it… then @3wen suggested another package to read the xlsx file (that works well on my linux laptop, not on Windows)
> data <- read.xls(fpath, sheet = 1) perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = "fr_CA:fr", LC_ALL = (unset), LANG = "fr_CA.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). > Sys.setlocale() [1] "" > head(data) Age Lx..CIMA.F. qx..CIMA.F. dx..CIMA.F. 1 0 1000000.0 0.2849% 2848.5034 2 1 997151.5 0.0387% 385.8644 3 2 996765.6 0.0296% 295.2717 4 3 996470.4 0.0252% 251.5163 5 4 996218.8 0.0216% 214.9381 6 5 996003.9 0.0197% 195.9929
That was more subtle than expected…
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.