Shrinking R’s PDF output
[This article was first published on PlanetFlux, and kindly contributed to R-bloggers.]
R is great for graphics, but I’ve found that the PDFs R produces when drawing large plots can be extremely large. This is especially common when using spplot() to plot a large raster. I once made a 15-page PDF full of rasters that was hundreds of MB in size. Obviously I don’t need all the detail (every pixel of the raster) represented in the PDF and would rather have the file reduced in size somehow. So I wrote an R function to automate the following:
- Take an existing PDF and run ps2pdf on it as an initial compression step. Often this step is all that’s needed.
- Split it into separate one-page files using pdftk.
- Check whether each separate page is larger than some threshold you specify (I set 5 MB as the default).
- Rasterize each page that exceeds the threshold to a PNG file using Ghostscript, then convert the PNG back to a PDF with ImageMagick’s convert. I used the multicore package to parallelize this step, but this isn’t necessary and that call could be replaced by lapply() to process the pages sequentially.
- Put the separate pages (perhaps a mix of the original and the compressed rasters) back together.
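As a rough illustration of the size check in the third step, here is a minimal sketch. The helper name oversized_pages(), its arguments, and the dummy files are assumptions made for this example and are not part of the function itself; the sketch only shows how file.info() can flag pages at or above a threshold.

```r
# Hypothetical helper (not part of the original function): return the files
# in `dir` matching `pattern` whose size is at least `max_mb` megabytes.
oversized_pages <- function(dir, max_mb = 5, pattern = "^pg_.*\\.pdf$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  size_mb <- file.info(files)$size / 1e6   # bytes to MB
  files[size_mb >= max_mb]
}

# Demonstration with dummy files standing in for real PDF pages
demo_dir <- file.path(tempdir(), "shrinkpdf_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeBin(raw(2e6), file.path(demo_dir, "pg_0001.pdf"))  # about 2 MB
writeBin(raw(100), file.path(demo_dir, "pg_0002.pdf"))  # 100 bytes

# Only the first dummy page exceeds a 1 MB threshold
basename(oversized_pages(demo_dir, max_mb = 1))
```

The pattern matches the pg_ prefix that the function relies on when it lists the split pages.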
Here’s the function:
    shrinkpdf <- function(pdf, maxsize = 5, suffix = "_small", verbose = TRUE) {
      # mclapply() lives in the parallel package bundled with R,
      # which supersedes the multicore package used originally
      require(parallel)
      wd <- getwd()
      td <- paste(tempdir(), "/pdf", sep = "")
      if (!file.exists(td)) dir.create(td)
      if (verbose) print("Performing initial compression")
      system(paste("ps2pdf ", pdf, " ", td, "/test.pdf", sep = ""))
      setwd(td)
      # split the compressed file into one PDF per page (pg_0001.pdf, ...)
      system(paste("pdftk ", td, "/test.pdf burst", sep = ""))
      files <- list.files(pattern = "pg_")
      # sizes of the individual pages in MB
      sizes <- sapply(files, function(x) file.info(x)$size) * 0.000001
      toobig <- sizes >= maxsize
      if (verbose) print(paste("Resizing ", sum(toobig), " pages: (",
                               paste(files[toobig], collapse = ","), ")", sep = ""))
      mclapply(files[toobig], function(i) {
        # rasterize the oversized page to a 300 dpi PNG
        system(paste("gs -dBATCH -dTextAlphaBits=4 -dNOPAUSE -r300 -q -sDEVICE=png16m -sOutputFile=",
                     i, ".png ", i, sep = ""))
        # convert the PNG back to a PDF, overwriting the original page
        system(paste("convert -quality 100 -density 300 ", i, ".png ",
                     strsplit(i, ".", fixed = TRUE)[[1]][1], ".pdf ", sep = ""))
        if (verbose) print(paste("Finished page ", i))
        return()
      })
      if (verbose) print("Compiling the final pdf")
      file.remove("test.pdf")
      file.remove(list.files(pattern = "png"))
      setwd(wd)
      # merge all pages into <name><suffix>.pdf next to the input file
      system(paste("gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=",
                   strsplit(pdf, ".", fixed = TRUE)[[1]][1], suffix, ".pdf ",
                   td, "/*.pdf", sep = ""))
      # clean up the temporary directory
      file.remove(list.files(td, full.names = TRUE))
      if (verbose) print("Finished!!")
    }
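Since the function shells out to several command-line programs, it can help to confirm they are installed before running it. The following is a small sketch using Sys.which(); the list of tool names is taken from the system() calls above, and the variable names and the file name plots.pdf in the commented call are placeholders assumed for this example.

```r
# Tool names as they appear in the system() calls of shrinkpdf()
tools_needed <- c("ps2pdf", "pdftk", "gs", "convert")

# Sys.which() returns the full path of each tool, or "" if it is not on the PATH
found <- Sys.which(tools_needed)
missing_tools <- names(found)[found == ""]

if (length(missing_tools) > 0) {
  message("Missing tools: ", paste(missing_tools, collapse = ", "))
} else {
  message("All required tools found")
  # Example call (hypothetical file name); this would write plots_small.pdf
  # shrinkpdf("plots.pdf", maxsize = 5)
}
```

On systems where one of these programs is installed under a different name, the vector of names would need adjusting.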