Shrinking R’s PDF output
[This article was first published on PlanetFlux, and kindly contributed to R-bloggers.]
R is great for graphics, but I’ve found that the PDFs it produces for large plots can be extremely large. This is especially common when using spplot() to plot a large raster: I once made a 15-page PDF full of rasters that was hundreds of MB in size. Obviously I don’t need all the detail (every pixel of the raster) represented in the PDF and would rather have it reduced in size somehow. So I wrote an R function to automate the following:
- Take an existing PDF and run ps2pdf on it as an initial compression step. Often this step is all that’s needed.
- Split it into separate single-page files using pdftk.
- Check whether each page is larger than a threshold you specify (5 MB by default).
- If a page is larger, rasterize that page to a PNG file using Ghostscript and convert the PNG back to a PDF with ImageMagick’s convert. I used the multicore package (its mclapply() now ships in R’s parallel package) to parallelize this step, but this isn’t necessary; the call could be replaced by lapply() to process the pages sequentially.
- Put the separate pages (perhaps a mix of the originals and the compressed rasters) back together with Ghostscript.
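Before the full function, the size check and the sequential-versus-parallel choice can be sketched in isolation. This is only an illustration under assumptions: `pages_over_limit()` is a hypothetical helper that does not appear in the function below, and the two throwaway files merely stand in for pages produced by pdftk.

```r
# Hypothetical helper (not part of shrinkpdf): return the files whose
# size on disk is at least `maxsize` megabytes.
pages_over_limit <- function(files, maxsize = 5) {
  sizes_mb <- file.info(files)$size / 1e6
  files[sizes_mb >= maxsize]
}

# Two dummy files standing in for burst pages.
td <- tempfile("pages")
dir.create(td)
small <- file.path(td, "pg_0001.pdf")
large <- file.path(td, "pg_0002.pdf")
writeBin(raw(1000), small)   # about 1 kB
writeBin(raw(2e6), large)    # 2 MB

too_big <- pages_over_limit(c(small, large), maxsize = 1)

# Fork-based mclapply() is only useful on Unix-alikes; lapply() does the
# same work sequentially everywhere.
apply_fun <- if (.Platform$OS.type == "unix") parallel::mclapply else lapply
done <- apply_fun(too_big, function(f) basename(f))
```

Only the 2 MB file is selected here, so the per-page function runs once. In the real function that per-page step is the Ghostscript and convert call rather than `basename()`.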
Here’s the function:
 shrinkpdf <- function(pdf, maxsize = 5, suffix = "_small", verbose = TRUE) {
   library(parallel)  # provides mclapply(); ships with R and supersedes multicore
   wd <- getwd()
   on.exit(setwd(wd), add = TRUE)  # restore the working directory even on error
   td <- file.path(tempdir(), "pdf")
   if (!file.exists(td)) dir.create(td)
   if (verbose) print("Performing initial compression")
   # quote paths so file names containing spaces survive the shell
   system(paste("ps2pdf", shQuote(pdf), shQuote(file.path(td, "test.pdf"))))
   setwd(td)
   system("pdftk test.pdf burst")  # writes pg_0001.pdf, pg_0002.pdf, ...
   files <- list.files(pattern = "^pg_.*\\.pdf$")
   sizes <- file.info(files)$size / 1e6  # sizes of individual pages in MB
   toobig <- sizes >= maxsize
   if (verbose) print(paste0("Resizing ", sum(toobig), " pages: (", paste(files[toobig], collapse = ","), ")"))
   mclapply(files[toobig], function(i) {
     # rasterize the page at 300 dpi, then wrap the PNG back into a one-page PDF
     png <- paste0(i, ".png")
     system(paste0("gs -dBATCH -dTextAlphaBits=4 -dNOPAUSE -r300 -q -sDEVICE=png16m -sOutputFile=", png, " ", i))
     system(paste("convert -quality 100 -density 300", png, i))  # overwrites the original page
     if (verbose) print(paste("Finished page", i))
     invisible(NULL)
   })
   if (verbose) print("Compiling the final pdf")
   file.remove("test.pdf")
   file.remove(list.files(pattern = "\\.png$"))
   setwd(wd)
   # file_path_sans_ext() strips only the extension, so dots elsewhere in the path are safe
   out <- paste0(tools::file_path_sans_ext(pdf), suffix, ".pdf")
   system(paste0("gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=", shQuote(out), " ", shQuote(td), "/*.pdf"))
   file.remove(list.files(td, full.names = TRUE))
   if (verbose) print("Finished!!")
 }
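
Because the function shells out to ps2pdf, pdftk, gs and convert, it may be worth confirming they are on the PATH before calling it. The following is a minimal sketch, not part of the original function: `missing_tools()` is a hypothetical helper and "maps.pdf" is a placeholder file name.

```r
# Hypothetical helper: report which of the command-line tools used by
# shrinkpdf() cannot be found on the PATH.
missing_tools <- function(tools = c("ps2pdf", "pdftk", "gs", "convert")) {
  found <- Sys.which(tools)   # "" for anything that is not found
  tools[found == ""]
}

need <- missing_tools()
if (length(need) > 0) {
  message("Install before calling shrinkpdf(): ", paste(need, collapse = ", "))
} else {
  # "maps.pdf" is a placeholder; with the default suffix the result
  # would be written next to it as "maps_small.pdf".
  # shrinkpdf("maps.pdf", maxsize = 5)
}
```

The call itself is left commented out so that the snippet runs even when no suitable PDF is at hand.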
		
            
