Digitizing jpeg graphs in R
[This article was first published on R by Emmanuel Jjunju, and kindly contributed to R-bloggers].
I had been using third-party programs for a long time until I came across the documentation for the R package digitize. Unfortunately, this package is not available for R 3.0.2, so I had to tweak things a bit. I am glad to share my solution.
I started by taking a look at http://lukemiller.org/index.php/2011/06/digitizing-data-from-old-plots-using-digitize/. Luke Miller has written a very nice description of how to use the digitize package, and some of the text presented here is taken from his post.
The digitize package by Timothée Poisot relies mainly on just four functions: ReadImg, ReadAndCal, DigitData and Calibrate. ReadImg requires readJPEG from the jpeg package. Once the jpeg package is installed and loaded, you only need to load these functions created by Timothée Poisot. The functions can be downloaded from https://github.com/tpoisot/digitize/blob/master/digitize/R/functions.r
The code snippet below shows my implementation. I have added the use of the tcltk2 package so that one can browse to and select the JPEG file directly.
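Whichever file is picked, its path is reused to name the output text file by swapping the extension for .txt. As a minimal sketch with a made-up path (the path and the variable first_piece are illustrative, not from the original post), here are two ways of doing that swap and how they behave when a folder name contains a dot:

```r
# Hypothetical path with dots in a folder name (an assumption for illustration)
jpegfile <- "C:/work/R-3.0.2/plots/flow.jpg"

# Splitting at every dot and keeping the first piece cuts the path short
first_piece <- unlist(strsplit(jpegfile, "\\."))[1]
out_split <- paste(first_piece, ".txt", sep = "")

# Dropping only the final extension keeps the full path
out_ext <- paste0(tools::file_path_sans_ext(jpegfile), ".txt")

print(out_split)
print(out_ext)
```

With a path that has no dots other than the one before the extension, both approaches give the same result.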
# Install (if missing) and load the required libraries
need <- c("jpeg", "tcltk2", "zoo")  # needed libraries
ins <- installed.packages()[, 1]    # find out which libraries are installed
(Get <- need[which(is.na(match(need, ins)))])
if (length(Get) > 0) { install.packages(Get) }  # install missing libraries
invisible(lapply(need, library, character.only = TRUE))  # load libraries
# Load the image file
jpegfile <- tk_choose.files(caption = "JPEG FILE")
# Output file: same path as the image, with the final extension replaced by .txt
(outfile <- paste0(tools::file_path_sans_ext(jpegfile), ".txt"))
# digitize functions
# Show the image, then collect the 4 calibration clicks
ReadAndCal = function(fname)
{
  ReadImg(fname)
  calpoints <- locator(n=4, type='p', pch=4, col='blue', lwd=2)
  return(calpoints)
}
# Draw the JPEG so that it fills the whole plotting window
ReadImg = function(fname)
{
  img <- readJPEG(fname)
  op <- par(mar=c(0,0,0,0))
  on.exit(par(op))
  plot.new()
  rasterImage(img, 0, 0, 1, 1)
}
# Collect clicks on the data points, marking each one as it is clicked
DigitData = function(col='red', type='p', ...)
{
  type <- ifelse(type=='b', 'o', type)
  type <- ifelse(type %in% c('l','o','p'), type, 'p')
  locator(type=type, col=col, ...)
}
# Convert raw click coordinates to the scale of the original graph
Calibrate = function(data, calpoints, x1, x2, y1, y2)
{
  x <- calpoints$x[c(1,2)]
  y <- calpoints$y[c(3,4)]
  cx <- lm(formula = c(x1,x2) ~ c(x))$coeff
  cy <- lm(formula = c(y1,y2) ~ c(y))$coeff
  data$x <- data$x*cx[2] + cx[1]
  data$y <- data$y*cy[2] + cy[1]
  return(as.data.frame(data))
}
# Digitize a graph
# Step 1: this opens the JPEG in a plotting window and lets you define points
# on the x and y axes. You must start by clicking on the left-most x-axis
# point, then the right-most x-axis point, followed by the lower y-axis point
# and finally the upper y-axis point. You don't need to choose the end points
# of the axis, only two points on each axis whose x or y value you know. As
# you click on each of the 4 points, the coordinates are saved in the object cal.
(cal = ReadAndCal(jpegfile))
# Step 2: you return to the figure window, and now you can click on each of
# the data points you're interested in retrieving values for. The function
# places a dot (colored red in this case) over each point you click on, and
# the raw x,y coordinates of that point are saved to the data.points list.
# When you're finished, hit Stop/Finish or right-click to end the collection.
(data.points = DigitData(col = 'red'))
# Step 3: finally, convert the raw x,y coordinates into the scale of the
# original graph. Call Calibrate with your data.points list, the cal list
# holding the 4 control points from step 1, and the 4 numeric values of the
# points you clicked on the x and y axes, in the same order. These values
# should be in the original scale of the figure (i.e. read them off the
# graph's tick marks).
df = Calibrate(data.points, cal, 37257, 37287, 268, 276)
# Some manual editing of values at the end/known points
df <- df[order(df$x), ]
df[1, 1] <- 37257
df[nrow(df), 1] <- 37287
plot(df$x, df$y, type = "l")
# Optional
# Interpolate at user-defined points
fxn <- approxfun(df)
xnew <- seq(37257, 37287, 1/24)  # steps of 1/24 day, i.e. hourly
ynew <- fxn(xnew)
# My known x-points are dates, stored as day numbers counted from 1899-12-30
dato <- as.POSIXct(xnew * (60*60*24), origin = "1899-12-30", tz = "GMT")
newdf <- data.frame(dato, y = ynew)
plot(newdf, las = 2, type = "l")
# row.names = FALSE keeps the header line aligned with the two data columns
write.table(newdf, file = outfile, sep = ",", quote = FALSE, row.names = FALSE)
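Because the script is driven by mouse clicks, it can be hard to see what the calibration, interpolation and date steps do to the numbers. The following is a minimal non-interactive sketch: the lists cal and raw stand in for the output of locator() and contain invented coordinates, and the coarse 7.5-day interpolation step is chosen only to keep the output short. Calibrate is repeated so the block runs on its own.

```r
# Same definition as in the script above, repeated so this block is self-contained
Calibrate <- function(data, calpoints, x1, x2, y1, y2) {
  x <- calpoints$x[c(1, 2)]
  y <- calpoints$y[c(3, 4)]
  cx <- lm(formula = c(x1, x2) ~ c(x))$coeff
  cy <- lm(formula = c(y1, y2) ~ c(y))$coeff
  data$x <- data$x * cx[2] + cx[1]
  data$y <- data$y * cy[2] + cy[1]
  return(as.data.frame(data))
}

# Invented stand-ins for the clicks (plot-window coordinates between 0 and 1)
cal <- list(x = c(0.10, 0.90, 0.05, 0.05),   # x-left, x-right, y-low, y-high
            y = c(0.08, 0.08, 0.15, 0.95))
raw <- list(x = c(0.10, 0.50, 0.90),         # three "digitized" data points
            y = c(0.15, 0.55, 0.95))

# Map window coordinates to graph units: x 37257..37287, y 268..276
df <- Calibrate(raw, cal, 37257, 37287, 268, 276)

# Pin the end points to their known x values, as the script does, so that
# rounding in the fit cannot push them just inside the interpolation range
df <- df[order(df$x), ]
df[1, 1] <- 37257
df[nrow(df), 1] <- 37287

# Interpolate on a regular grid and turn the day numbers into date-times
fxn <- approxfun(df)
xnew <- seq(37257, 37287, by = 7.5)
ynew <- fxn(xnew)
dato <- as.POSIXct(xnew * (60 * 60 * 24), origin = "1899-12-30", tz = "GMT")
newdf <- data.frame(dato, y = ynew)
print(newdf)
```

The calibration is a straight-line map per axis, so two known points on each axis are enough to fix it. Day number 37257 counted from 1899-12-30 corresponds to 2002-01-01, which is why the x axis in this example spans January 2002.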
To leave a comment for the author, please follow the link and comment on their blog: R by Emmanuel Jjunju.