Digitizing jpeg graphs in R

[This article was first published on R by Emmanuel Jjunju, and kindly contributed to R-bloggers.]

I had been using third-party programs to digitize graphs for a long time until I came across the documentation for the R package digitize. Unfortunately, this package is not available for R 3.0.2, so I had to tweak things around. I am glad to share my solution.

I started by taking a look at http://lukemiller.org/index.php/2011/06/digitizing-data-from-old-plots-using-digitize/. Luke Miller has written a very nice description of how to use the digitize package. Some of the text presented here is from Luke Miller.

The digitize package by Timothée Poisot actually relies on only four functions: ReadImg, ReadAndCal, DigitData and Calibrate. ReadImg requires readJPEG from the jpeg package. Once the jpeg package is installed and loaded, just load these functions created by Timothée Poisot. The functions can be downloaded from https://github.com/tpoisot/digitize/blob/master/digitize/R/functions.r

The code snippet below shows my implementation. I have added the use of the tcltk2 package so that one can browse to and select the JPEG file directly.
#Fix libraries
need<-c("jpeg","tcltk2","zoo") #needed libraries
ins<-installed.packages()[,1] #find out which libs are installed
(Get<-need[!(need %in% ins)]) #libs that are missing
if(length(Get)>0){install.packages(Get)} #install needed libs
invisible(lapply(need,library,character.only=TRUE)) #load libraries
#load imagefile
jpegfile<-tk_choose.files(caption="JPEG FILE")
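#output file: same path as the jpeg, with the extension replaced by .txt (safe even if the path contains other dots)
(outfile<-paste(tools::file_path_sans_ext(jpegfile),".txt",sep=""))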
#digitize functions
#show the image, then collect the 4 calibration clicks
ReadAndCal = function(fname)
{
  ReadImg(fname)
  calpoints <- locator(n=4,type='p',pch=4,col='blue',lwd=2)
  return(calpoints)
}
#draw the jpeg so that it fills the whole plotting window
ReadImg = function(fname)
{
  img <- readJPEG(fname)
  op <- par(mar=c(0,0,0,0))
  on.exit(par(op))
  plot.new()
  rasterImage(img,0,0,1,1)
}
#collect the clicked data points
DigitData = function(col='red',type='p',...)
{
  type <- ifelse(type=='b','o',type)
  type <- ifelse(type%in%c('l','o','p'),type,'p')
  locator(type=type,col=col,...)
}
#rescale the clicked coordinates to the units of the original graph
Calibrate = function(data,calpoints,x1,x2,y1,y2)
{
  x <- calpoints$x[c(1,2)]
  y <- calpoints$y[c(3,4)]
  cx <- lm(formula = c(x1,x2) ~ c(x))$coeff
  cy <- lm(formula = c(y1,y2) ~ c(y))$coeff
  data$x <- data$x*cx[2]+cx[1]
  data$y <- data$y*cy[2]+cy[1]
  return(as.data.frame(data))
}
#digitize a graph
#This opens the jpeg in a plotting window and lets you define points on the x and y axes.
#You must start by clicking on the left-most x-axis point, then the right-most x-axis point,
#followed by the lower y-axis point and finally the upper y-axis point. You don’t need to
#choose the end points of the axis, only two points on the axis that you know the x or y
#value for. As you click on each of the 4 points, the coordinates are saved in the object cal.
(cal = ReadAndCal(jpegfile))
#You return to the figure window, and now you can click on each of the data points you’re
#interested in retrieving values for. The function will place a dot (colored red in this case)
#over each point you click on, and the raw x,y coordinates of that point will be saved to the
#data.points list. When you’re finished clicking points, you need to hit stop/Finish or
#right-click to stop the data point collection.
(data.points = DigitData(col = 'red'))
#Finally, you need to convert those raw x,y coordinates into the same scale as the original
#graph. You do this by calling the Calibrate function and feeding it your data.points list,
#the cal list that contains your 4 control points from the first step, and then 4 numeric
#values that represent the 4 original points you clicked on the x and y axes. These values
#should be in the original scale of the figure (i.e. read the values off the graph’s tick marks).
df = Calibrate(data.points, cal, 37257, 37287, 268, 276)
#some manual editing of values at end/known points
df<-df[order(df$x),]
df[1,1]<-37257
df[nrow(df),1]<-37287
plot(df$x,df$y,type="l")
#Optional
#interpolate at user defined points
fxn<-approxfun(df)
xnew<-seq(37257,37287,1/24)
ynew<-fxn(xnew)
dato<-as.POSIXct(xnew * (60*60*24), origin="1899-12-30", tz="GMT") #my known x-points are dates
newdf<-data.frame(dato,y=ynew)
plot(newdf,las=2,type="l")
write.table(newdf,file=outfile,sep=",",quote=FALSE)
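
The calibration step can be checked without clicking on anything. The sketch below is a minimal, non-interactive illustration: the click coordinates in cal_demo and pts_demo are made-up values (not taken from a real image), and the names cal_demo, pts_demo, df_demo and dates_demo are only used here. It repeats the Calibrate function so that it runs on its own, reuses the axis values from the example above, and converts the resulting x values to dates in the same way as the script does, assuming the x values are Excel-style day serial numbers.

```r
# Calibrate as defined above, repeated so this sketch is self-contained
Calibrate <- function(data, calpoints, x1, x2, y1, y2) {
  x <- calpoints$x[c(1, 2)]          # first two clicks: the x-axis points
  y <- calpoints$y[c(3, 4)]          # last two clicks: the y-axis points
  cx <- lm(c(x1, x2) ~ x)$coeff      # straight line through the two x points
  cy <- lm(c(y1, y2) ~ y)$coeff      # straight line through the two y points
  data$x <- data$x * cx[2] + cx[1]
  data$y <- data$y * cy[2] + cy[1]
  as.data.frame(data)
}

# Hypothetical calibration clicks in plot units (0 to 1), in the required order:
# left x-axis point, right x-axis point, lower y-axis point, upper y-axis point
cal_demo <- list(x = c(0.10, 0.90, 0.05, 0.05),
                 y = c(0.08, 0.08, 0.20, 0.80))

# Hypothetical digitized points: both calibrated ends and the midpoint
pts_demo <- list(x = c(0.10, 0.50, 0.90),
                 y = c(0.20, 0.50, 0.80))

df_demo <- Calibrate(pts_demo, cal_demo, 37257, 37287, 268, 276)
print(df_demo)

# Same date conversion as in the script above (day serial numbers to POSIXct)
dates_demo <- as.POSIXct(df_demo$x * (60 * 60 * 24), origin = "1899-12-30", tz = "GMT")
print(dates_demo)
```

Points clicked exactly on the calibration marks should come back as the known axis values, and a point halfway between them should land halfway between those values. If that is not the case with your own data, the calibration clicks were probably made in the wrong order.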