Site icon R-bloggers

Lomb-Scargle periodogram for unevenly sampled time series

[This article was first published on me nugget, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the natural sciences, it is common to have incomplete or unevenly sampled time series for a given variable. Determining cycles in such series is not directly possible with methods such as Fast Fourier Transform (FFT) and may require some degree of interpolation to fill in gaps. An alternative is the Lomb-Scargle method (or least-squares spectral analysis, LSSA), which estimates a frequency spectrum based on a least squares fit of sinusoid.

The above figure shows a Lomb-Scargle periodogram of a time series of sunspot activity (1749-1997) with 50% of monthly values missing. As expected (link1, link2), the periodogram displays a a highly significant maximum peak at a frequency of ~11 years.

The function comes from a nice set of functions that I found here: http://research.stowers-institute.org/efg/2005/LombScargle/R/index.htm. An accompanying paper focusing on its application to time series of gene expression can be found here.

Below is a comparison to an FFT of the full time series. For another great resource on spectral analysis, and time series-related R methods in general, see the following website: http://zoonek2.free.fr/UNIX/48_R/15.html.


To reproduce the example:

#Required function # from: http://research.stowers-institute.org/efg/2005/LombScargle/R/index.htm
source("LombScargle.R") 
source("R-Mnemonics.R")
 
#Required data
data(sunspot.month) # Historical sunspots, class(sunspot.month) = "ts"
plot(sunspot.month)
 
#Create vector version of data
nspot <- as.vector(sunspot.month)
time <- seq(1749+0.5/12, 1997+11.5/12, by=1/12)
plot(nspot ~ time, t="l") # same as above
 
#make gaps in data to create "observed" data series
set.seed(1)
obs <- sample(seq(nspot), 0.5*length(nspot)) # 50% gaps
nspot.obs <- nspot[obs]
time.obs <- time[obs]
plot(nspot.obs ~ time.obs, pch=".", cex=2)
 
# Create test frequencies corresponding to
# periods from 1 to 50 years
M <- 4*length(time.obs) # often 2 or 4
TestFrequencies <- (1/50) + (1 - 1/50) * (1:M / M)
 
# Use Horne & Baliunas' estimate of independent frequencies
Nindependent <- NHorneBaliunas(length(nspot.obs))
 
#Compute
RES <- ComputeAndPlotLombScargle(time.obs, nspot.obs,
  TestFrequencies, Nindependent,
  "Sunspots")
 
#Output consists of a single figure with four plots showing:
   #1. time series.  The peak of the loess-smoothed curve (dotted line) is used for phase.
   #2. Histogram of time interval variability
   #3. Lomb-Scargle Periodogram with peak corresponding to a 11 year period #RES$PeakPeriod
   #4. Peak Significance Curve:  p-value of Lomb-Scargle peak is quite significant:  1 - (1-exp(-RES$SpectralPowerDensity[which(1/RES$Frequency == RES$PeakPeriod)]))^Nindependent
 
 
#Alternate plots
p01 <- -log(1-((1-0.01)^(1/Nindependent))) #P=0.01 
 
plot(RES$Frequency, RES$SpectralPowerDensity, t="n", xlab="Frequency [1/years]", ylab="Normalized Power Spectral Density")
abline(h=seq(par()$yaxp[1], par()$yaxp[2],, par()$yaxp[3]+1), col=8, lty=3)
abline(v=seq(par()$xaxp[1], par()$xaxp[2],, par()$xaxp[3]+1), col=8, lty=3)
lines(RES$Frequency, RES$SpectralPowerDensity, lwd=1)
mtext("Lomb-Scargle Periodogram", side=3, line=1)
abline(h=p01, col=2, lty=2)
text(0.9, p01, label="p = 0.01", pos=3, col=2)
 
plot(1/RES$Frequency, RES$SpectralPowerDensity, type="n", log = "x", xlab="Frequency [years]", ylab="Normalized Power Spectral Density") 
abline(h=seq(par()$yaxp[1], par()$yaxp[2],, par()$yaxp[3]+1), col=8, lty=3)
abline(v=c(1,2,5,10,20,50), col=8, lty=3)
lines(1/RES$Frequency, RES$SpectralPowerDensity, lwd=1)
mtext("Lomb-Scargle Periodogram", side=3, line=1)
abline(h=p01, col=2, lty=2)
text(2, p01, label="p = 0.01", pos=3, col=2)
 
png("GappySunspots_LombScargle.png", width=5, height=7, units="in", res=400)
#x11(width=5, height=7)
op <- par(mfcol=c(2,1), mar=c(5,5,2,1), ps=10)
 
plot(nspot.obs ~ time.obs, t="n", xlab="", ylab=expression(paste("No. sunspots [", "month"^-1, "]")))
grid()
points(nspot.obs ~ time.obs, pch=".", cex=2)
mtext("50% missing values", side=3, line=0.5)
 
plot(1/RES$Frequency, RES$SpectralPowerDensity, type="n", log = "x", xlab="Frequency [years]", ylab="Normalized Power Spectral Density") 
abline(h=seq(par()$yaxp[1], par()$yaxp[2],, par()$yaxp[3]+1), col=8, lty=3)
abline(v=c(1,2,5,10,20,50), col=8, lty=3)
lines(1/RES$Frequency, RES$SpectralPowerDensity, lwd=1)
mtext("Lomb-Scargle Periodogram", side=3, line=0.5)
abline(h=p01, col=2, lty=2)
text(2, p01, label="p = 0.01", pos=3, col=2)
 
par(op)
dev.off()
 
 
##############################
###Compare to other methods###
##############################
plot(sunspots)
 
spectrum(sunspot.month)
abline(v=1:10, lty=3)
 
z <- spectrum(sunspots, spans=2) #smoothed spectrum
 
png("LombScargle_vs_DFT.png", width=5, height=4, res=400, units="in")
XLIM <- c(2,20)
op <- par(mar=c(5,5,2,5), ps=10)
plot(1/RES$Frequency, RES$SpectralPowerDensity, type="n", log = "x", xlab="Frequency [years]", ylab="Normalized Power Spectral Density", xlim=XLIM) 
abline(h=seq(par()$yaxp[1], par()$yaxp[2],, par()$yaxp[3]+1), col=8, lty=3)
abline(v=c(1,2,5,10,20,50), col=8, lty=3)
lines(1/RES$Frequency, RES$SpectralPowerDensity, lwd=1)
par(new=TRUE)
plot(1/z$freq, z$spec, t="l", col=3, log = "x", xlim=XLIM, xlab="", ylab="", xaxt="n", yaxt="n")
axis(4)
mtext("Spectral Density", line=3, side=4, col=3)
legend("topleft", legend=c("Lomb-Scargle", "Discrete Fourier\nTransform"), bg="white", col=c(1,3), lty=1)
par(op)
dev.off()




To leave a comment for the author, please follow the link and comment on their blog: me nugget.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.