Autoregressive Moving Average Models and Power Spectral Densities
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
1 Introduction
The simplest way to estimate a power spectral density (PSD) is to use a periodogram, defined as
periodogram <- function(x) (1/length(x))*abs(fft(x))^2
where x
is the data samples. The periodogram estimates the power level at
A different approach is to estimate the PSD using a model. A good model can capture the important features of the PSD and be controlled by a few parameters. The autoregressive moving average (ARMA) model and the closely related autoregressive (AR) and moving average (MA) models can model a wide variety of PSDs and can be controlled by a few parameters.
This article defines the ARMA, AR, and MA models and then shows how to calculate their PSDs. We then cover how they are used to estimate PSDs, how to select between the three models, and how to choose the number of parameters (the model order). By the end of this article, you will understand how ARMA/AR/MA models are used for PSD estimation.
Much of this article comes from Chapter 8 of Power Spectral Density Estimation by Example Carbone (2022).
2 ARMA Model
An ARMA model of order
where
The input
Below, we use the ARMA[1,1] model to produce simulated data. In this case, we let
Note the rnorm(N)
to produce N
samples of WGN with variance 1.
set.seed(1) N <- 512 x <- rnorm(N)
Figure 1 is a plot of x
, the input into the ARMA[1,1] process defined in Equation 2.
Below is the ARMA[1,1] model from Equation 2 implemented in R.
y_arma11 <- rep(0,N) a <- 0.9 b <- -0.9 for (n in 2:N) y_arma11[n] <- -a[1]*y_arma11[n-1]+ b[1]*x[n-1] + x[n]
Figure 2 is a plot of y_arma11
, the model’s output.
The PSD
Because ARMA models have poles and zeros, they are good at modeling PSDs with peaks and valleys. When used for PSD estimation, ARMA estimators are called “high resolution” estimators because they are better than the periodogram at resolving closely spaced peaks.
If we let
The function psdArma
inputs the ARMA parameters and output the PSD calculated at evenly spaced point along the frequency axis. The function inputs: the AR coefficients arCoefficients
, the MA coefficients maCoefficients
, the variance of the input process inputVar
, and the number of points in the fft
used to calculate the PSD. The function uses the fft function to find zeroPad
(see below) to zero pad the AR and MA coefficients to produce Nfs
frequency samples. The parameters must be known or estimated before using psdArma
. If the parameters are known, the function outputs the PSD. If the parameters are estimated, the function outputs an estimated PSD.
zeroPad <- function(x,N) { c(x,rep(0,max(N-length(x),0))) }
psdArma <- function(arCoefficients,maCoefficients,inputVar,Nfs=256){ A <- fft(zeroPad(arCoefficients,Nfs)) B <- fft(zeroPad(maCoefficients,Nfs)) inputVar*abs(B)^2/abs(A)^2 }
Let’s use psdArma
to calculate the PSD of the ARMA[1,1] process in Equation 2.
psdArma11 <- 10*log10(psdArma(c(1,a),c(1,b),inputVar=1,Nfs=length(x)))
Figure 3 contains two plots. The blue plot is the PSD of the ARMA[1,1] process that produced the data in Figure 2. The black plot is the PSD of the WGN input.
There are three things to notice about the PSD plots. First, the frequency units on the x-axis are a fraction of the sampling frequency (cycles/sample). If you know the sampling frequency (samples/second), you could recover frequency in Hz (cycles/second) by multiplying frequency by the sampling frequency, as
Second, the y-axis is power in decibels (dB), calculated as
Third, PSDs are periodic with the sampling frequency. So, they can be plotted along any region that shows one period. Below is an example of the PSD from above plotted from -0.5 to 0.5.
3 Autoregresive Model
ARMA[
We use the AR[1] model below to produce simulated data. We let
We use a for
loop to implement the ARMA model. We could do the same thing for the AR model, but using the filter
command from the gsignal
is faster and makes the code more readable. The function inputs the AR coefficients a
, the MA coefficients b
, and the samples to be filtered x
. The function outputs the data filtered by the ARMA coefficients.
Figure 5 is a plot of the samples produced by the code below.
set.seed(1) N <- 512 x <- rnorm(N) a <- c(1.0, 0.9) b <- 1 y_ar1 <- gsignal::filter(b,a,x)
The PSD of an AR process is
Below we find the PSD of the AR[1] model above.
psdAr1 <- 10*log10(psdArma(a,b,inputVar=1,Nfs=length(x)))
Figure 5 contains two plots. The blue plot is the PSD of the AR[1] process that produced the data in Figure 5. The black plot is the PSD of the WGN input.
psdAr1 <- 10*log10(psdArma(a,b,inputVar=1,Nfs=length(x)))
4 Moving Average Model
ARMA[0,
set.seed(1) N <- 512 x <- rnorm(N) a <- 1 b <- c(1,-0.9) y_ma1 <- gsignal::filter(b,a,x)
Figure 7 is a plot of the samples produced by the code above.
The PSD of an MA process is
Below we find the PSD of the MA[1] model above.
psdMa1 <- 10*log10(psdArma(a,b,inputVar=1,Nfs=length(x)))
Figure 8 contains two plots. The blue plot is the PSD of the MA[1] process that produced the data in Figure 7. The black plot is the PSD of the WGN input.
5 Estimating PSDs using Models
ARMA/AR/MA PSD estimation is performed using the following steps. First, choose the model you wish to use. Second, use the data to estimate the model parameters. Parameter estimation is beyond the scope of this article. You can find information in various places, including Carbone (2022), Kay (1988), or Marple (1987). Finally, calculate the estimated PSD using the estimated parameters in the psdArma
function.
6 Choosing a Model
If you are only interested in peaks, use the AR model. If only the valleys are important, use the MA model. If both the peaks and valleys are important, use the ARMA model. As mentioned above, peaks are usually the more important feature, so the AR model is often chosen.
7 Model Order Selection
An ARMA model can model at most
Model order estimators are mathematical tools that let you estimate the order of the process that produced the data. The Akaike Information Criterion (AIC) is a model order estimator built into the arima
function. The AIC calculates how well the model fits the data (lower is better).
The arima
function produces several outputs, but we will use the AIC. So we use the arima
function with $aic
, which will only output the AIC value. The arima$aic
function inputs the data and the model order and then outputs the AIC associated with that model order. Below is an example of using the function to find the model order.
Below we use the AR[1] data shown in Figure 5 and calculate the AIC for AR model orders 1 to 10.
aicAR <- rep(0,10) for (modelOrder in 1:10) { aicAR[modelOrder] <- arima(y_ar1, order = c(AR=modelOrder, 0, MA=0))$aic }
Figure 9 is a plot of the AIC for AR models 1 to 10. Note the minimum AIC is at AR[1], which is the correct model order.
However, we choose the model order; we don’t want to overfit the model. We want to choose the minimum model order that can model the important features. Overfitting the model requires more parameters and can add bias and variance to our estimate.
References
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.