[This article was first published on everyday analytics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
In case you’ve been living under a rock, there’s been a lot of chatter in the financial world late about the price of oil going down. Way, way, down. So much so that the Bank of Canada cut interest rates. What crazy times are these we live in? I thought gas was only going to get more and more expensive until the end of time until everyone resorted to driving solar-powered cars.
I’m not economist (or commodities guy) and frankly, a lot of it seems like black magic and voodoo to me, but I thought it’d take a look at the data to see just how things have changed. Plus it’ll be an opportunity to muck about with some time series analysis in R.
Your average economists. |
Background
Not much background is really needed other than what I mentioned in the introduction, but, well, we do need some data to work with.
Ontario Gas Prices for a number of cities (as well as the province averages) can be found at the Ontario Ministry of Energy. Unfortunately they’re year-by-year CSVs with a lot of different information row-wise (Weekly, Month, YTD).
No biggie, a simple bash script will take care of downloading all the data with wget and parsing and concatenating the data with other *nix tools:
# Download data for i in $(seq 1990 2014) do wget http://www.energy.gov.on.ca/fuelupload/ONTREG$i.csv done # Retain the header head -n 2 ONTREG1990.csv | sed 1d > ONTREG_merged.csv # Loop over the files and use sed to extract the relevant lines for i in $(seq 1990 2014) do tail -n 15 ONTREG$i.csv | sed 13,15d | sed 's/./-01-'$i',/4' >> ONTREG_merged.csv done
Great! Now we have all monthly Ontario Gas Price data from 1990-2014 inclusive in one big file.
The WTI data I got from The Federal Reserve Bank of St. Louis, and the forecasts from the US Energy Information Administration.
The WTI data I got from The Federal Reserve Bank of St. Louis, and the forecasts from the US Energy Information Administration.
Analysis
First, a pretty plot of the data:
Now to the fun stuff. Let’s read the data into R and do some hacky time series analysis.
library(ts) library(forecast) # Read in the Ontario Gas Price Data data <- read.csv(file="ONTREG_merged.csv", header=T, sep=",") # Read in the WTI oil price data WTI_data <- read.csv(file='DCOILWTICO.csv',header=F, col.names=c("Date", "Value")) # Create a time series object for the WTI and Ontario Avg WTI <- ts(data=WTI_data$Value, frequency=12, start=c(1990,1), end=c(2014,12)) ON <- ts(data=data$ON.Avg, frequency=12, start=c(1990,1), end=c(2014,12)) # Plot and compare combined <- cbind(WTI, ON) plot(combined)We get a comparison plot of the data:
And we can look at the Ontario Gas Price as a function of the WTI. Linear is on the left, log-log on the right.
Next we build lm model objects and look at the diagnostics. I’ll spare the details, but I feel better about the log-log, so we’ll go with that.
# Create linear models (normal and log-log) l1 <- lm(ON ~ WTI, data=combined) l2 <- lm(log(ON) ~ log(WTI), data=combined) # Compare relative performance summary(l1) summary(l2) plot(l1) plot(l2) # Plot plot(ON ~ WTI, data=combined, pch=16, cex=0.3) abline(l1) plot(log(ON) ~ log(WTI), data=combined, pch=16, cex=0.3) abline(l2)
Lastly, we read in the forecast WTI data and use it to forecast the Ontario Gas price using our second model:
# Read in WTI forecast data WTI_forecast <- read.csv(file="WTI_forecast.csv", header=F, sep=",", col.names=c("Date", "Value")) # Forecast Ontario Gas Price fit <- forecast(l2, newdata=data.frame(WTI=WTI_forecast$Value)) # Unlog fit$mean <- exp(fit$mean) fit$lower <- exp(fit$lower) fit$upper <- exp(fit$upper) fit$x <- exp(fit$x) # Plot plot(fit, ylab='Ontario Average Gas Price (cents/L)')And there you have it! Putting the forecast data (blue line) and the WTI forecast back into our original graph, we can compare the two:
It’s that easy, right? That’s all there is to it?
Conclusion
Sadly, no. I’ve done some very quick work here and demonstrated some of the types of tools that are available in R, but “real” time-series analysis is a practice which requires much more care and nuance.
For example, linear modeling assumes that variables are stationary (i.e. have constant mean and variance) and not auto-correlated, properties which are almost never true in the real world. Using methods such as above for non-stationary times series can result in what is known as “spurious regression” – finding relationships between variables which don’t really exist, even though the results have high R-squared and p-values.
In these cases, testing the stationary assumption and then massaging of the data (differencing & deseasonalization) is usually required beforehand to handle the problem, or other models which do not have as strict assumptions as linear regression are more appropriate. For more on this, see the excellent book “Forecasting: Principles and Practice” by Rob J. Hyndman and George Athanasopoulos.
In these cases, testing the stationary assumption and then massaging of the data (differencing & deseasonalization) is usually required beforehand to handle the problem, or other models which do not have as strict assumptions as linear regression are more appropriate. For more on this, see the excellent book “Forecasting: Principles and Practice” by Rob J. Hyndman and George Athanasopoulos.
The take-away, as always, is that real life is complicated, and so analysis requires care. Predicting the future has never been, and never will be easy; either way, I just hope gas stays cheap.
References & Resources
code & data on github:
Fuel Price Data: Regular Unleaded Gasoline 2014 (Ontario Ministry of Energy):
http://www.energy.gov.on.ca/en/fuel-prices/fuel-price-data/?fuel=REG&yr=2015
Crude Oil Prices: West Texas Intermediate (WTI) (Federal Reserve Bank of St. Louis)
Crude Oil Prices: West Texas Intermediate (WTI) (Federal Reserve Bank of St. Louis)
Short-Term Energy Outlook (U.S. Energy Information Administration)
Forecasting: Principles and Practice (OTexts)
To leave a comment for the author, please follow the link and comment on their blog: everyday analytics.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.