Simple and Advanced Time Series with Oracle R Enterprise
This guest post from
Marcos Arancibia describes how to use Oracle R Enterprise to analyze Time Series data.
In this article, we give an overview of how to run Time Series Analysis against data stored in Oracle Database, using the Embedded R Execution capability to send the time series computations to the Oracle Database server instead of processing them at the client. We will also learn how to bring the final series or forecasts back to the client for plotting, forecasting, and diagnostics.
One key thing to keep in mind when using Time Series techniques with data stored in Oracle Database is the order of the rows, or records. Because of the parallel capabilities of Oracle Database, records may come back out of order when queried unless an ordering is explicitly specified.
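For example (a minimal sketch, assuming the ORCLSTOCK proxy object that is created below), the rows pulled for a time series can be ordered explicitly by the date column rather than relying on the order in which the database returns them:
# Re-order pulled rows explicitly by date instead of relying on
# the order in which Oracle Database returns them
dat <- ore.pull(ORCLSTOCK[, c("date", "Close")])
dat <- dat[order(as.Date(dat$date, format = "%Y-%m-%d")), ]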
Simple Example using Stock Data
Let’s start with a simple Time Series example. First we will need to connect to our Oracle Database using ORE. Then, using the package TTR, we will download Oracle stock data from the Yahoo Finance service (via getYahooData()), covering January 1, 2008 to January 1, 2013, and push it to the database.
# Load the ORE library and connect to Oracle Database
library(ORE)
ore.connect("myuser", "mysid", "myserver", "mypass", port=1521, all=TRUE)
library(TTR)
# Get data in XTS format
xts.orcl <- getYahooData("ORCL", 20080101, 20130101)
# Convert it to a data frame and add the date
# from the xts index as a column
df.orcl <- data.frame(xts.orcl)
df.orcl$date <- (data.frame(date=index(xts.orcl))$date)
# Create/overwrite data in Oracle Database
# to a Table called ORCLSTOCK
ore.drop(table="ORCLSTOCK")
ore.create(df.orcl, table="ORCLSTOCK")
# IMPORTANT STEP!!!
# Ensure indexing is kept by date
rownames(ORCLSTOCK) <- ORCLSTOCK$date
# Ensure the data is in the DB
ore.ls()
# Review column names, data statistics and
# print a sample of the data
names(ORCLSTOCK)
>names(ORCLSTOCK)
[1] "Open"        "High"        "Low"         "Close"       "Volume"
[6] "Unadj.Close" "Div"         "Split"       "Adj.Div"     "date"
summary(ORCLSTOCK$Close)
>summary(ORCLSTOCK$Close)
Min. 1st Qu. Median Mean 3rd Qu. Max.
13.36 20.53 24.22 24.79 29.70 35.73
head(ORCLSTOCK)
>head(ORCLSTOCK)
Open High Low Close Volume
2008-01-02 01:00:00 21.74414 22.00449 21.58022 21.68629 44360179
2008-01-03 01:00:00 21.62843 22.28413 21.62843 22.28413 43600532
2008-01-04 01:00:00 21.95628 22.06235 21.01130 21.24272 46391263
2008-01-07 01:00:00 21.17523 21.67664 21.01130 21.45486 41527032
2008-01-08 01:00:00 21.44522 21.52236 20.38453 20.39417 45155398
2008-01-09 01:00:00 20.57738 20.91487 20.39417 20.83773 49750304
Unadj.Close Div Split Adj.Div date
2008-01-02 01:00:00 22.49 NA NA NA 2008-01-02
2008-01-03 01:00:00 23.11 NA NA NA 2008-01-03
2008-01-04 01:00:00 22.03 NA NA NA 2008-01-04
2008-01-07 01:00:00 22.25 NA NA NA 2008-01-07
2008-01-08 01:00:00 21.15 NA NA NA 2008-01-08
2008-01-09 01:00:00 21.61 NA NA NA 2008-01-09
Pull data from the database for a simple plot
# Pull data from Oracle Database (only the necessary columns)
orcl <- ore.pull(ORCLSTOCK[,c("date","Close","Open","Low","High")])
# Simple plot with base libraries - Closing
plot(orcl$date, orcl$Close, type="l", col="red", xlab="Date", ylab="US$",
     main="Base plot: Daily ORACLE Stock Closing points")
# Simple plot with base libraries - Other Series
plot(orcl$date, orcl$Open, type="l", col="blue", xlab="Date", ylab="US$",
     main="Base plot: Daily ORACLE Stock: Open/High/Low points")
lines(orcl$date, orcl$High, col="green")
lines(orcl$date, orcl$Low, col="orange")
legend("topleft", c("Opening", "High", "Low"),
       col=c("blue", "green", "orange"), lwd=2, title="Series", bty="n")
A different plot option, using the package xts
library(xts)
# Pull data from Oracle Database (only the necessary columns)
orcl <- ore.pull(ORCLSTOCK[,c("date","Close","Open","Low","High")])
# Convert data to Time Series format
orcl.xts <- as.xts(orcl,order.by=orcl$date,dateFormat="POSIXct")
# Plot original series
plot(orcl.xts$Close, major.ticks='months', minor.ticks=FALSE,
     main="Time Series plot: Daily ORACLE Stock Closing points", col="red")
Simple Time Series: Moving Average Smoothing
We might be tempted to call functions like a simple moving average from open-source CRAN packages directly against Oracle Database tables, but those packages do not know what to do with an "ore.frame". For that process to work correctly, we can either pull the data locally or send the computation for remote execution on the database server using Embedded R Execution.
We will also explore the moving average built into ORE, ore.rollmean(), as a third alternative.
ALTERNATIVE 1 – The first example pulls the data from Oracle Database into a local ts (time series) object for a client-side smoothing process.
library(TTR)
# Pull part of the database table into a local data.frame
sm.orcl <- ore.pull(ORCLSTOCK[,c("date","Close")])
# Convert the "Close" attribute into a Time Series (ts)
ts.orcl <- ts(sm.orcl$Close)
# Use SMA - Simple Moving Average algorithm from package TTR
ts.sm.orcl <- ts(SMA(ts.orcl, n=30), frequency=365, start=c(2008,1))
# Plot both Series together
plot(sm.orcl$date, sm.orcl$Close, type="l", col="red", xlab="Date", ylab="US$",
     main="ORCL Stock Close CLIENT-side Smoothed Series n=30 days")
lines(sm.orcl$date, ts.sm.orcl, col="blue")
legend("topleft", c("Closing", "MA(30) of Closing"),
       col=c("red", "blue"), lwd=2, title="Series", bty="n")
ALTERNATIVE 2 – In this alternative, we run the moving-average smoothing on the server side, without bringing all of the data to the client; only the result is brought locally for plotting. Remember that the TTR package has to be installed on the server R engine in order to be called there.
# Server execution call using ore.tableApply
# Result is an ore.list that remains in the database until needed
sv.orcl.ma30 <-
  ore.tableApply(ORCLSTOCK[, c("date", "Close")], ore.connect = TRUE,
    function(dat) {
      library(TTR)
      ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")), ]
      res1 <- ts(ordered$Close, frequency=365, start=c(2008,1))
      res2 <- ts(SMA(res1, n=30), frequency=365, start=c(2008,1))
      res3 <- ordered$date
      list(res1, res2, res3)
    }
  )
# Bring the results locally for plotting
local.orcl.ma30 <- ore.pull(sv.orcl.ma30)
# Plot two series side by side
# (the third element of the list is the date)
plot(local.orcl.ma30[[3]], local.orcl.ma30[[1]], type="l",
     col="red", xlab="Date", ylab="US$",
     main="ORCL Stock Close SERVER-side Smoothed Series n=30 days")
# Add smoothed series
lines(local.orcl.ma30[[3]],
      local.orcl.ma30[[2]], col="blue", type="l")
# Add legend
legend("topleft", c("Closing", "Server MA(30) of Closing"),
       col=c("red", "blue"), lwd=2, title="Series", bty="n")
ALTERNATIVE 3 – In this alternative, we compute the moving averages on the server side using the native ORE in-database functions, without bringing the data to the client; only the result is brought locally for plotting.
Just one line of code is needed to generate the in-database computation of the moving averages and create a new virtual column in Oracle Database. We will call this new column rollmean30.
We will use the function ore.rollmean(). The option align="right" makes the moving average look only at the past k days (30 in this case), or fewer, depending on the point in time. This creates a small difference between this method and the previous ones at the beginning of the series: ore.rollmean() computes the earliest values from the smaller windows of data available, while the other methods leave those points out.
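To make that warm-up difference concrete, here is a small client-side illustration (not part of the original workflow; the numbers are made up). A right-aligned partial-window mean produces a value from the first observation onward, while a strict 3-point moving average leaves the warm-up period as NA:
# Illustrative only: right-aligned partial windows vs. a strict moving average
x <- c(10, 12, 11, 13, 14)
partial <- sapply(seq_along(x), function(i) mean(x[max(1, i - 2):i]))
strict  <- c(NA, NA, sapply(3:length(x), function(i) mean(x[(i - 2):i])))
partial   # 10.00 11.00 11.00 12.00 12.67
strict    #    NA    NA 11.00 12.00 12.67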
# Moving Average done directly in Oracle Database
ORCLSTOCK$rollmean30 <- ore.rollmean(ORCLSTOCK$Close, k = 30, align="right")
# Check that new variable is in the database
head(ORCLSTOCK)
>head(ORCLSTOCK)
Open High Low Close Volume
2008-01-02 01:00:00 21.74414 22.00449 21.58022 21.68629 44360179
2008-01-03 01:00:00 21.62843 22.28413 21.62843 22.28413 43600532
2008-01-04 01:00:00 21.95628 22.06235 21.01130 21.24272 46391263
2008-01-07 01:00:00 21.17523 21.67664 21.01130 21.45486 41527032
2008-01-08 01:00:00 21.44522 21.52236 20.38453 20.39417 45155398
2008-01-09 01:00:00 20.57738 20.91487 20.39417 20.83773 49750304
Unadj.Close Div Split Adj.Div date rollmean30
2008-01-02 01:00:00 22.49 NA NA NA 2008-01-02 21.68629
2008-01-03 01:00:00 23.11 NA NA NA 2008-01-03 21.98521
2008-01-04 01:00:00 22.03 NA NA NA 2008-01-04 21.73771
2008-01-07 01:00:00 22.25 NA NA NA 2008-01-07 21.66700
2008-01-08 01:00:00 21.15 NA NA NA 2008-01-08 21.41243
2008-01-09 01:00:00 21.61 NA NA NA 2008-01-09 21.31665
# Get results locally for plotting
local.orcl <- ore.pull(ORCLSTOCK[,c("date","Close", "rollmean30")])
sub.orcl <- subset(local.orcl, local.orcl$date > as.Date("2011-12-16"))
# Plot the two series side by side
# First plot original series
plot(local.orcl$date, local.orcl$Close, type="l", col="red", xlab="Date", ylab="US$",
     main="ORCL Stock Close ORE Computation of Smoothed Series n=30 days")
# Add smoothed series
lines(local.orcl$date, local.orcl$rollmean30, col="blue", type="l")
# Add legend
legend("topleft", c("Closing", "ORE MA(30) of Closing"),
       col=c("red", "blue"), lwd=2, title="Series", bty="n")
Seasonal Decomposition for Time Series Diagnostics
Now that we have learned how to execute these processes using Embedded R, we can start using other methodologies required for Time Series using the same Server-side computation and local plotting.
It is typical for an analyst to try to understand a Time Series better by looking at some of the basic diagnostics like the Seasonal Decomposition of Time Series by Loess. These can be achieved by using the stl() command in the following process:
# Server execution
sv.orcl.dcom <-
  ore.tableApply(ORCLSTOCK[, c("date", "Close")], ore.connect = TRUE,
    function(dat) {
      ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")), ]
      ts.orcl <- ts(ordered$Close, frequency=365, start=c(2008,1))
      res <- stl(ts.orcl, s.window="periodic")
    }
  )
# Get result for plotting
local.orcl.dcom <- ore.pull(sv.orcl.dcom)
plot(local.orcl.dcom, main="Server-side Decomposition of ORCL Time-Series", col="blue")
Another typical set of diagnostic charts includes Autocorrelation and Partial Autocorrelation function plots. These can be achieved by using the acf() command with the proper options in Embedded R Execution, so computations happen at the Oracle Database server:
# Server-side ACF and PACF computation
# Use function acf() and save result as a list
sv.orcl.acf <-
  ore.tableApply(ORCLSTOCK[, c("date", "Close")], ore.connect = TRUE,
    function(dat) {
      ts.orcl <- ts(dat$Close, frequency=365, start=c(2008,1))
      res1 <- acf(ts.orcl, lag.max=120, type="correlation")
      res2 <- acf(ts.orcl, lag.max=30, type="partial")
      list(res1, res2)
    }
  )
# Get results for plotting
# ACF and PACF as members of the list pulled
local.orcl.acf <- ore.pull(sv.orcl.acf)
plot(local.orcl.acf[[1]], main="Server-side ACF Analysis for Series ORCL", col="blue", lwd=2)
plot(local.orcl.acf[[2]], main="Server-side PACF Analysis for Series ORCL", col="blue", lwd=5)
Simple Exponential Smoothing
Using the popular package "forecast", we will use the ses() function to forecast a 90-day horizon (h=90) into the future, with a fixed smoothing parameter (alpha=0.1) and simple initialization, as in the call below. The package forecast needs to be installed on the Oracle Database server R engine.
Then, we will bring the resulting model locally for plotting. Remember to load the library "forecast" locally as well, so that the ses() output can be interpreted correctly once it is brought to the client.
# Execute ses() call in the server
sv.orcl.ses <-
  ore.tableApply(ORCLSTOCK[, c("date", "Close")], ore.connect = TRUE,
    function(dat) {
      library(forecast)
      ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")), ]
      ts.orcl <- ts(ordered$Close, frequency=365, start=c(2008,1))
      res <- ses(ts.orcl, h=90, alpha=0.1, initial="simple")
    }
  )
# Get SES result locally for plotting
# Since remote object contains a SES model from package forecast,
# load package locally as well
library(forecast)
plot.orcl.ses <- ore.pull(sv.orcl.ses)
plot(plot.orcl.ses, col="blue", fcol="red",
     main="ORCL with Server-side SES - Simple Exponential Smoothing Forecast")
Holt Exponential Smoothing
Using the popular package "forecast", we will use the holt() function to forecast a 90-day horizon (h=90) into the future, requesting 80% and 95% prediction intervals. Again, the package "forecast" needs to be installed on the Oracle Database server R engine.
Then, we will bring the resulting model locally for plotting. Remember to load the library forecast locally as well, so that the holt() output can be interpreted correctly once it is brought to the client.
# Execute holt() call in the server
sv.orcl.ets <-
  ore.tableApply(ORCLSTOCK[, c("date", "Close")], ore.connect = TRUE,
    function(dat) {
      library(forecast)
      ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")), ]
      ts.orcl <- ts(ordered$Close, frequency=365, start=c(2008,1))
      res <- holt(ts.orcl, h=90, level=c(80,95), initial="optimal")
    }
  )
# Get resulting model from the server
# Since remote object contains a Holt Exponential Smoothing
# model from package forecast, load package locally as well
library(forecast)
local.orcl.ets <- ore.pull(sv.orcl.ets)
plot(local.orcl.ets, col="blue", fcol="red",
     main="ORCL Original Series Stock Close with Server-side Holt Forecast")
ARIMA – Auto-Regressive Integrated Moving Average
There are at least two options for fitting an ARIMA model to a Time Series. One option is to use the package "forecast", which provides automatic ARIMA fitting (auto.arima) to find the best parameters possible based on the series.
For more advanced users, the arima() function in the "stats" package allows choosing the model parameters directly.
# ARIMA models on the server using auto.arima() from package forecast
arimaModel <-
  ore.tableApply(ORCLSTOCK[, c("date", "Close")], ore.connect = TRUE,
    FUN = function(dat) {
      # load forecast library to use auto.arima
      library(forecast)
      # sort the table by date
      ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")), ]
      # convert the column into a Time Series with ts(...) and request
      # an automatic ARIMA fit with auto.arima(...)
      res <- auto.arima(ts(ordered$Close, frequency=365, start=c(2008,1)),
                        stepwise=TRUE, seasonal=TRUE)
    })
# Alternative using arima() from package "stats"
arimaModel <-
  ore.tableApply(ORCLSTOCK[, c("date", "Close")], ore.connect = TRUE,
    FUN = function(dat) {
      # sort the table by date
      ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")), ]
      # convert the column into a Time Series with ts(...) and fit a
      # specific ARIMA model with arima(), for example an ARIMA(2,1,2)
      res <- arima(ts(ordered$Close, frequency=365, start=c(2008,1)),
                   order = c(2,1,2))
    })
# Load forecast package locally to use the model
# for plotting and producing forecasts
library(forecast)
# Show remote resulting Time Series model
>arimaModel
Series: ts(ordered$Close, frequency = 365, start = c(2008, 1))
ARIMA(2,1,0)
Coefficients:
ar1 ar2
-0.0935 -0.0192
s.e. 0.0282 0.0282
sigma^2 estimated as 0.2323: log likelihood=-866.77
AIC=1739.55 AICc=1739.57 BIC=1754.96
# Get remote model using ore.pull for local prediction and plotting
local.arimaModel <- ore.pull(arimaModel)
# Generate forecasts for the next 15 days
fore.arimaModel <- forecast(local.arimaModel, h=15)
# Use the following option if you need to remove scientific notation of
# numbers that are too large in charts
options(scipen=10)
# Generate the plot of forecasts, including interval of confidence
# Main title is generated automatically indicating the type of model
# chosen by the Auto ARIMA process
plot(fore.arimaModel, type="l", col="blue", xlab="Date",
     ylab="Closing value (US$)", cex.axis=0.75,
     sub="Auto-generated ARIMA for ORCL Stock Closing")
# Generate and print forecasted data points plus standard errors
# of the next 15 days
forecasts <- predict(local.arimaModel, n.ahead = 15)
>forecasts
$pred
Time Series:
Start = c(2011, 165)
End = c(2011, 179)
Frequency = 365
[1] 33.29677 33.29317 33.29395 33.29395 33.29393 33.29393 33.29393 33.29393 33.29393 33.29393 33.29393
[12] 33.29393 33.29393 33.29393 33.29393
$se
Time Series:
Start = c(2011, 165)
End = c(2011, 179)
Frequency = 365
[1] 0.4819417 0.6504925 0.7807798 0.8928901 0.9924032 1.0827998 1.1662115 1.2440430 1.3172839 1.3866617
[11] 1.4527300 1.5159216 1.5765824 1.6349941 1.6913898
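If needed, these point forecasts and standard errors can be combined into approximate 95% prediction intervals; a minimal sketch, assuming approximately normal forecast errors:
# Approximate 95% prediction intervals from the predict() output above
lower <- forecasts$pred - 1.96 * forecasts$se
upper <- forecasts$pred + 1.96 * forecasts$se
cbind(forecast = forecasts$pred, lower = lower, upper = upper)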