Testing the Correlation between Time Series Variables
In the previous article, we examined trends and seasonality in gasoline prices in Turkey. This time we will examine whether gasoline prices are related to the two variables that the Turkish public believes affect them the most. One is the Brent crude oil price, averaged monthly in US dollars; the other is the dollar exchange rate in Turkish lira (TL), also averaged monthly. These variables appear as brent and dollar, respectively, in the dataset below. As in the previous article, the dataset covers the years 2013 to 2020.
head(df)
#        date gasoline  brent dollar
#1 2013-01-01     4.67 115.55 1.7589
#2 2013-02-01     4.85 111.38 1.7985
#3 2013-03-01     4.75 110.02 1.8090
#4 2013-04-01     4.61 102.37 1.7930
#5 2013-05-01     4.64 100.39 1.8756
#6 2013-06-01     4.72 102.16 1.9288
A t-test is used to examine whether the population correlation coefficient is zero. Its underlying assumption is that the sample comes from a normally distributed population. When this assumption is violated, a non-parametric alternative is needed. Spearman's rank correlation test takes over here, because profit and price data generally are not normally distributed. Therefore, the Pearson correlation coefficient test is not appropriate for our dataset.
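One common way to check the normality assumption is the Shapiro-Wilk test, available in base R as shapiro.test. The sketch below runs it on simulated right-skewed data; the vector skewed_prices is hypothetical and is not the dataset used in this article, so it only illustrates how such a check might look.

```r
# Simulated right-skewed sample; an assumption for illustration only,
# not the gasoline data used in this article.
set.seed(42)
skewed_prices <- rlnorm(200, meanlog = 1, sdlog = 1)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal.
sw <- shapiro.test(skewed_prices)
sw$p.value

# A p-value below 0.05 is evidence against normality,
# which would favour a rank-based test such as Spearman's.
sw$p.value < 0.05
```

On the real data, the same call could be applied to each column, for example shapiro.test(df$gasoline).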
Spearman's rank correlation test measures the correlation between two variables using their ranks rather than their raw values. Like the Pearson correlation coefficient, its value lies between -1 and +1. The two-tailed hypothesis test is described as:
$H_0: \rho_s = 0$
$H_A: \rho_s \neq 0$
First, the sample Spearman rank correlation coefficient is calculated; this takes a few steps.
- Gasoline prices are ranked from smallest to largest; tied observations each receive the average of the ranks they span, and the ranking then continues from where it left off. The same process is applied to Brent prices.
library(dplyr)

df_spearman <- df %>%
  mutate(
    rank_gasoline = rank(gasoline),
    rank_brent = rank(brent),
    d = rank_gasoline - rank_brent,
    d_square = d^2
  ) %>%
  select(-dollar)

head(df_spearman)
#        date gasoline  brent rank_gasoline rank_brent   d d_square
#1 2013-01-01     4.67 115.55            23         84 -61     3721
#2 2013-02-01     4.85 111.38            34         81 -47     2209
#3 2013-03-01     4.75 110.02            27         79 -52     2704
#4 2013-04-01     4.61 102.37            20         67 -47     2209
#5 2013-05-01     4.64 100.39            21         65 -44     1936
#6 2013-06-01     4.72 102.16            26         66 -40     1600
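- The difference between the ranks of each paired observation is calculated as $d_i = \text{rank}(\text{gasoline}_i) - \text{rank}(\text{brent}_i)$. Because both variables are ranked over the same n observations, these differences sum to zero, which serves as a quick check.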
sum(df_spearman$d)
#[1] 0
- Then the squared differences are summed.
d_square_sum <- sum(df_spearman$d_square)
d_square_sum
#[1] 69107
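The tie handling described in the first step can be seen on a toy vector. The values below are hypothetical and chosen only so that one tie occurs; rank() averages tied ranks by default.

```r
# Toy vector with one tie (hypothetical values, not from the dataset).
x <- c(4.61, 4.72, 4.72, 4.85)

# The default ties.method = "average" gives both 4.72 values
# the mean of ranks 2 and 3, and the next value continues at rank 4.
rank(x)
#[1] 1.0 2.5 2.5 4.0
```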
The sample Spearman rank correlation coefficient, $r_s$, is formulated as:
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
n <- nrow(df)
rho_s <- (1 - (6 * d_square_sum) / (n * (n^2 - 1))) %>% round(2)
rho_s
#[1] 0.3
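As a sanity check, the manual formula can be compared with R's built-in cor(). The sketch below uses only the six rows printed by head(df), so its coefficient will differ from the full-sample value of 0.3. Note that the shortcut formula is exact only when there are no tied values; with ties it is an approximation of the Pearson correlation computed on the ranks.

```r
# First six rows printed by head(df) earlier in the article.
gasoline <- c(4.67, 4.85, 4.75, 4.61, 4.64, 4.72)
brent    <- c(115.55, 111.38, 110.02, 102.37, 100.39, 102.16)

# Manual formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
d <- rank(gasoline) - rank(brent)
n <- length(gasoline)
rho_manual <- 1 - (6 * sum(d^2)) / (n * (n^2 - 1))

# Built-in Spearman coefficient (Pearson correlation of the ranks).
rho_builtin <- cor(gasoline, brent, method = "spearman")

# With no tied values, as in these six rows, the two agree.
all.equal(rho_manual, rho_builtin)
```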
This result shows a weak positive relation between gasoline and Brent prices. Let's test whether it is significant at the 5% level, that is, whether the null hypothesis can be rejected in favor of the alternative.
The point we have to look at in the chart of critical values is the one for $\alpha = 0.05$ and $n = 84$. Comparing our statistic with that entry, the null hypothesis ($H_0: \rho_s = 0$) is rejected; at the 5% significance level we can say that, although it is weak, there is a positive relation between gasoline and Brent prices.
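The same decision can be reached without a printed table. The sketch below uses the large-sample t approximation for a rank correlation, which is an assumption on our part; the table consulted above may rest on a different approximation. The inputs r_s = 0.3 and n = 84 are the values reported in this article.

```r
# Values reported in the article.
r_s <- 0.3
n <- 84
alpha <- 0.05

# Large-sample t approximation (an assumption here;
# the article's own table may be based on a different approximation).
t_stat <- r_s * sqrt((n - 2) / (1 - r_s^2))
t_crit <- qt(1 - alpha / 2, df = n - 2)

# Two-tailed p-value.
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

c(t_stat = t_stat, t_crit = t_crit, p_value = p_value)

# Reject the null hypothesis when the statistic exceeds the critical value.
abs(t_stat) > t_crit
```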
Let's check the result another way by calling the ggscatter function.
library("ggpubr")

# Brent prices are in US dollars, so the x-axis is labelled in USD.
ggscatter(df, x = "brent", y = "gasoline",
          color = "blue",
          cor.coef = TRUE,
          cor.method = "spearman",
          xlab = "Brent (USD)", ylab = "Gasoline (TL)")
As the chart shows, Spearman's rank correlation coefficient (R = 0.3) is the same as the one we computed before, and the p-value (0.0055) is below the 0.05 significance level, so we reject the null hypothesis in favor of the alternative ($H_A: \rho_s \neq 0$).
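The coefficient and p-value can also be obtained without a plot through base R's cor.test. The sketch below again uses only the six rows printed by head(df), so its numbers will not match the full-sample figures; it is meant to show the shape of the call rather than to reproduce the result.

```r
# First six rows printed by head(df) earlier in the article.
gasoline <- c(4.67, 4.85, 4.75, 4.61, 4.64, 4.72)
brent    <- c(115.55, 111.38, 110.02, 102.37, 100.39, 102.16)

# Two-sided Spearman rank correlation test.
res <- cor.test(gasoline, brent, method = "spearman")

res$estimate  # sample coefficient (named "rho" in the output)
res$p.value   # p-value for the null hypothesis of no correlation
```

On the full dataset the call would be cor.test(df$gasoline, df$brent, method = "spearman"). If the data contain tied values, R warns that an exact p-value cannot be computed and falls back to an approximation.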
Finally, we will examine the relation between gasoline and the dollar (USD/TRY).
ggscatter(df, x = "dollar", y = "gasoline",
          color = "red",
          cor.coef = TRUE,
          cor.method = "spearman",
          xlab = "USD/TRY", ylab = "Gasoline (TL)")
The graph above suggests a strong positive relationship between gasoline prices and the dollar. A p-value below 0.05 indicates that the result is significant, so the null hypothesis is rejected once again.
References
- Sanjiv Jaggia, Alison Kelly (2013). Business Intelligence: Communicating with Numbers.