Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Data must be selected carefully. The predictive usefulness of the model is grossly diminished if outliers taint the available data. Figure 1, for instance, shows the Defense spending (as a fraction of the national budget) between 1948 and 1968.
One way we can judge the usefulness a little more scientifically is to run a regression on the differences. In other words, Plot Y(t+1) on top of Y(t), and insert the regression line. Check it out:
Next Time: Saving the code, changing the method.
References:
Code adapted from http://www.courtneybrown.com/classes/ModelingSocialPhenomena/Assignments/Assignment2NationalDefenseOutlaysCourtneyBrownMathModeling.htm
Dataset taken from http://www.courtneybrown.com/classes/ModelingSocialPhenomena/Assignments/Assignment2CourtneyBrownMathModeling.htm
Appendix:
This is the R code to generate these Graphs:
df <- read.csv(file="http://nortalktoowise.com/code/datasets/Defense.csv", head=TRUE, sep=",") attach(df) lagvar <- function(x,y){return(c(rep(NA, y),x[-((length(x)-y+1):length(x))]))} lagvar1 <- lagvar(defensespending,1) model <- lm(defensespending ~ lagvar1) y2 <- 0 t <- 0 y1 <- .3 a <- model$coefficients[[2]] b <- model$coefficients[[1]] timeserieslength <- nrow(df) for (i in 1:timeserieslength) { y2[i] <- (a*y1[i])+b t[i] <- i if (i < timeserieslength) y1[i+1]=y2[i]} plot(t, defensespending, xlab="Years", ylab="Defense Spending as Fraction of Budget", main="Figure 1: U.S. Defense Expenditure, 1948-68", pch=19) lines(t, y2, lwd=2) windows() plot(lagvar1, defensespending, xlab="", ylab="", pch=19) title(xlab="Y", ylab="Y(t+1)", main="Figure 2: Plot of the first differences", cex=1.5, col="black", =2) abline(model, lwd=2) windows() newdf <- df[which(year>1951, arr.in=TRUE),] lagvar2 <- lagvar(newdf$defensespending,1) model2 <- lm(newdf$defensespending ~ lagvar2) y2 <- 0 t <- 0 y1 <- .68 a <- model2$coefficients[[2]] b <- model2$coefficients[[1]] timeserieslength <- nrow(newdf) for (i in 1:timeserieslength) { y2[i] <- (a*y1[i])+b t[i] <- i if (i < timeserieslength) y1[i+1]=y2[i]} plot(newdf$year, newdf$defensespending, xlab="Years", ylab="Defense Spending as Fraction of Budget", main="Figure 3: U.S. Defense Expenditure, 1952-68", pch=19) lines(newdf$year, y2, lwd=2)
Please forgive my poor style (reusing variable names and whatnot). It works, and that’s enough for me at the moment.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.