My Own R Function and Script for Simple Linear Regression – An Illustration with Exponential Decay of DDT in Trout
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Here is the function that I wrote for doing simple linear regression, as alluded to in my blog post about simple linear regression on log-transformed data on the decay of DDT concentration in trout in Lake Michigan. My goal was to replicate the 4 columns of the output from applying summary() to the output of lm().
To use this file and this script,
1) I saved this file as “simple linear regression.r”.
2) In the same folder, I saved a script called “DDT trout regression.r” that used this function to implement simple linear regression on the log-transformed DDT data.
3) I changed the working directory to this folder containing the function and the script.
4) I made sure “DDT trout regression.r” used the source() function to call my user-defined function for simple linear regression.
5) I ran “DDT trout regression.r”.
##### A function for simple linear regression ##### Written by Eric Cai - The Chemical Statistician simple.linear.regression = function(predictor, target) { sample.size = length(target) mean.predictor = mean(predictor) mean.target = mean(target) sxx = sum((predictor - mean.predictor)^2) sxy = sum((predictor - mean.predictor)*(target - mean.target)) beta1.hat = sxy/sxx beta0.hat = mean.target - beta1.hat*mean.predictor target.fitted = beta0.hat + beta1.hat*predictor target.residuals = target - target.fitted sum.squared.residuals = sum(target.residuals^2) degree.freedom = sample.size - 2 mean.squared.error = sum.squared.residuals/degree.freedom standard.error = sqrt(mean.squared.error) beta0.hat.sample.variance = mean.squared.error*(1/sample.size + mean.predictor^2/sxx) beta0.hat.standard.error = sqrt(beta0.hat.sample.variance) beta1.hat.sample.variance = mean.squared.error/sxx beta1.hat.standard.error = sqrt(beta1.hat.sample.variance) t.beta0.hat = beta0.hat/beta0.hat.standard.error t.beta1.hat = beta1.hat/beta1.hat.standard.error p.value.beta0.hat = 2*pt(abs(t.beta0.hat), degree.freedom, lower.tail = F) p.value.beta1.hat = 2*pt(abs(t.beta1.hat), degree.freedom, lower.tail = F) # aggregate inferential results into a table, similar to output of summary(lm(y~x)) beta0.inference = c(beta0.hat, beta0.hat.standard.error, t.beta0.hat, p.value.beta0.hat) beta1.inference = c(beta1.hat, beta1.hat.standard.error, t.beta1.hat, p.value.beta1.hat) # combine inferential results of beta0 and beta1 into a matrix regression.inference = rbind(beta0.inference, beta1.inference) # provide names to columns and rows for table of inferential results colnames(regression.inference) = c('Estimated Coefficient', 'Standard Error', 't-statistic', 'p-value') rownames(regression.inference) = c('beta0', 'beta1') # display table regression.inference }
Here is “DDT in trout regression.r”. In this script, I entered the data and conducted simple linear regression.
##### Linear Regression of DDT Decay in Michigan Trout ##### Data from "Elements of Environmental Chemistry" by Ronald A. Hites, Page 49 ##### Script written by Eric Cai - The Chemical Statistician # clear all variables in the workspace rm(list=ls(all=TRUE)) # call my user-defined function, 'simple linear regression.r' source('simple linear regression.r') year = c(1970:1982, seq(1984, 1992, by=2), 1995, 1998) DDT = c(19.19, 13.00, 11.31, 9.96, 8.42, 7.50, 5.65, 6.34, 4.58, 6.91, 4.74, 3.22, 2.74, 2.22, 1.10, 1.44, 1.39, 1.16, 0.98, 0.85) DDT.data = cbind(year, DDT) colnames(DDT.data) = c('Time (Year)', 'DDT Concentration') DDT.data # let's plot the untransformed DDT concentration data with respect to time png("INSERT YOUR DIRECTORY HERE/DDT in trout.png") plot(year, DDT, xlab = 'Year', ylab = 'DDT Concentration in Trout (ppm)', main = 'DDT Concentration in Trout in Lake Michigan') dev.off() # let's conduct simple linear regression on the untransformed data DDT.regression = simple.linear.regression(year, DDT) DDT.regression # now transform the data using natural logarithm to linearize the DDT-time relationship ln.DDT = log(DDT) # let's plot the log-transformed DDT concentration data with respect to time png("INSERT YOUR DIRECTORY HERE/ln(DDT) in trout.png") plot(year, ln.DDT, xlab = 'Year', ylab = 'Natural Logarithm of DDT Concentration in Trout in Lake', main = 'Natural Logarithm of DDT Concentration in Trout in Lake Michigan') dev.off() # conduct simple linear regression on the log-transformed data ln.DDT.regression = simple.linear.regression(year, ln.DDT) ln.DDT.regression
Filed under: Applied Statistics, Environmental Chemistry, Plots, R programming Tagged: data analysis, linear regression, R, R function, regression, simple linear regression
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.