A web interface for regression analysis: Walkthrough

[This article was first published on Antoine's data science views, and kindly contributed to R-bloggers].
After the quick overview, here is a short walkthrough of a regression analysis that includes a categorical variable.

Open the app: Here

1. Import the data:

Here is some simulated data, generated with the following R code:
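set.seed(3467)
# noisy predictor, shared by both groups
x <- 1:400 + rnorm(400, 0, 1)
# each group gets its own intercept and slope
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
group <- rep(c('G1', 'G2'), each = 400)
x <- c(x, x)
y <- c(y1, y2)
DF <- data.frame(x = x, y = y, group = group)
write.csv(DF, 'DF.csv')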
Click on Import data, select your file and set the row names to the first column (write.csv stores the row names there by default). You should then get a quick overview of the data:
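If you want to check outside the app what the import step should produce, here is a minimal sketch in R. It rebuilds the data from step 1, writes it to a temporary file (the `tempdir()` path is my choice for the example, not something the app requires) and reads it back with the first column as row names.

```r
# Rebuild the simulated data from step 1
set.seed(3467)
x  <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(x = c(x, x), y = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

# Write to a temporary location (assumption: any writable path works)
path <- file.path(tempdir(), "DF.csv")
write.csv(DF, path)

# row.names = 1 mirrors "set rownames to first column" in the app
imported <- read.csv(path, row.names = 1)
str(imported)
```

Without `row.names = 1`, the row names would come back as an extra unnamed column, which is why the app asks for that setting.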

2. Let’s take a closer look at our data:

Go to Data->View Data and choose x, y and group as the variables to display. We can see that we have two groups (G1, G2). Let’s take a closer look at the distributions of x and y.


Now click on View boxplot:




Here is the distribution of our data. There doesn’t seem to be a gap in either variable, so let’s do some regression!
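The same check can be sketched in plain R. This is only an approximation of what the View boxplot screen shows, and splitting y by group is my own addition. `plot = FALSE` returns the boxplot statistics without drawing; drop it to get the actual plot.

```r
# Rebuild the simulated data from step 1
set.seed(3467)
x  <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(x = c(x, x), y = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

# Five-number summaries of the two numeric variables
summary(DF[, c("x", "y")])

# Group sizes
table(DF$group)

# Boxplot statistics of y per group (remove plot = FALSE to draw it)
bp <- boxplot(y ~ group, data = DF, plot = FALSE)
bp$stats  # rows: lower whisker, Q1, median, Q3, upper whisker
```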

3. Rename our variables:

x and y aren’t very explicit variable names, so let’s rename them input and Response.
Go to Data->Data engineering, select y as the variable to modify, select rename as the operation to apply and enter Response as the name. Create the new variable!
Do the same with x, naming it input.
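For reference, the equivalent renaming in base R is a one-liner per variable. This is a sketch of the same operation, not the code the app runs internally.

```r
# Rebuild the simulated data from step 1
set.seed(3467)
x  <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(x = c(x, x), y = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

# Rename by matching on the old name, so column order does not matter
names(DF)[names(DF) == "y"] <- "Response"
names(DF)[names(DF) == "x"] <- "input"

head(DF)
```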

Go to Data->View Data:



4. Run a first model:

Click on the Model tab and run the following model: Response~input, by selecting Response as the variable to predict and input as the predictor. Run the model!
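In R terms, this first model is an ordinary least-squares fit that ignores the group. A minimal sketch, assuming the app fits the equivalent of `lm()` (I have not checked its source):

```r
# Rebuild the simulated data with the renamed variables
set.seed(3467)
x  <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

# Pooled model: a single line for both groups
fit_pooled <- lm(Response ~ input, data = DF)
summary(fit_pooled)
```

Since the two groups were simulated with slopes 2.5 and 4.5 on the same x values, the pooled slope should land near their average, 3.5, which hides the group structure.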



5. Model summary:

Go to Summary. As we can see, the model is significant and fits reasonably well. However, it seems we’re missing some pattern. Let’s take a look at the plot:

Well, it looks like the two groups follow clearly different lines, so we should not have ignored the interaction.

6. Interaction model:

Go back to the Model tab, add group to the predictors and set the interaction between group and input.


You can check the summary again: the model performs far better. Looking at the graph:


That’s better, and the two groups are significantly different.
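Here is how the interaction model could be written in R, again assuming an `lm()`-style fit. The formula `input * group` expands to both main effects plus their interaction, so each group gets its own intercept and slope.

```r
# Rebuild the simulated data with the renamed variables
set.seed(3467)
x  <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

fit_pooled <- lm(Response ~ input, data = DF)
fit_inter  <- lm(Response ~ input * group, data = DF)

# "input:groupG2" is the extra slope of G2 relative to G1
coef(fit_inter)

# Compare the share of variance explained by each model
c(pooled      = summary(fit_pooled)$r.squared,
  interaction = summary(fit_inter)$r.squared)
```

Given how the data were simulated, the `input:groupG2` coefficient should come out close to 4.5 - 2.5 = 2.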

7. Outliers and assumptions:

Since we created the data ourselves, we shouldn’t have issues with the regression assumptions.
Let’s go to Diagnostic->Normality. As expected, our residuals are normally distributed.
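A rough equivalent of the normality check in R is sketched below. The choice of a Shapiro-Wilk test plus Q-Q points is mine; the app may use a different test or plot.

```r
# Rebuild the simulated data with the renamed variables
set.seed(3467)
x  <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

fit_inter <- lm(Response ~ input * group, data = DF)
res <- residuals(fit_inter)

# Shapiro-Wilk test: W close to 1 is consistent with normal residuals
sw <- shapiro.test(res)
sw

# Q-Q coordinates (remove plot.it = FALSE to draw the plot)
qq <- qqnorm(res, plot.it = FALSE)
```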



Let’s go to Diagnostic->Outliers:
On the Summary tab, Cook’s D, the internally studentised residual and the hat value are computed for each observation. Observations 28, 389, 407, 436 and 789 are outliers, so let’s delete them and rerun the model. (You can also look at the other outlier tabs for a visualisation of the different outlyingness measures.)
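The three measures from the Summary tab all exist in base R, so the table can be sketched as follows. The cutoff used to flag observations (absolute internally studentised residual above 3) is a common rule of thumb that I picked for illustration. It is not necessarily the rule the app applies, so the flagged rows may differ from the ones listed above.

```r
# Rebuild the simulated data with the renamed variables
set.seed(3467)
x  <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

fit_inter <- lm(Response ~ input * group, data = DF)

# One row of diagnostics per observation
diag_tab <- data.frame(
  cooks_d  = cooks.distance(fit_inter),
  rstd_int = rstandard(fit_inter),   # internally studentised residuals
  hat      = hatvalues(fit_inter)
)

# Hypothetical flagging rule, for illustration only
flagged <- abs(diag_tab$rstd_int) > 3
which(flagged)

# Drop the flagged rows and refit
DF_clean  <- DF[!flagged, ]
fit_clean <- lm(Response ~ input * group, data = DF_clean)
```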

8. Save and compare models:

Go back to the Model tab and save the model as model1.
Rerun the model without the interaction between group and input, and save it as model2.
You can run a Lack-of-Fit analysis on the Model Comparison tab:


Across the different criteria, the interaction model looks better (lower AIC and BIC, and a conclusive F-test), which is what we would expect given the way we created the data.
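The comparison can be reproduced in R with `AIC()`, `BIC()` and a nested-model F-test from `anova()`. Two assumptions here: I take the app's Lack-of-Fit analysis to be this kind of nested F-test, and for brevity the sketch uses the full data rather than the data with outliers removed.

```r
# Rebuild the simulated data with the renamed variables
set.seed(3467)
x  <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

model1 <- lm(Response ~ input * group, data = DF)  # with interaction
model2 <- lm(Response ~ input + group, data = DF)  # without interaction

# Information criteria: lower is better
AIC(model1, model2)
BIC(model1, model2)

# F-test of the smaller model against the larger one
cmp <- anova(model2, model1)
cmp
```

A small p-value in the second row of `cmp` means the interaction term explains significantly more variance than the additive model can.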

Thanks for following this quick walkthrough. I hope you’ll like the app!

Antoine
