[This article was first published on
R Psychologist, and kindly contributed to
R-bloggers].
A while ago I was playing around with the JavaScript library D3.js,
and I started on this visualization of how a one-way ANOVA is
calculated (which I never really finished). I wanted to make the
visualization interactive, and I did integrate some interactive
elements. For instance, if you hover over a data point it will show the
residual, and its value will be highlighted in the combined computation.
The circle diagram shows the partitioning of the sums of squares, and if
you hover over a part it will show where the variation is coming from.
I tried to make the plots look like plots from the R package ggplot2.
These plots are not designed to work on mobile phones.
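For reference, the partitioning that the circle diagram illustrates is the standard one-way ANOVA decomposition. Written out with k groups, n_i observations in group i, group means \bar{y}_i, grand mean \bar{y} and N = \sum_i n_i total observations (notation introduced here, not taken from the visualization), it reads:

$$
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}\right)^2}_{SS_{\text{total}}}
= \underbrace{\sum_{i=1}^{k} n_i\left(\bar{y}_i-\bar{y}\right)^2}_{SS_{\text{between}}}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_i\right)^2}_{SS_{\text{within}}},
\qquad
F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}
$$

The terms inside SS_within are exactly the squared residuals that appear when hovering over a data point.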
Let’s check the calculations in R
To see if this works, let’s compute the ANOVA as I have described it
here.
```r
# data
grp1 <- c(1, 2, 3, 4)
grp2 <- c(5, 6, 7, 8)
grp3 <- c(9, 10, 11, 12)
```
```r
# total SS: squared deviations of every observation from the grand mean
total_SS <- sum((c(grp1, grp2, grp3) - mean(c(grp1, grp2, grp3)))^2)
total_SS
## [1] 143
```
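As a quick cross-check, the total sum of squares is just the numerator of the sample variance, so it should also equal (N - 1) times the variance of all 12 observations. A minimal sketch, redefining the same three groups so it runs on its own (the name `total_SS_var` is introduced here for illustration):

```r
# same data as above, pooled into one vector
grp1 <- c(1, 2, 3, 4)
grp2 <- c(5, 6, 7, 8)
grp3 <- c(9, 10, 11, 12)
y_all <- c(grp1, grp2, grp3)

# var() divides the sum of squared deviations by N - 1,
# so multiplying back by N - 1 recovers the total SS
total_SS_var <- (length(y_all) - 1) * var(y_all)
total_SS_var
## [1] 143
```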
```r
# within groups SS: squared deviations of each observation from its own group mean
within_SS <- sum((c(grp1 - mean(grp1), grp2 - mean(grp2), grp3 - mean(grp3)))^2)
within_SS
## [1] 15
```
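The within-groups sum of squares is the sum of the squared residuals that the visualization shows on hover: each observation minus its own group mean. As a sketch, assuming a one-way model fitted with `lm()` (the names `dat`, `fit` and `within_SS_resid` are hypothetical), the same number comes out of `residuals()`:

```r
grp1 <- c(1, 2, 3, 4)
grp2 <- c(5, 6, 7, 8)
grp3 <- c(9, 10, 11, 12)

# long format: one response column and a 3-level grouping factor
dat <- data.frame(y = c(grp1, grp2, grp3), group = gl(3, 4))
fit <- lm(y ~ group, data = dat)

# residuals of a one-way model are y minus the fitted group mean
within_SS_resid <- sum(residuals(fit)^2)
within_SS_resid
## [1] 15

# the same residuals by hand: each y minus the mean of its group
manual_resid <- dat$y - ave(dat$y, dat$group)
all.equal(unname(residuals(fit)), manual_resid)
## [1] TRUE
```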
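```r
# between groups SS; with 4 observations in every group this shortcut
# equals the sum of 4 * (group mean - grand mean)^2
between_SS <- 4*(sum((c(mean(grp1), mean(grp2), mean(grp3))^2 - mean(c(grp1, grp2, grp3))^2)))
between_SS
## [1] 128
```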
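The shortcut above multiplies by 4 because every group has 4 observations. With unequal group sizes each squared deviation of a group mean has to be weighted by that group's own n_i. A minimal sketch of the general form, using a hypothetical helper `between_ss()` and a made-up unbalanced example (not data from this post), checked against the identity total = between + within:

```r
# hypothetical helper: between-groups SS for a list of numeric vectors
between_ss <- function(groups) {
  n_i    <- lengths(groups)                    # group sizes
  mean_i <- vapply(groups, mean, numeric(1))   # group means
  grand  <- mean(unlist(groups))               # grand mean over all observations
  sum(n_i * (mean_i - grand)^2)
}

# balanced case from the post: matches the shortcut
between_ss(list(c(1, 2, 3, 4), c(5, 6, 7, 8), c(9, 10, 11, 12)))
## [1] 128

# made-up unbalanced groups: the partition still holds
g <- list(c(2, 4), c(3, 5, 7), c(10, 11, 12, 15))
y <- unlist(g)
total  <- sum((y - mean(y))^2)
within <- sum(vapply(g, function(x) sum((x - mean(x))^2), numeric(1)))
all.equal(between_ss(g) + within, total)
## [1] TRUE
```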
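```r
# check calculation: total SS = between SS + within SS
# (all.equal compares with a small tolerance; == on doubles can fail from rounding)
isTRUE(all.equal(between_SS + within_SS, total_SS))
## [1] TRUE
```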
We see that total_SS, between_SS and within_SS are identical to
what is shown above in the visualization.
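```r
df1 <- 3 - 1   # number of groups - 1
df2 <- 12 - 3  # N - number of groups
# named F_value rather than F, which R uses as shorthand for FALSE
F_value <- (between_SS/df1) / (within_SS/df2)  # between mean square / within mean square
F_value
## [1] 38.4
```

```r
# p-value: upper tail of the F distribution (more precise than 1 - pf())
pf(F_value, df1, df2, lower.tail = FALSE)
```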
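The same steps can be wrapped in a small function that works for any number of groups: each sum of squares is divided by its degrees of freedom (k - 1 and N - k) to give a mean square, F is their ratio, and the p-value is the upper tail of the F distribution. This is a sketch with a hypothetical function name, `one_way_f()`, cross-checked against base R's `oneway.test()` with `var.equal = TRUE`, which runs the classic equal-variance one-way ANOVA.

```r
# hypothetical helper: classic one-way ANOVA F test from a list of groups
one_way_f <- function(groups) {
  y      <- unlist(groups)
  N      <- length(y)
  k      <- length(groups)
  n_i    <- lengths(groups)
  mean_i <- vapply(groups, mean, numeric(1))
  ss_between <- sum(n_i * (mean_i - mean(y))^2)
  ss_within  <- sum(vapply(groups, function(x) sum((x - mean(x))^2), numeric(1)))
  df1 <- k - 1   # number of groups - 1
  df2 <- N - k   # N - number of groups
  F_value <- (ss_between / df1) / (ss_within / df2)
  p_value <- pf(F_value, df1, df2, lower.tail = FALSE)
  c(F = F_value, df1 = df1, df2 = df2, p = p_value)
}

groups <- list(c(1, 2, 3, 4), c(5, 6, 7, 8), c(9, 10, 11, 12))
res <- one_way_f(groups)
res[["F"]]
## [1] 38.4

# cross-check with base R (equal variances assumed, as in the classic ANOVA)
long <- data.frame(y = unlist(groups), group = gl(3, 4))
ow <- oneway.test(y ~ group, data = long, var.equal = TRUE)
all.equal(unname(c(res["F"], res["p"])), unname(c(ow$statistic, ow$p.value)))
## [1] TRUE
```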
Let's compare this to anova()
```r
df <- data.frame(y = c(grp1, grp2, grp3))
df$group <- gl(3, 4)  # factor with 3 levels, 4 observations each
anova(lm(y ~ group, df))
```
```
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
group      2    128  64.000    38.4 3.921e-05 ***
Residuals  9     15   1.667
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
We have identical results.
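Rather than comparing the numbers by eye, the check can also be done programmatically, since the table `anova()` returns is a data frame with columns such as "Sum Sq" and "F value". A minimal sketch, recomputing the hand calculations so it runs on its own (the names `tab` and `F_value` are introduced here for illustration):

```r
grp1 <- c(1, 2, 3, 4)
grp2 <- c(5, 6, 7, 8)
grp3 <- c(9, 10, 11, 12)
y <- c(grp1, grp2, grp3)

# hand calculations (equivalent to the ones above)
within_SS  <- sum(c(grp1 - mean(grp1), grp2 - mean(grp2), grp3 - mean(grp3))^2)
between_SS <- 4 * sum((c(mean(grp1), mean(grp2), mean(grp3)) - mean(y))^2)
F_value    <- (between_SS / 2) / (within_SS / 9)

# anova() returns a data frame with rows "group" and "Residuals"
tab <- anova(lm(y ~ group, data = data.frame(y = y, group = gl(3, 4))))
all.equal(unname(tab[["Sum Sq"]]), c(between_SS, within_SS))
## [1] TRUE
all.equal(unname(tab[["F value"]][1]), F_value)
## [1] TRUE
```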