Some Basic Concepts about Design of Experiments and How to Perform their Analysis in R

Posted on January 15, 2022 by R in the Lab in R bloggers | 0 Comments

[This article was first published on R in the Lab, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Basic design of experiments in R for one factor and two factors designs.

You can find all the code, data and results in the GitHub repository for this post: Basic design of experiments.

There is no signal without noise

It never hurts to go back to basics before tackling more complex things. The purpose of this post is to give a brief overview of the basics of design of experiments, their analysis and how to present results using R and packages like ggplot2 and agricolae. Included are one- and two-factor experiments.

What is the design of experiments?

The design of experiments (DOE) deals with the planning and performance of tests with the objective of generating data. Statistical analysis of these data will provide objective evidence that will allow the researcher to resolve questions about a given situation, process or phenomenon.

DOE can be applied to scientific research and in industry. Its goal is to generate knowledge and learning in an efficient manner. Ideally, this process can be considered as a cycle where initial hypotheses are refined as more information is obtained. Thus, we start with an initial hypothesis that we test through experimentation. If the data (our evidence) does not agree with the hypothesis, a new hypothesis is sought to explain the observed discrepancy.

Inductive-deductive loop – Pamela Toman

Hypothesis?

Hypotheses are tentative explanations of the research phenomenon that are stated as propositions or assertions. There are different types of hypotheses, such as:

Research hypotheses. Tentative propositions about possible relationships between two or more variables.
Null hypothesis. Propositions that deny or refute the relationship between variables.
Alternative hypotheses. These are different or “alternate” possibilities to the research and null hypotheses.

Null hypothesis

Data and measurements

A datum is a number that is the product of a measurement. A measurement is a process that links abstract concepts with empirical indicators. Of course, measurements could not be made without a measurement instrument, which can be defined as the resource used by the researcher to record information or data.

Measuring absorbance

Other basic definitions in the design of experiments

Experiment

It is a change in the operating conditions of a system or process, which is made with the objective of measuring the effect of the change on one or more properties of the product or result.

Schematic view of experiment design

Controlled experiment

Experimental unit

Piece(s) or sample(s) used to generate a value that is representative of the test result.

Experimental unit

Response variable (Depedent variable)

Through these variable(s) we know the effect or results of each experimental test. They are the product of our measurements once we make changes in the system.

Controllable factors (Independent variable)

They are process variables that can be modified and fixed at a given level.

Response and independent variables

Non-controllable factors

These are variables that cannot be controlled during the experiment or the normal operation of the process.

Factors studied

These are the variables that are investigated in the experiment to observe how they affect or influence the response variables.

Levels and treatments

The different values assigned to each factor studied in an experimental design are called levels. A combination of levels of all the factors studied is called a treatment or design point. In the case of experimenting with a single factor, each level is a treatment.

Design matrix

It is the arrangement formed by the treatments that will be carried out, including the repetitions. In this same array are usually included the results of each response variable for each treatment and repetition. For me this is a concept of great importance, since from this matrix it is possible to perform directly the different statistical tests in R.

Design matrix

Random error

It is the observed variability that cannot be explained by the factors studied. It is the result of the effect of factors not studied and experimental error.

Experimental error

A component of random error that reflects the experimenter’s errors in the planning and execution of the experiment.

Basic principles of experimental design

Randomization. Performing the experimental runs (treatments) in a randomized manner is essential to ensure, as far as possible, the assumption of independence of errors.
Repetition. It is running a treatment or combination of factors more than once. This makes it possible to estimate random error and in turn calculate realistic statistics in data analysis.
Blocking. This consists of nullifying the effect of factors in which we are not really interested and which may affect the response variables. Not taking into account their effect can lead us to erroneous conclusions about the factors in which we are interested.

Block example: Male and females

Some concepts of statistical inference

Population. In a very general way, it can be defined as the totality of possible individuals, specimens, objects or measurements of interest on which a study is made. Populations can be finite or infinite.
Parameters. Characteristics that, by means of their numerical value, describe the entire population.
Representative sample. An appropriately selected part of a population that retains key aspects of a population.
Statistic. A quantity obtained from the data of a sample that summarizes the characteristics of the sample.
Statistical inference. These are statistical statements about the population or process based on the information contained in the sample.
Probability distribution. It relates the set of values of a characteristic in a given population to the probability associated with each of these values.

Statistical inference

Probability distribution

Some hypothesis testing concepts

Statistical hypothesis. An assertion about the values of the parameters of a population or process, which can be tested from the information in a sample.
Test statistic. Formula with which a number is calculated from the data, the magnitude of which makes it possible to determine whether or not the null hypothesis is rejected.
Acceptance region. These are the possible values of the test statistic where the null hypothesis is not rejected.
Rejection region. It is the set of possible values of the test statistic that lead to rejecting the null hypothesis.
Type I error. It is when a null hypothesis that is true is rejected. It can also be called false positive.
Type II error. This is when a null hypothesis that is false is accepted. It is basically a false negative.
Power of the test. It is the probability of rejecting the null hypothesis when it is false.
Predefined significance. It is the maximum risk that the experimenter is willing to take with respect to the type I error. It is denoted by α.
Observed significance. It is the area under the reference distribution beyond the value of the test statistic. It is called p-value.

Hypothesis testing

Experiments with a single factor

When the objective is to study the effect of a single factor on the mean of a response variable taking into account more than two values of the factor under consideration, it is advisable to perform a completely randomized experiment and analyze the results by analysis of variance (ANOVA). The reason for comparing means by ANOVA and not by Student’s t-tests is that the latter test increases false positives as the number of means to be compared increases, i.e., we will find differences between means where there are none.

The analysis of variance consists of separating the total variation observed in each of the sources that contribute to it.

Design and results

Obtaining a single factor design is very simple using R commands. Let’s say we want to compare the effect of three baking times (35, 45 and 60 min) in the average diameter of cookie batches. For each time we baked ten cookie batches so we have ten replicates. Note that each time and replicate have to be run in aleatory order:

set.seed(6)

b_time <- rep(c(35, 45, 60), each = 10)
b_time <- sample(b_time)

run_order <- 1:30

one_fct_dgn <- data.frame(
  run_order,
  b_time
)

head(one_fct_dgn)
##   run_order b_time
## 1         1     60
## 2         2     35
## 3         3     45
## 4         4     45
## 5         5     60
## 6         6     35

Subsequently, the design can be exported in some other format such as CSV:

write.csv(one_fct_dgn, "data/one_fct_dgn.csv", row.names = FALSE)

The results can be integrated into the design matrix directly, let’s simulate the average diameter with a small function:

av_diam <- function(diameter = 45) {
  SD <- 2.5
  if (diameter == 35) rnorm(1, 15, sd = SD)
  else if (diameter == 45) rnorm(1, 20, sd = SD)
  else if (diameter == 60) rnorm(1, 23, sd = SD)
}

set.seed(3)
diameter <- mapply(av_diam, one_fct_dgn$b_time)

one_fct_dgn$diameter <- diameter

head(one_fct_dgn)
##   run_order b_time diameter
## 1         1     60 20.59517
## 2         2     35 14.26869
## 3         3     45 20.64697
## 4         4     45 17.11967
## 5         5     60 23.48946
## 6         6     35 15.07531

Or, once the results have been recorded externally, they can be imported from a CSV file:

exp_one_factor <- read.csv("data/exp_one_factor.csv", sep = ";")
head(exp_one_factor)
##   factor resp
## 1      1  264
## 2      2  208
## 3      3  220
## 4      4  217
## 5      1  260
## 6      2  220

Analysis

The aov function is used for the analysis of factorial designs. Previously, it is recommended to convert the factor values into the factor class and then perform the analysis. Here I going to analyze our first design (one_fct_dgn):

one_fct_dgn$b_time <- as.factor(one_fct_dgn$b_time)

one_fct_aov <- aov(diameter ~ b_time, data = one_fct_dgn)

Note the aov requires a formula that explicitly specifies the relationship between the response and the factor(s) in the design. Once this has been done, it is possible to obtain an ANOVA table using the “anova” function:

one_fct_anova <- anova(one_fct_aov)
one_fct_anova
## Analysis of Variance Table
## 
## Response: diameter
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## b_time     2 367.51 183.755  44.902 2.587e-09 ***
## Residuals 27 110.49   4.092                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And export it in the format of your choice:

write.csv(one_fct_anova, "results/anova_one_fct.csv", na = "", row.names = FALSE)

Multiple range comparisons or tests

Once significant differences are established between at least two of the means, the next step is to perform a multiple comparison test, which will allow us to define which means are significantly different. There are several methods that can be used as the LSD method, Tukey’s method, or Duncan’s method. Nice functions for these methods can be found in agricolae package.

LSD

library(agricolae)

one_fct_lsd <- LSD.test(one_fct_aov, "b_time")
one_fct_lsd
## $statistics
##   MSerror Df     Mean       CV  t.value      LSD
##   4.09237 27 18.75029 10.78896 2.051831 1.856282
## 
## $parameters
##         test p.ajusted name.t ntr alpha
##   Fisher-LSD      none b_time   3  0.05
## 
## $means
##    diameter      std  r      LCL      UCL      Min      Max      Q25      Q50
## 35 14.46991 1.566410 10 13.15732 15.78250 12.17195 17.79153 13.48176 14.24977
## 45 18.73777 2.048728 10 17.42518 20.05036 15.83381 22.90154 17.27960 18.38145
## 60 23.04320 2.371958 10 21.73061 24.35579 19.95286 26.16842 20.74760 23.43479
##         Q75
## 35 15.17899
## 45 19.73752
## 60 25.02251
## 
## $comparison
## NULL
## 
## $groups
##    diameter groups
## 60 23.04320      a
## 45 18.73777      b
## 35 14.46991      c
## 
## attr(,"class")
## [1] "group"

Tukey HSD

one_fct_hsd <- HSD.test(one_fct_aov, "b_time")
one_fct_hsd
## $statistics
##   MSerror Df     Mean       CV      MSD
##   4.09237 27 18.75029 10.78896 2.243118
## 
## $parameters
##    test name.t ntr StudentizedRange alpha
##   Tukey b_time   3         3.506426  0.05
## 
## $means
##    diameter      std  r      Min      Max      Q25      Q50      Q75
## 35 14.46991 1.566410 10 12.17195 17.79153 13.48176 14.24977 15.17899
## 45 18.73777 2.048728 10 15.83381 22.90154 17.27960 18.38145 19.73752
## 60 23.04320 2.371958 10 19.95286 26.16842 20.74760 23.43479 25.02251
## 
## $comparison
## NULL
## 
## $groups
##    diameter groups
## 60 23.04320      a
## 45 18.73777      b
## 35 14.46991      c
## 
## attr(,"class")
## [1] "group"

Duncan

one_fct_duncan <- duncan.test(one_fct_aov, "b_time")
one_fct_duncan
## $statistics
##   MSerror Df     Mean       CV
##   4.09237 27 18.75029 10.78896
## 
## $parameters
##     test name.t ntr alpha
##   Duncan b_time   3  0.05
## 
## $duncan
##      Table CriticalRange
## 2 2.901727      1.856282
## 3 3.048662      1.950279
## 
## $means
##    diameter      std  r      Min      Max      Q25      Q50      Q75
## 35 14.46991 1.566410 10 12.17195 17.79153 13.48176 14.24977 15.17899
## 45 18.73777 2.048728 10 15.83381 22.90154 17.27960 18.38145 19.73752
## 60 23.04320 2.371958 10 19.95286 26.16842 20.74760 23.43479 25.02251
## 
## $comparison
## NULL
## 
## $groups
##    diameter groups
## 60 23.04320      a
## 45 18.73777      b
## 35 14.46991      c
## 
## attr(,"class")
## [1] "group"

The results are consistent among the three methods. The section groups indicates which means are different, different letters mean significant differences. You can export the results of this kind of methods, as a TXT file, with the function capture.output. For example, to export the results of the Tukey method:

capture.output(one_fct_hsd, file = "results/hsd_one_fct.txt")

Verification of model assumptions

Something that is not very widespread in the scientific literature is the verification of the normality of the residuals, the equality between variances for the means of each treatment and the independence between each data recorded. Verification of these assumptions helps to validate the differences established as significant.

Shapiro-Wilks test to verify normality

This test is easy to perform:

one_fct_shapiro <- shapiro.test(one_fct_aov$residuals)
one_fct_shapiro
## 
##  Shapiro-Wilk normality test
## 
## data:  one_fct_aov$residuals
## W = 0.96561, p-value = 0.4271

Here, if p-value > 0.05, we can say that data come, approximately, from a normal distribution.

The results of this analysis can be exported with the capture.output function:

capture.output(one_fct_shapiro, file = "results/shapiro_one_fct.txt")

Bartlett’s test for equality of variances

To perform this test it is possible to use the bartlett.test function:

one_fct_bt <- bartlett.test(diameter ~ b_time, data = one_fct_dgn)
one_fct_bt
## 
##  Bartlett test of homogeneity of variances
## 
## data:  diameter by b_time
## Bartlett's K-squared = 1.4403, df = 2, p-value = 0.4867

A p-value > 0.05 mean that treatment means have equal variances. We can export the results in the same way:

capture.output(one_fct_bt, file = "results/bartlett_one_fct.txt")

Independence

To verify independence between observations, residuals are plotted as a function of run order. If the residuals follow a defined pattern it will be a clear indication of lack of independence. For this plot I used the ggplot2 package:

library(ggplot2)

one_fct_dgn$residuals <- one_fct_aov$residuals

ggplot(one_fct_dgn, aes(run_order, residuals)) +
  geom_point() +
  theme_classic() +
  xlab("Run order") +
  ylab("Residuals")

Tests such as the Durbin-Watson test are also available to verify this assumption. Use the function durbinWatsonTest in the car package:

library(car)

one_fct_dw <- durbinWatsonTest(one_fct_aov)
one_fct_dw
##  lag Autocorrelation D-W Statistic p-value
##    1      -0.1264564      2.176091   0.658
##  Alternative hypothesis: rho != 0

A p-value > 0.05 means that data observations are not auto correlated. You can also save the results in the usual way:

capture.output(one_fct_dw, file = "results/durbin_watson_one_fct.txt")

Display of results

Bar graph

To visualize the results of this type of experiments, bar graphs are usually used, whose height represents the magnitude of each mean, together with error bars representing the standard deviation and letters showing the significant differences established by the multiple comparisons test:

library(dplyr)

# Means, standard deviations and groups from Tukey HSD
one_fct_means <- one_fct_dgn %>% 
  group_by(b_time) %>% 
  summarise(
    av_diam = mean(diameter),
    sd_diam = sd(diameter)
  ) %>% 
  ungroup() %>% 
  arrange(desc(av_diam)) %>% 
  mutate(group = unlist(one_fct_hsd$groups["groups"]))

# Bar plot
ggplot(one_fct_means, aes(b_time, av_diam)) +
  geom_col(width = 0.5) +
  geom_errorbar(
    aes(ymin = av_diam - sd_diam, ymax = av_diam + sd_diam), width = 0.1
    ) +
  geom_text(aes(label = group, y = av_diam + sd_diam), vjust = -0.5) +
  xlab("Baking time (min)") +
  ylab("Average diameter") +
  theme_classic()

Alternative

Although bar graphs are widely used in the scientific literature of various types, their use hides the true dispersion of the data for each treatment. One thing I have also noticed is that people tend to judge differences between means by the overlap or separation of standard deviations, which seems to me to be more of a bias than an objective criterion.

As an alternative to the bar chart, it is possible to make a graph showing each observation, the mean, and the letters obtained in the multiple comparisons test:

ggplot(one_fct_dgn, aes(b_time, diameter)) +
  geom_point() +
  stat_summary(fun = mean, geom = "crossbar", width = 0.2, color = "blue") +
  geom_text(
    data = one_fct_means, 
    aes(label = group, y = av_diam), hjust = -3
  ) +
  xlab("Baking time (min)") +
  ylab("Diameter") +
  theme_classic()

Two-factor designs

In this type of design the objective is to verify the individual and interaction effect of two factors. Before showing how to obtain and analyze them, it is necessary to recall some concepts:

Qualitative factor. Its levels take discrete or nominal values.
Quantitative factor. Its levels can take any value within a certain interval. The scale is continuous.
Effect of a factor. It is the change observed in the response variable due to a change in the level of the factor.
Interaction effect. Two factors interact significantly on the response variable when the effect of one depends on the level of the other.

Design and results

To obtain this kind of designs we can use the function fac.design from DoE.base package. For this we need specify the number of factors and the number of levels of each factor:

library(DoE.base)

two_fct_dgn <- fac.design(
  nlevels = 3, nfactors = 2, replications = 3, seed = 4
) 

head(two_fct_dgn)
##   A B Blocks
## 1 2 3     .1
## 2 3 1     .1
## 3 3 3     .1
## 4 1 3     .1
## 5 1 2     .1
## 6 3 2     .1

For more details about this function, just type ?fac.design in your console. Also note that there is an extra column with the name “Blocks”, this is not big deal since there is just one and we won’t taking it into account further in the analysis. In future posts I will address block designs.

This design can also be exported in the desired format:

write.csv(two_fct_dgn, file = "data/two_fct_dgn.csv", row.names = FALSE)

The results can also be integrated directly or imported once recorded in the design matrix. Here, I’m going to import and work with data previously recorded:

exp_two_factor <- read.csv("data/exp_two_factor.csv", sep = ";")
head(exp_two_factor)
##   run_order factor1 factor2 res
## 1         1    0.15     0.2  74
## 2         2    0.15     0.2  64
## 3         3    0.15     0.2  60
## 4         4    0.18     0.2  79
## 5         5    0.18     0.2  68
## 6         6    0.18     0.2  73

Analysis

To analyze this type of design, the “aov” function is used, specifying each main effect and the interaction effect in the formula:

exp_two_factor$factor1 <- as.factor(exp_two_factor$factor1)
exp_two_factor$factor2 <- as.factor(exp_two_factor$factor2)

two_fct_aov <- aov(
  res ~ factor1 + factor2 + factor1:factor2, data = exp_two_factor
)

You can also use the short form Y ~ A*B, which will be displayed in full once you get the ANOVA table:

two_fct_aov <- aov(res ~ factor1*factor2, data = exp_two_factor)
two_fct_anova <- anova(two_fct_aov)
two_fct_anova
## Analysis of Variance Table
## 
## Response: res
##                 Df  Sum Sq Mean Sq F value    Pr(>F)    
## factor1          3 2125.11  708.37 24.6628 1.652e-07 ***
## factor2          2 3160.50 1580.25 55.0184 1.086e-09 ***
## factor1:factor2  6  557.06   92.84  3.2324   0.01797 *  
## Residuals       24  689.33   28.72                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
write.csv(two_fct_anova, file = "results/anova_two_fct.csv", na = "")

Note that previously I also converted the values of each factor in the factor class.

Multiple range comparisons tests

Once a significant interaction effect is established, the next step is to establish which means differ. For this, a Tukey test can be used, for which the interaction effect must first be explicitly specified in a separated aov model:

two_fct_aov_2 <- aov(
  res ~ interaction(factor1, factor2), data = exp_two_factor
)

two_fct_hsd <- HSD.test(two_fct_aov_2, "interaction(factor1, factor2)")
two_fct_hsd
## $statistics
##    MSerror Df     Mean       CV      MSD
##   28.72222 24 94.33333 5.681249 15.77773
## 
## $parameters
##    test                        name.t ntr StudentizedRange alpha
##   Tukey interaction(factor1, factor2)  12          5.09913  0.05
## 
## $means
##                 res      std r Min Max   Q25 Q50   Q75
## 0.15.0.2   66.00000 7.211103 3  60  74  62.0  64  69.0
## 0.15.0.25  88.66667 3.055050 3  86  92  87.0  88  90.0
## 0.15.0.3   99.66667 2.081666 3  98 102  98.5  99 100.5
## 0.18.0.2   73.33333 5.507571 3  68  79  70.5  73  76.0
## 0.18.0.25  96.66667 8.082904 3  88 104  93.0  98 101.0
## 0.18.0.3   99.33333 4.509250 3  95 104  97.0  99 101.5
## 0.21.0.2   87.33333 5.033223 3  82  92  85.0  88  90.0
## 0.21.0.25 100.66667 6.658328 3  95 108  97.0  99 103.5
## 0.21.0.3  105.66667 5.859465 3  99 110 103.5 108 109.0
## 0.24.0.2   99.66667 4.041452 3  96 104  97.5  99 101.5
## 0.24.0.25 104.33333 5.507571 3  99 110 101.5 104 107.0
## 0.24.0.3  110.66667 3.511885 3 107 114 109.0 111 112.5
## 
## $comparison
## NULL
## 
## $groups
##                 res groups
## 0.24.0.3  110.66667      a
## 0.21.0.3  105.66667      a
## 0.24.0.25 104.33333     ab
## 0.21.0.25 100.66667    abc
## 0.15.0.3   99.66667    abc
## 0.24.0.2   99.66667    abc
## 0.18.0.3   99.33333    abc
## 0.18.0.25  96.66667    abc
## 0.15.0.25  88.66667    bcd
## 0.21.0.2   87.33333     cd
## 0.18.0.2   73.33333     de
## 0.15.0.2   66.00000      e
## 
## attr(,"class")
## [1] "group"

The results of Tukey’s test can be exported directly:

capture.output(two_fct_hsd, file = "results/hsd_two_fct.txt")

Verification of model assumptions

For the verification of the assumptions it is possible to proceed in a manner similar to that of the single-factor designs.

Shapiro-Wilks test to verify normality

two_fct_shapiro <- shapiro.test(two_fct_aov$residuals)
two_fct_shapiro
## 
##  Shapiro-Wilk normality test
## 
## data:  two_fct_aov$residuals
## W = 0.97051, p-value = 0.4397
capture.output(two_fct_shapiro, file = "results/shapiro_two_fct.txt")

Bartlett’s test for equality of variances

For Bartlett test I previously made an extra column specifying the combination between factors (treatments). This will indicate the groups to bartlett.test function:

exp_two_factor$treatment <- mapply(
 paste, exp_two_factor$factor1, exp_two_factor$factor2
)

two_fct_bartlett <- bartlett.test(res ~ treatment, data = exp_two_factor)
two_fct_bartlett
## 
##  Bartlett test of homogeneity of variances
## 
## data:  res by treatment
## Bartlett's K-squared = 4.6743, df = 11, p-value = 0.9459
capture.output(two_fct_bartlett, file = "results/bartlett_two_fct.txt")

Independence

exp_two_factor$residuals <- two_fct_aov$residuals

ggplot(exp_two_factor, aes(run_order, residuals)) +
  geom_point() +
  theme_classic() +
  xlab("Run order") +
  ylab("Residuals")

two_fct_dw <- durbinWatsonTest(two_fct_aov)
two_fct_dw
##  lag Autocorrelation D-W Statistic p-value
##    1      -0.3920052      2.671663   0.828
##  Alternative hypothesis: rho != 0
capture.output(one_fct_dw, file = "results/durbin_watson_two_fct.txt")

Display of results

Bar graph

The realization of this type of graphs requires keeping in mind the individual means of each treatment and the interaction effects, if there is some. To calculate them together with the standard deviations we used the “group_by” function and then “summarise” from the “dplyr” package:

# Means, standard deviations and groups from Tukey HSD
two_fct_means <- exp_two_factor %>% 
  group_by(factor1, factor2) %>% 
  summarise(
    av_res = mean(res),
    sd_res = sd(res)
  ) %>% 
  ungroup() %>% 
  arrange(desc(av_res)) %>% 
  mutate(group = unlist(two_fct_hsd$groups["groups"]))

# Bar plot
ggplot(two_fct_means, aes(factor1, av_res, fill = factor2)) +
  geom_col(width = 0.5, position = position_dodge()) +
  geom_errorbar(
    aes(ymin = av_res - sd_res, ymax = av_res + sd_res), width = 0.1,
    position = position_dodge(width = 0.5)
    ) +
  geom_text(
    aes(label = group, y = av_res + sd_res), vjust = -0.5, 
    position = position_dodge(width = 0.5)
  ) +
  labs(fill = "Factor 2") +
  xlab("Factor 1") +
  ylab("Average response") +
  theme_classic()

Alternative

To make the graph showing each observation, the mean, a confidence interval and the letters obtained in the multiple comparisons test, it is necessary to use the data.frame with the observations next to the one with the means:

ggplot(exp_two_factor, aes(factor1, res, group = factor2)) +
  geom_point(aes(color = factor2), position = position_dodge(width = 0.9)) +
  stat_summary(
    fun = mean, geom = "crossbar", width = 0.2, color = "blue",
    position = position_dodge(width = 0.9)
  ) +
  geom_text(
    data = two_fct_means, 
    aes(label = group, y = av_res), 
    vjust = -4, size = 3.5, position = position_dodge(width = 0.9)
  ) +
  ylim(c(55, 130)) +
  xlab("Factor 1") +
  ylab("Response") +
  labs(color = "Factor 2") +
  theme_classic()

## Interaction plot

A good way to visualize the interaction effect is through interaction.plot function:

Factor1 <- exp_two_factor$factor1
Factor2 <- exp_two_factor$factor2
Response <- exp_two_factor$res

interaction.plot(Factor1, Factor2, Response)

That’s it! Thank you very much for visiting this site, I hope you find the content of this post useful. See you soon!

Juan Pablo Carreón Hidalgo 🤓

[email protected]
https://twitter.com/JuanPa2601

The text and code on this tutorial is under Creative Commons Attribution 4.0 International License.

To leave a comment for the author, please follow the link and comment on their blog: R in the Lab.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

What is the design of experiments?

Hypothesis?

Data and measurements

Other basic definitions in the design of experiments

Experiment

Experimental unit

Response variable (Depedent variable)

Controllable factors (Independent variable)

Non-controllable factors

Factors studied

Levels and treatments

Design matrix

Random error

Experimental error

Basic principles of experimental design

Some concepts of statistical inference

Some hypothesis testing concepts

Experiments with a single factor

Design and results

Analysis

Multiple range comparisons or tests

LSD

Tukey HSD

Duncan

Verification of model assumptions

Shapiro-Wilks test to verify normality

Bartlett’s test for equality of variances

Independence

Display of results

Bar graph

Alternative

Two-factor designs

Design and results

Analysis

Multiple range comparisons tests

Verification of model assumptions

Shapiro-Wilks test to verify normality

Bartlett’s test for equality of variances

Independence

Display of results

Bar graph

Alternative

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)