Design of Experiments with Mixtures and their Analysis with R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Using R for design and analysis of results for experiments with mixtures.
“Mixtures are absolutely everywhere you look. Anything you can combine is a mixture.”
(chem4kids.com)
All the code and data in this post are available in the repository: Design of Experiments with Mixtures and their Analysis with R
What are experimental designs with mixtures?
These are designs aimed at determining the effect of the proportion of different components of a mixture on one or more response variables.
We must emphasize that we are referring to the proportions of the different components in the mixture and not to their absolute amount. That is, it is the proportion that determines the effect.
This type of design has application in the formulation of many products such as beverages, foods, fuels, paints, etc.
In an experimental design of mixtures, the sum of the proportions of each component is equal to 1:
And the limits of the proportion of each component must be between 0 and 1:
In a practical problem, calculating the proportions of each component is straightforward. For example, suppose that the sum of three components in a soda equals 2.5 g, and the respective amounts of each component are 1, 1, and 0.5 g. The ratio for the first and second ingredient is 1/2.5 = 0.4, and the ratio for the third ingredient is 0.5/2.5 = 0.2. Thus 0.4 + 0.4 + 0.2 = 1.
An experimental design of mixtures will help us determine the proportions of each component to produce the best flavor or to reduce some undesirable physical property in the liquid, for example.
Types of mixture designs and their generation in R
In this post I will focus on two types of mix designs: simplex-lattice and simplex-centroid. The generation of these designs is simple with the mixexp package.
Simplex-lattice design
The simplex-lattice design considers q components and allows fitting a model of order m to the experimental data. To generate a design with 3 components of order 3 we use the function SLD()
as follows:
library(mixexp) # Make a simplex-lattice design simplex_lattice_desing <- SLD( fac = 3, # Number of components (factors) lev = 2 # Number of levels besides 0 ) # Display the design simplex_lattice_desing ## x1 x2 x3 ## 1 1.0 0.0 0.0 ## 2 0.5 0.5 0.0 ## 3 0.0 1.0 0.0 ## 4 0.5 0.0 0.5 ## 5 0.0 0.5 0.5 ## 6 0.0 0.0 1.0
It should be noted that it is not necessary to specify the levels (proportions) of each component, as these are automatically determined by the ratio:
If the proportions of each line are added together, the result will be equal to 1.
In addition, for three components it is possible to use the DesignPoints()
function to visualize the experimental region of the experiment:
DesignPoints(simplex_lattice_desing)
In this figure, the three vertices correspond to pure mixtures (formed by a single ingredient), the three sides or edges represent binary mixtures that have only two of the three components. The interior points of the triangle represent the ternary mixtures in which the three ingredients are different from zero.
Finally, the design can be exported to our working folder with the function write.csv()
:
write.csv( simplex_lattice_desing, file = "data/simplex_lattice_desing.csv", row.names = FALSE )
Simplex-centroid design
If predictions are to be made within the experimental region, it is important to include centroid points within the experimental region. The simplex-centroid design includes all intermediate mixtures between components. The SCD()
function is used to generate it:
simplex_centroid_design <- SCD( fac = 3 # Number of components (factors) ) # Export the design write.csv( simplex_centroid_design, file = "data/simplex_centroid_design.csv", row.names = FALSE ) # Display the design simplex_centroid_design ## x1 x2 x3 ## 1 1.0000000 0.0000000 0.0000000 ## 2 0.0000000 1.0000000 0.0000000 ## 3 0.0000000 0.0000000 1.0000000 ## 4 0.5000000 0.5000000 0.0000000 ## 5 0.5000000 0.0000000 0.5000000 ## 6 0.0000000 0.5000000 0.5000000 ## 7 0.3333333 0.3333333 0.3333333
By visualizing the experimental region with three components, it becomes much clearer what we mean by intermediate mixtures:
DesignPoints(simplex_centroid_design)
Mixing designs with component constraints
It is normal that due to technical or economic constraints, for example, the proportion of one or more components is restricted to a shape limit:
It is possible to generate designs considering the constraints for each component with the Xvert()
function:
simplex_restrictions <- Xvert( nfac = 3, # Number of components (maximum 12) uc = c(0.612, 0.765, 0.109), # Upper constrains lc = c(0.153, 0.306, 0.053), # Lower constrains plot = FALSE, # Automatic plot when TRUE and nfac = 3 ndm = 1 # Order of centroids included ) # Export the design write.csv( simplex_restrictions, file = "data/simplex_restrictions.csv", row.names = FALSE ) # Display the design simplex_restrictions ## x1 x2 x3 dimen ## 1 0.6120000 0.3060000 0.08200002 0 ## 2 0.1530000 0.7649999 0.08200002 0 ## 3 0.6120000 0.3350000 0.05300000 0 ## 4 0.1820000 0.7649999 0.05300000 0 ## 5 0.5850000 0.3060000 0.10899998 0 ## 6 0.1530000 0.7380000 0.10899998 0 ## 7 0.1530000 0.7515000 0.09550000 1 ## 8 0.6120000 0.3205000 0.06750001 1 ## 9 0.5985000 0.3060000 0.09550000 1 ## 10 0.1675000 0.7649999 0.06750001 1 ## 11 0.3970000 0.5500000 0.05300000 1 ## 12 0.3690000 0.5220000 0.10899998 1 ## 13 0.3828333 0.5358333 0.08133333 2
It is also possible to visualize the experimental subregion:
DesignPoints(simplex_restrictions[, 1:3])
Analysis of the results of a mix design
For the example analysis, I will use the data published in Performance of reduced fat-reduced salt fermented sausage with added microcrystalline cellulose, resistant starch and oat fiber using the simplex design. In this study, the effect of the proportion of three ingredients on different characteristics of fermented sausages was determined. In this case I will only focus the analysis on one of the response variables (hardness).
Data import
As usual, the first step in the analysis is to import our data into R:
data_sausage <- read.csv("data/data_sausage.csv") # Display the data data_sausage ## MCC RS OF Hardness ## 1 1.00 0.00 0.00 52.49 ## 2 0.00 1.00 0.00 48.17 ## 3 0.00 0.00 1.00 64.02 ## 4 0.50 0.50 0.00 41.82 ## 5 0.50 0.00 0.50 39.65 ## 6 0.00 0.50 0.50 56.34 ## 7 0.33 0.33 0.34 42.26 ## 8 0.33 0.33 0.34 42.80 ## 9 0.66 0.17 0.17 50.79 ## 10 0.17 0.66 0.17 50.48 ## 11 0.17 0.17 0.66 48.48
Model adjustment
You can use the lm()
function to adjust a complete model or, for the same purpose, the MixModel()
function of the mixexp
package:
# with lm() function # -1 indicates to skip the intercept hardness_model_lm <- lm( Hardness ~ -1 + MCC + RS + OF + MCC:RS + RS:OF + MCC:OF + MCC:RS:OF, # Model data = data_sausage ) # with MixModel() function hardness_model_mm <- MixModel( frame = data_sausage, # Data response = "Hardness", # Response name mixcomps = c("MCC", "RS", "OF"), # Factor names model = 4 # Model to be fit, type ?MixModel )
Note that these models did not include the mean or intercept. Due to the restriction of the sum of components is equal to 1, the parameters in the model are not unique. Basically, the model without mean eliminates the problem of dependence between the coefficients. As we will see later, the interpretation of each coefficient and the hypothesis testing related to each must be done in a special way for this type of design.
Model coefficients, their interpretation and determination coefficients
The summary()
function will display a complete report with the coefficients of the model we previously selected, as well as the coefficients of determination:
summary(hardness_model_lm) ## ## Call: ## lm(formula = Hardness ~ -1 + MCC + RS + OF + MCC:RS + RS:OF + ## MCC:OF + MCC:RS:OF, data = data_sausage) ## ## Residuals: ## 1 2 3 4 5 6 7 8 9 10 ## -1.8143 -0.6241 1.0879 -2.5404 -0.7340 0.4923 -2.4654 -1.9254 7.0439 3.3634 ## 11 ## -1.8839 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## MCC 54.30432 4.49807 12.073 0.000270 *** ## RS 48.79405 4.49807 10.848 0.000410 *** ## OF 62.93214 4.49856 13.989 0.000151 *** ## MCC:RS -28.75529 22.58741 -1.273 0.271951 ## RS:OF -0.06174 22.59683 -0.003 0.997951 ## MCC:OF -72.93694 22.59683 -3.228 0.032044 * ## MCC:RS:OF 16.95888 131.40193 0.129 0.903539 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.647 on 4 degrees of freedom ## Multiple R-squared: 0.9968, Adjusted R-squared: 0.9911 ## F-statistic: 176.5 on 7 and 4 DF, p-value: 8.155e-05
In general, the coefficients in this type of model are interpreted as follows:
- Coefficients of individual components. They do not measure the overall effect of component xi, but only estimate the value of the response at the vertex of the simplex. If these coefficients are not significant, it does not mean that the effect of the individual component is not important, so hypothesis tests on them are usually ignored.
- Coefficients of double interactions. If the sign of this coefficient is positive, there is synergy between the components; if it is negative, there is antagonism between them.
- Triple interaction coefficients. Quantifies the effect of the ternary mixture within the simplex.
The result report can be easily saved with the capture.output()
function:
capture.output( summary(hardness_model_lm), file = "data/sum_hardness_model.txt" )
step() function to improve the model
In order to improve the coefficients of determination or simplify the model, sometimes non-significant terms are eliminated. This can be done somewhat subjectively by trial and error by eliminating one or more terms and then comparing with the full model. R also offers a systematic way to do the above using the step()
function, this function uses the Akaike information criterion iteratively to simplify and/or improve the coefficients of determination of the model.
The function can display a large number of results depending on the number of iterations it makes to simplify the model, so in this example I will directly save the results in a text file:
capture.output( step(hardness_model_lm), file = "data/step_results.txt" )
Subsequently, we only need to adjust the simplified model:
hardness_model_simp <- lm( Hardness ~ -1 + MCC + RS + OF + MCC:RS + MCC:OF, data = data_sausage )
By displaying a summary of results it can be seen that there is not a big difference between the coefficients of determination of this model and the full model. However, the smaller number of terms may have some practical advantage depending on the problem:
# Export summary capture.output( summary(hardness_model_simp), file = "data/sum_hardness_simplified_model.txt" ) summary(hardness_model_simp) ## ## Call: ## lm(formula = Hardness ~ -1 + MCC + RS + OF + MCC:RS + MCC:OF, ## data = data_sausage) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.8704 -1.7706 -1.0711 0.7132 7.0898 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## MCC 54.228 3.655 14.836 5.90e-06 *** ## RS 48.862 3.275 14.921 5.70e-06 *** ## OF 63.002 3.275 19.240 1.28e-06 *** ## MCC:RS -27.419 16.567 -1.655 0.14899 ## MCC:OF -71.575 16.506 -4.336 0.00489 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 3.804 on 6 degrees of freedom ## Multiple R-squared: 0.9968, Adjusted R-squared: 0.9941 ## F-statistic: 368.8 on 5 and 6 DF, p-value: 2.231e-07
Lack-of-fit test
Another way to evaluate the quality of the model fit, if there is more than one repetition for any of the treatments, is by means of a lack-of-fit test. This can be done directly with the pureErrorAnova()
function of the alr3
package:
library(alr3) lof_complete_hardness_model <- pureErrorAnova(hardness_model_lm) # Export results write.csv( lof_complete_hardness_model, file = "data/lof_complete_hardness_model.csv", na = "" ) # Display ANOVA lof_complete_hardness_model ## Analysis of Variance Table ## ## Response: Hardness ## Df Sum Sq Mean Sq F value Pr(>F) ## MCC 1 13323.1 13323.1 91379.4260 0.002106 ** ## RS 1 7231.4 7231.4 49597.9877 0.002859 ** ## OF 1 5804.0 5804.0 39807.9118 0.003191 ** ## MCC:RS 1 47.9 47.9 328.4924 0.035090 * ## RS:OF 1 0.1 0.1 0.9393 0.509959 ## MCC:OF 1 272.0 272.0 1865.5601 0.014737 * ## MCC:RS:OF 1 0.4 0.4 2.4666 0.360954 ## Residuals 4 86.4 21.6 ## Lack of fit 3 86.2 28.7 197.1118 0.052300 . ## Pure Error 1 0.1 0.1 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the simplified model:
lof_simp_hardness_model <- pureErrorAnova(hardness_model_simp) # Export results write.csv( lof_simp_hardness_model, file = "data/lof_simp_hardness_model.csv", na = "" ) # Display ANOVA lof_simp_hardness_model ## Analysis of Variance Table ## ## Response: Hardness ## Df Sum Sq Mean Sq F value Pr(>F) ## MCC 1 13323.1 13323.1 91379.43 0.002106 ** ## RS 1 7231.4 7231.4 49597.99 0.002859 ** ## OF 1 5804.0 5804.0 39807.91 0.003191 ** ## MCC:RS 1 47.9 47.9 328.49 0.035090 * ## MCC:OF 1 272.1 272.1 1865.93 0.014735 * ## Residuals 6 86.8 14.5 ## Lack of fit 5 86.7 17.3 118.87 0.069517 . ## Pure Error 1 0.1 0.1 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this test, if the p-value obtained for Lack of fit is greater than 0.05, or at the significance level established by the experimenter, it can be concluded that the model fits the data adequately. Note how with the full model we came close to rejecting the hypothesis of lack of fit, while with the simplified model the situation improved somewhat.
Visualization of the simplified model in two dimensions
It is possible to make a contour plot with the fitted model, only for the case of three components in the mixture:
## For simplified model ModelPlot( hardness_model_simp, # Our model dimensions = list(x1 = "MCC", x2 = "RS", x3 = "OF"), # Variable names contour = TRUE, # Add contour lines fill = TRUE, # Add color axislabs = c("MCC", "RS", "OF"), # Axis labels cornerlabs = c("MCC", "RS", "OF") # Corner labels )
The ModelPlot()
function is also included in the mixexp
package.
The graph can be exported in png format, for example, as follows:
png("graphs/contour_hardnees_model.png", units = "cm", width = 15, height = 10, res = 150) ModelPlot( hardness_model_simp, # Our model dimensions = list(x1 = "MCC", x2 = "RS", x3 = "OF"), # Variable names contour = TRUE, # Add contour lines fill = TRUE, # Add color axislabs = c("MCC", "RS", "OF"), # Axis labels cornerlabs = c("MCC", "RS", "OF") # Corner labels ) dev.off()
Mixture effect plot
Another way to plot the results is by using an effect plot for the components of the mixture. This two-dimensional plot can be useful if you have more than three components in the mixture. To do this we can use the ModelEff()
function included in mixexp
:
## For complete model ModelEff( nfac = 3, # Number of components (factors) mod = 4, # Type of model, type ?MixModel for more information ufunc = hardness_model_mm # Model fitted by MixModel() function )
ModelEff()
displays the components in the same order as specified in the fitted model, so x1 corresponds to MCC, x2 corresponds to RS and x3 corresponds to OF. This plot starts with a reference mixture (usually the center of the experimental region) and shows how the response changes as one of the components increases or decreases in the mixture; when one of the components changes, the rest increase or decrease proportionally. The disadvantage of ModelEff()
is that only complete, not simplified, models can be used to make the plot.
The effect graph can be exported in the same way as the contour graph:
png("graphs/effects_hardnees_model.png", units = "cm", width = 15, height = 10, res = 150) ModelEff( nfac = 3, # Number of components (factors) mod = 4, # Type of model, type ?MixModel for more information ufunc = hardness_model_mm # Model fitted by MixModel() function ) dev.off()
If the reader would like to consult more examples of analysis with the mixexp
package, please check the document at the following link: Mixture Experiments in R Using mixexp.
Very good! That’s all for this post, thank you very much for visiting this blog.
Juan Pablo Carreón Hidalgo 🤓
[email protected]
https://github.com/jpch26
This work is licensed under a Creative Commons Attribution 4.0 International License.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.