Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A basic R tutorial for carrying out the analysis of results of response surface designs. It also discusses how to generate Box-Behnken and Central Composite designs.
Any model is only an approximation
All code and data used in this post are available on GitHub: Response Surface Designs and their Analysis with R
What is a response surface design?
Response surface experimental designs and the analysis of their results are oriented to determine the optimal combination of factors that allow us to obtain the best response within the experimental region. As best response we can refer to a maximum or a minimum depending on our objectives.
It is recommended that response surface designs be made once we have previously determined which factors have a significant effect on the response. This can be done by means of a fractional design.
Types of response surface designs
First-order designs
These are designs that only consider the main (linear) effects of each factor. They can be used in the initial stages, when looking for the experimental region where the curvature is found and possibly a maximum or a minimum.
In this equation Y corresponds to the response, β0 is the global mean or intercept, βi is the coefficient obtained by linear regression for a given factor xi and ε corresponds to the random error. In the summation, k corresponds to the number of factors considered by the experimenter.
Second-order designs
Designs that allow estimation of linear, interaction and quadratic or curvature effects.
Among the designs that allow fitting this type of polynomial to the experimental data are the Box-Behnken and the Central Composite designs.
Generation of surface designs with the rsm package
Generating this type of design is quite straightforward with the rsm
package. If you want to see the full functionality of this package click on the following link: rsm package. There is also a document with several examples at the following link: examples with rsm.
Box-Behnken Design
To generate a Box-Behnken design we use the bbd()
function:
library(rsm) ## Warning: package 'rsm' was built under R version 4.2.1 bb_design_1 <- bbd( k = 4, # Number of factors, n0 = 3, # Number of center points, block = FALSE, # Consider blocks or not in the design randomize = FALSE # Randomize or not the order experiments ) head(as.data.frame(bb_design_1)) ## run.order std.order x1 x2 x3 x4 ## 1 1 1 -1 -1 0 0 ## 2 2 2 1 -1 0 0 ## 3 3 3 -1 1 0 0 ## 4 4 4 1 1 0 0 ## 5 5 5 0 0 -1 -1 ## 6 6 6 0 0 1 -1
You can also export your design in CSV format:
readr::write_csv(bb_design_1, file = "data/bbd_1.csv")
In the previous design each factor level are coded in the standard way. You can also consider the names and units of each factor:
bb_design_2 <- bbd( k = 4, # Four factors n0 = 3, # Three central points block = FALSE, # No blocks randomize = FALSE, # Not randomized coding = list( x1 ~ (Temperature - 50) / 25, x2 ~ (Power - 210)/ 30, x3 ~ (time - 30) / 15, x4 ~ (Ethanol - 80) / 20 ) ) head(bb_design_2) ## run.order std.order Temperature Power time Ethanol ## 1 1 1 25 180 30 80 ## 2 2 2 75 180 30 80 ## 3 3 3 25 240 30 80 ## 4 4 4 75 240 30 80 ## 5 5 5 50 210 15 60 ## 6 6 6 50 210 45 60 ## ## Data are stored in coded form using these coding formulas ... ## x1 ~ (Temperature - 50)/25 ## x2 ~ (Power - 210)/30 ## x3 ~ (time - 30)/15 ## x4 ~ (Ethanol - 80)/20
In this example, each factor level can be expressed in standard coding (-1, 0 and +1) with the general formula CodedLevel = (Level – CenterPoint) / StepSize. For example, for Temperature the level 25 corresponds to -1 since CodedLevel = (25 – 50) / 25 = -1. The previous convention is commonly used in textbooks dealing with response surfaces, and it is important if you want to determine the steepest-ascent path, which consist in moving the experimental region searching for the optimal combination of factors. For the analysis example below, I will assume that the experimental region is located near, or contains, the experimental optimum.
Central composite design
Like the Box-Behnken design, the central composite design will not allow us to estimate main, interaction and quadratic effects for each factor. To generate it we use the ccd()
function:
cc_design_1 <- ccd( basis = 3, # Number of factors n0 = 3, # Number of central points randomize = FALSE # Not randomized ) readr::write_csv(cc_design_1, file = "data/ccd_1.csv") # Export as CSV head(as.data.frame(cc_design_1)) ## run.order std.order x1 x2 x3 Block ## 1 1 1 -1 -1 -1 1 ## 2 2 2 1 -1 -1 1 ## 3 3 3 -1 1 -1 1 ## 4 4 4 1 1 -1 1 ## 5 5 5 -1 -1 1 1 ## 6 6 6 1 -1 1 1
By default ccd()
creates two blocks, one for regular experimental runs and another block for “start” points. This can be advantageous, since we can first make sure that there are curvature effects with the first block, and then we can estimate the quadratic terms of the model with the “star” points of the second block.
Analysis of results of a response surface design with the rsm package
For this example, I used the data reported in the academic paper Effects of Ultrasonic-Assisted Extraction on the Yield and the Antioxidative Potential of Bergenia emeiensis Triperpenes. This study sought the best combination of the factors shown in the table below to obtain the highest yield of triterpenes from dried roots (rhizomes) of B. emeiensis.
The table shows the codes that I used for each factor, as well as for the levels (letters in square brackets) of each factor (the numbers -1, 0, +1).
If you are interested, the data for this example can be found in the “data” folder in the GitHub repository of this post.
Import data
The first step is to import your data into R. In this case the experimental data are saved in CSV format:
yield_data <- readr::read_csv("data/yield_data.csv") head(yield_data) ## # A tibble: 6 × 5 ## A B C D Yield ## <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 180 70 30 30 224. ## 2 210 80 30 30 227. ## 3 210 90 30 20 172. ## 4 180 80 15 30 209. ## 5 210 80 15 40 201. ## 6 240 70 30 30 221.
Just in case, I’m going to code factor levels using the coded.data()
function. This will facilitate the use of coded data in response-surface analysis:
yield_data <- coded.data( yield_data, # Experimental data formulas = list( # List of coding formulas for each factor x1 ~ (A - 210)/30, x2 ~ (B - 80)/10, x3 ~ (C - 30)/15, x4 ~ (D - 30)/10 )) head(yield_data) ## A B C D Yield ## 1 180 70 30 30 223.7 ## 2 210 80 30 30 227.2 ## 3 210 90 30 20 171.9 ## 4 180 80 15 30 208.8 ## 5 210 80 15 40 201.3 ## 6 240 70 30 30 220.7 ## ## Data are stored in coded form using these coding formulas ... ## x1 ~ (A - 210)/30 ## x2 ~ (B - 80)/10 ## x3 ~ (C - 30)/15 ## x4 ~ (D - 30)/10
Model adjustment
To fit the second order model to our experimental data we use the rsm()
function:
yield_model <- rsm(Yield ~ SO(x1, x2, x3, x4), data = yield_data)
As you can see, to fit this model it is necessary to use the special function SO()
(SO from Second Order), which specify to rsm()
adjust a complete second order model with our data.
Summary of model results
The summary()
function will display a complete summary of the model we fit, including the coefficients, t-tests for each coefficient, coefficients of determination, lack of fit test, stationary (optimal) point, among other useful information:
summary(yield_model) ## ## Call: ## rsm(formula = Yield ~ SO(x1, x2, x3, x4), data = yield_data) ## ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 220.8800 4.6340 47.6652 < 2.2e-16 *** ## x1 -1.4917 2.9912 -0.4987 0.625748 ## x2 -27.0333 2.9912 -9.0375 3.222e-07 *** ## x3 10.0333 2.9912 3.3543 0.004724 ** ## x4 -6.0917 2.9912 -2.0365 0.061073 . ## x1:x2 5.8500 5.1810 1.1291 0.277818 ## x1:x3 2.3250 5.1810 0.4488 0.660474 ## x1:x4 -10.3000 5.1810 -1.9880 0.066725 . ## x2:x3 0.9500 5.1810 0.1834 0.857142 ## x2:x4 -5.4500 5.1810 -1.0519 0.310651 ## x3:x4 -8.0250 5.1810 -1.5489 0.143701 ## x1^2 -5.1275 4.0685 -1.2603 0.228167 ## x2^2 -24.0150 4.0685 -5.9026 3.850e-05 *** ## x3^2 -9.9650 4.0685 -2.4493 0.028082 * ## x4^2 -11.9525 4.0685 -2.9378 0.010803 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Multiple R-squared: 0.9124, Adjusted R-squared: 0.8248 ## F-statistic: 10.42 on 14 and 14 DF, p-value: 4.216e-05 ## ## Analysis of Variance Table ## ## Response: Yield ## Df Sum Sq Mean Sq F value Pr(>F) ## FO(x1, x2, x3, x4) 4 10449.6 2612.41 24.3310 3.542e-06 ## TWI(x1, x2, x3, x4) 6 962.9 160.48 1.4947 0.2501645 ## PQ(x1, x2, x3, x4) 4 4244.2 1061.04 9.8822 0.0005164 ## Residuals 14 1503.2 107.37 ## Lack of fit 10 957.9 95.79 0.7028 0.7042543 ## Pure error 4 545.2 136.31 ## ## Stationary point of response surface: ## x1 x2 x3 x4 ## 0.2484136 -0.4624262 0.7095327 -0.4946289 ## ## Stationary point in original units: ## A B C D ## 217.45241 75.37574 40.64299 25.05371 ## ## Eigenanalysis: ## eigen() decomposition ## $values ## [1] -0.6545221 -9.4572076 -16.1561933 -24.7920770 ## ## $vectors ## [,1] [,2] [,3] [,4] ## x1 0.7790649 0.50504706 0.3573793 -0.10131842 ## x2 0.1636823 0.02705505 -0.1174668 0.97912088 ## x3 0.3255063 -0.82446880 0.4623083 0.02382985 ## x4 -0.5102074 0.25386352 0.8029649 0.17461103
The above summary can be easily exported in TXT format with the capture.output()
function:
capture.output( summary(yield_model), # Object to be exported file = "data/model_summary.txt" # File name )
To evaluate the quality of the model, or rather how well the model fits the experimental data, we usually refer to the adjusted coefficient of determination and to the statistical test of lack of fit whose criteria can be the following:
- Adjusted coefficient of determination. Values close to 1 are preferred.
- Lack of fit test. A p-value greater than the significance level established by the experimenter will allow us to conclude that the model is “adequate”.
Both criteria will give us the certainty that we will make good predictions of the response with the fitted model in the experimental region. That is, we can have more confidence in the optimal response predicted by the model.
Graphical representation of the model
Representing the model in pseudo-3D or with contour plots is simple when there are a response and one or two factors, but with more than two factors we face the problem of having to plot more than three dimensions. In this example, this can be approximated by keeping two factors fixed at one of its levels and plotting the response as a function of the remaining two factors. The contour()
function can be used to make contour plots following the previous reasoning:
par(mfrow = c(2,3)) # 2 x 3 pictures on one plot contour( yield_model, # Our model ~ x1 + x2 + x3 + x4, # A formula to obtain the 6 possible graphs image = TRUE, # If image = TRUE, apply color to each contour )
If you are using RStudio you can easily export any type of graph or figure, but it is also possible to do the above directly with the following commands, in this case save the above graphs in jpeg format:
jpeg( filename = "graphs/model_contours.jpeg", width = 30, height = 20, units = "cm", res = 400, quality = 100 ) par(mfrow = c(2,3)) # 2 x 3 pictures on one plot contour( yield_model, # Our model ~ x1 + x2 + x3 + x4, # A formula to obtain the 6 possible graphs image = TRUE, # If image = TRUE, apply color to each contour ) dev.off()
It is also possible to make “3D” graphs with the persp()
function:
par(mfrow = c(2,3)) # 2 x 3 pictures on one plot persp( yield_model, # Our model ~ x1 + x2 + x3 + x4, # A formula to obtain the 6 possible graphs col = topo.colors(100), # Color palette contours = "colors" # Include contours with the same color palette )
This graphic can also be exported in jpeg format:
jpeg( filename = "graphs/model_3D.jpeg", width = 30, height = 20, units = "cm", res = 400, quality = 100 ) par(mfrow = c(2,3)) # 2 x 3 pictures on one plot persp( yield_model, # Our model ~ x1 + x2 + x3 + x4, # A formula to obtain the 6 possible graphs col = topo.colors(100), # Color palette contours = "colors" # Include contours with the same color palette ) dev.off()
Both graphs are stored in the graphs folder in the repository of this post. For more examples and customization options, please click on the following link: Surface Plots in the rsm Package.
Optimal experimental point
The optimal experimental point, the combination of factors that will result in the best response, can be found in the summary of our model:
opt_point <- summary(yield_model)$canonical$xs opt_point ## x1 x2 x3 x4 ## 0.2484136 -0.4624262 0.7095327 -0.4946289
That in the units of each factor corresponds to:
op_point_ru <- code2val( opt_point, # Optimal point in coded units codings = codings(yield_data) # Formulas to convert to factor units ) op_point_ru ## A B C D ## 217.45241 75.37574 40.64299 25.05371
The optimum point with coded units can be used directly to predict the best response, for this we use the predict()
function:
opt_point_df <- data.frame( # predict() needs a data frame with the points x1 = opt_point[1], # to be predicted x2 = opt_point[2], x3 = opt_point[3], x4 = opt_point[4] ) best_response <- predict( yield_model, # Our model opt_point_df # Data frame with points to be predicted ) names(best_response) <- "Best yield" # A nice name to our best point best_response ## Best yield ## 232.0112
Finally, as the good experimenters that we are, we will only have to evaluate the optimal combination of factors to make sure that the experimental result in the response is close to the one predicted by the model.
Great, that’s it for this post. I hope you find it useful. Thank you very much for visiting this blog!
Juan Pablo Carreón Hidalgo 🤓
jpch_26@outlook.com
https://github.com/jpch26
This work is licensed under a Creative Commons Attribution 4.0 International License.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.