Introduction
Many processes in chemistry, especially in synthesis, require attaining a certain target value for a property of interest. For example, when synthesizing drug capsules that contain a medicine, a chemist has to ensure that the concentration of the medicine meets a target value. If the concentration is too high or too low, then the patient ingesting the drug capsules could suffer catastrophic health problems. Thus, monitoring this attainment is a very important part of analytical chemistry.
Of course, natural variation in any chemical process will result in some variation in the output, so the target value will rarely be attained exactly. There is usually an acceptable range of values, but any deviation of the output beyond this acceptable range must be discovered and treated with alarm, as the underlying process for generating that output may be inherently faulty. The process should be stopped, examined, and repaired before any more output can be generated. From a statistical perspective, there needs to be some mechanism to monitor for outliers as the process unfolds.
A control chart is a useful tool for monitoring chemical processes to detect outliers. In this tutorial, I will
- explain the underlying concepts of a simple but common type of control chart
- demonstrate how to produce control charts with an example data set in R
Read the rest of this blog post to learn how to build this type of control chart in R!
What is a Control Chart?
A control chart is a scatter plot that allows a chemist to monitor a process as it happens over time. It plots the quantity of interest on the vertical axis against time (or the order in which the data were generated) on the horizontal axis. There are many variations of control charts, but they generally show how far the data deviate from a target value. Here is one type of control chart from Harris (2003) that you can easily plot. This type of control chart has 5 special horizontal lines. Suppose that
- the target value of the property of interest is $\mu$,
- the standard deviation*** for generating this quantity in the chemical process is $\sigma$,
- and the number of data collected is $n$.
(***Although the true values of the population mean and population standard deviation can never be known exactly, many laboratories and companies have long histories of running the same chemical process. Thus, they have accurate and precise estimates of the population mean and population standard deviation, and those estimates can be treated as "true" for practical purposes.)
Then, you can plot the data versus time, and add 5 special lines to this scatter plot. The values of these lines are in brackets below.
- the target value line ($\mu$)
- the upper warning line ($\mu + 2\sigma/\sqrt{n}$)
- the upper action line ($\mu + 3\sigma/\sqrt{n}$)
- the lower warning line ($\mu - 2\sigma/\sqrt{n}$)
- the lower action line ($\mu - 3\sigma/\sqrt{n}$)
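As a quick illustration of how these 5 values are computed, here is a minimal R sketch. The helper name control_limits() and the numbers passed to it (a target of 100, a standard deviation of 3, and 9 data) are made up for this illustration; they are not from Harris (2003).

```r
# Hypothetical helper: returns the 5 line values for a given
# target value (mu), standard deviation (sigma), and number of data (n).
control_limits <- function(mu, sigma, n) {
  step <- sigma / sqrt(n)  # spacing unit between the lines
  c(upper_action  = mu + 3 * step,
    upper_warning = mu + 2 * step,
    target_value  = mu,
    lower_warning = mu - 2 * step,
    lower_action  = mu - 3 * step)
}

# Illustrative values only: sigma / sqrt(n) = 3 / 3 = 1
limits <- control_limits(mu = 100, sigma = 3, n = 9)
print(limits)
```

With these illustrative inputs, the lines sit at 103, 102, 100, 98 and 97, so each warning line is 2 units and each action line is 3 units away from the target.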
A chemist can use these lines to determine whether the data are deviating too far from the target value. Large deviations indicate that something is wrong with the data-generating process. Daniel Harris (2003) suggests stopping the process and examining it for malfunctions if any of the following events occurs.
a) 1 datum falls outside of the action lines
b) 2 out of 3 consecutive data fall between the warning lines and the action lines
c) 7 consecutive data are all above or all below the target value line
d) 6 consecutive data steadily increase or steadily decrease, regardless of their location
e) 14 consecutive data alternate up and down regardless of their location
f) some obvious non-random pattern
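Criteria a) to e) are mechanical enough to check in code. The following is a rough R sketch under a few assumptions of my own: the function name check_control_rules() is hypothetical, criterion b) is read as counting points in either the upper or the lower zone, and criterion f) is left out because it requires human judgment. The example vector is invented purely for illustration.

```r
# Hypothetical checker for criteria a) to e); not code from Harris (2003).
check_control_rules <- function(x, mu, sigma, n = length(x)) {
  step <- sigma / sqrt(n)
  z <- (x - mu) / step  # deviation from target in units of sigma / sqrt(n)

  # length of the longest run of TRUE values in a logical vector
  longest_run <- function(flag) {
    r <- rle(flag)
    if (any(r$values)) max(r$lengths[r$values]) else 0
  }

  # a) at least 1 datum outside the action lines
  rule_a <- any(abs(z) > 3)

  # b) 2 out of 3 consecutive data between warning and action lines
  #    (assumption: either side of the target counts)
  in_zone <- abs(z) > 2 & abs(z) <= 3
  rule_b <- length(x) >= 3 &&
    any(rowSums(embed(as.numeric(in_zone), 3)) >= 2)

  # c) 7 consecutive data all above or all below the target
  rule_c <- longest_run(z > 0) >= 7 || longest_run(z < 0) >= 7

  # d) 6 consecutive data steadily rising or falling = 5 same-sign differences
  d <- diff(x)
  rule_d <- longest_run(d > 0) >= 5 || longest_run(d < 0) >= 5

  # e) 14 consecutive data alternating = 12 consecutive sign flips
  s <- sign(d)
  alternates <- s[-1] * s[-length(s)] < 0
  rule_e <- longest_run(alternates) >= 12

  c(a = rule_a, b = rule_b, c = rule_c, d = rule_d, e = rule_e)
}

# Invented data: target 10, sigma 3, 9 points, so sigma / sqrt(n) = 1
x <- c(10.5, 9.5, 9.4, 9.6, 9.2, 9.7, 9.1, 9.8, 13.5)
result <- check_control_rules(x, mu = 10, sigma = 3)
print(result)
```

In this invented example, the last point lies beyond the upper action line and points 2 to 8 all sit below the target, so criteria a) and c) are flagged while b), d) and e) are not.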
An Example of a Control Chart in R
Suppose that you are producing vitamin C capsules, and the target weight percentage of vitamin C in each capsule is 95%. Based on past experience in making these capsules, you know that the standard deviation of the weight percentage is 0.005. I have simulated 25 data for this production process, and you can find this data set at the end of this blog post. I have called the vector of this data set “vitamin_c”.
Notice my use of
- the abline() function to draw the 5 horizontal lines
- the axis() function to draw my custom labels for the 5 horizontal lines along the vertical axis
- the "yaxt = 'n'" option in the plot() function to suppress the printing of the default vertical axis
Here is the R script for plotting a control chart for this production process according to the specifications as outlined in Harris (2003). Note that I have labeled the 5 horizontal lines using abbreviations:
- upper action = UA
- upper warning = UW
- target value = TV
- lower warning = LW
- lower action = LA
    ##### Plotting a Control Chart in Analytical Chemistry
    ##### By Eric Cai, The Chemical Statistician

    # first, import the data vector from the bottom of this blog post
    # assign it to the variable "vitamin_c"

    # obtain the number of data in this vector
    n = length(vitamin_c)

    # create a vector of the order of the production
    # this will be the horizontal axis in the control chart
    ordering = 1:n

    # the target weight percentage is 95%
    mu = 95

    # from past experience, you know that the standard deviation is 0.005
    # treat this as the "true" standard deviation
    sigma = 0.005

    # set the 5 horizontal lines of the control chart
    upper_action_line = mu + 3*sigma/sqrt(n)
    upper_warning_line = mu + 2*sigma/sqrt(n)
    target_value = mu
    lower_warning_line = mu - 2*sigma/sqrt(n)
    lower_action_line = mu - 3*sigma/sqrt(n)

    # put all of the values of the 5 horizontal lines into 1 vector
    control_lines = c(upper_action_line, upper_warning_line, target_value, lower_warning_line, lower_action_line)

    # create a vector of labels for the 5 horizontal lines
    control_labels = c('UA', 'UW', 'TV', 'LW', 'LA')

    # export the control chart in PNG format to a folder of your choice
    png('Write Your Working Directory Path Here/control chart for vitamin c production.png')

    # note the use of the "yaxt = 'n'" option to suppress the default y-axis
    # the vertical range extends 1 sigma beyond the action lines
    # (use the variable sigma here; sd is an R function, not a number)
    plot(ordering, vitamin_c, main = 'Control Chart for Vitamin C Production', xlab = 'Order of Production', ylab = 'Weight Percentage', yaxt = 'n', ylim = c(lower_action_line - sigma, upper_action_line + sigma))

    # draw the 5 horizontal lines
    abline(h = control_lines)

    # write the labels for the 5 horizontal lines along the left vertical axis
    axis(2, at = control_lines, labels = control_labels)

    dev.off()
Here is the resulting control chart.
Notice that Data #12-18 (7 consecutive points) all fall below the target value line. Based on criterion c) above, that warrants shutting down the production process to examine for a possible malfunction.
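If you would rather confirm this run in R than read it off the chart, here is a small sketch using the 25 values from the Data section of this post. The use of rle() to find the longest run below the target is my own choice for illustration; it is not part of the original script.

```r
# the 25 simulated weight percentages from the Data section
vitamin_c <- c(94.9991591445192, 95.0013843593435, 94.9987445081374,
               95.0000701427664, 95.0017114408727, 94.9993970920185,
               94.9995278336148, 94.9993646286875, 94.9997142263651,
               95.0001381082248, 95.0012276303438, 94.9991982205454,
               94.9989196074000, 94.9998424656439, 94.9989282399601,
               94.9998610138594, 94.9994026869053, 94.9978160332399,
               95.0002408172559, 94.9997406445933, 95.0009005119453,
               95.0009418693939, 95.0014679619034, 95.0007067610896,
               95.0008190089303)
target_value <- 95

# run-length encode whether each datum is below the target
runs <- rle(vitamin_c < target_value)
run_ends <- cumsum(runs$lengths)
run_starts <- run_ends - runs$lengths + 1

# pick the longest run of data below the target
longest <- which.max(ifelse(runs$values, runs$lengths, 0))
longest_below <- c(start = run_starts[longest],
                   end = run_ends[longest],
                   length = runs$lengths[longest])
print(longest_below)
```

The longest run below the target starts at datum 12, ends at datum 18, and has length 7, which matches criterion c).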
Reference
Harris, Daniel C. "Quantitative Chemical Analysis" (2003).
Data
Weight Percentage of Vitamin C
94.9991591445192
95.0013843593435
94.9987445081374
95.0000701427664
95.0017114408727
94.9993970920185
94.9995278336148
94.9993646286875
94.9997142263651
95.0001381082248
95.0012276303438
94.9991982205454
94.9989196074000
94.9998424656439
94.9989282399601
94.9998610138594
94.9994026869053
94.9978160332399
95.0002408172559
94.9997406445933
95.0009005119453
95.0009418693939
95.0014679619034
95.0007067610896
95.0008190089303