
Producing a Control Chart in R – An Application in Analytical Chemistry

[This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers.]

Introduction

Many processes in chemistry, especially in synthesis, require attaining a certain target value for a property of interest.  For example, when synthesizing drug capsules that contain a medicine, a chemist has to ensure that the concentration of the medicine meets a target value.  If the concentration is too high or too low, then the patient ingesting the drug capsules could suffer catastrophic health problems.  Thus, monitoring whether the target is attained is a very important part of analytical chemistry.

Of course, natural variability in any chemical process causes some variation in the output, so the target value will rarely be attained exactly.  There is usually an acceptable range of values, but any deviation of the output beyond this range must be detected and treated with alarm, because the underlying process for generating that output may be faulty.  The process should then be stopped, examined, and repaired before any more output is generated.  From a statistical perspective, there needs to be some mechanism to monitor for outliers as the process unfolds.

A control chart is a useful tool for monitoring chemical processes to detect outliers.  In this tutorial, I will explain what a control chart is and show how to plot one in R.

Read the rest of this blog post to learn how to build a control chart in R!

What is a Control Chart?

A control chart is a scatter plot that allows a chemist to monitor a process as it happens over time.  It plots the quantity of interest on the vertical axis against time (or the order in which the data were generated) on the horizontal axis.  There are many variations of control charts, but they generally show how far the data deviate from a target value.  Here is one type of control chart from Harris (2003) that you can easily plot.  There are 5 special horizontal lines in this type of control chart.  Suppose that the population mean $\mu$ and the population standard deviation $\sigma$ of the quantity of interest are known***, and let $n$ be the number of data.

(***Although the true values of the population mean and population standard deviation can never be determined exactly, many laboratories and companies have long histories of running the same chemical process.  Thus, they have nearly unbiased and very precise estimates of the population mean and population standard deviation, and those estimates can be treated as the “true” values for practical purposes.)

Then, you can plot the data versus time, and add 5 special lines to this scatter plot.  The values of these lines are in brackets below.

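  1. the target value line ($\mu$)
  2. the upper warning line ($\mu + 2\sigma/\sqrt{n}$)
  3. the upper action line ($\mu + 3\sigma/\sqrt{n}$)
  4. the lower warning line ($\mu - 2\sigma/\sqrt{n}$)
  5. the lower action line ($\mu - 3\sigma/\sqrt{n}$)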
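The 5 lines are simple enough to compute in a few lines of R.  The sketch below wraps them in a small helper; `compute_control_lines()` is a hypothetical name of my own, not something from Harris (2003), and, as in the script later in this post, `n` is assumed to be the number of data.

```r
# Hypothetical helper (illustration only): the 5 lines of a Harris-style control chart.
# mu and sigma are treated as known "true" values; n is the number of data,
# following the convention of the script in this post.
compute_control_lines <- function(mu, sigma, n) {
  spread <- sigma / sqrt(n)  # the unit used for the warning and action lines
  c(UA = mu + 3 * spread,    # upper action line
    UW = mu + 2 * spread,    # upper warning line
    TV = mu,                 # target value line
    LW = mu - 2 * spread,    # lower warning line
    LA = mu - 3 * spread)    # lower action line
}

# the vitamin C example below: target 95%, sigma = 0.005, 25 data
compute_control_lines(mu = 95, sigma = 0.005, n = 25)
```

With these inputs, $\sigma/\sqrt{n} = 0.005/5 = 0.001$, so the lines sit at 95.003, 95.002, 95, 94.998 and 94.997.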

A chemist can use these lines to determine whether the data are deviating too far from the target value.  Large deviations indicate that something is wrong with the data-generating process.  Daniel Harris (2003) suggests stopping the process and examining it for malfunctions if any of the following events occur.

a) 1 datum falls outside of the action lines

b) 2 out of 3 consecutive data fall between the warning lines and the action lines

c) 7 consecutive data are all above or all below the target value line

d) 6 consecutive data steadily increase or steadily decrease, regardless of their location

e) 14 consecutive data alternate up and down regardless of their location

f) some obvious non-random pattern
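Criteria a) to d) are mechanical enough to check in code.  The sketch below is my own illustration, not code from Harris (2003) or the original post; `check_control_rules()` is a hypothetical name, rule b) is applied exactly as worded above (on either side of the target), a datum lying exactly on the target value line counts as neither above nor below it, and criteria e) and f) are left out.  Each element of the returned list gives the position of every flagged datum (rule a) or the starting position of every flagged run or window (rules b to d).

```r
# Hypothetical helper (illustration only): flag criteria a) to d) listed above.
# x: data in production order; mu, sigma, n: as used for the control lines.
check_control_rules <- function(x, mu, sigma, n) {
  spread <- sigma / sqrt(n)
  dev <- x - mu

  # a) any single datum outside the action lines
  rule_a <- which(abs(dev) > 3 * spread)

  # b) 2 out of 3 consecutive data between the warning and action lines
  in_zone <- abs(dev) > 2 * spread & abs(dev) <= 3 * spread
  rule_b <- integer(0)
  if (length(x) >= 3) {
    for (i in 1:(length(x) - 2)) {
      if (sum(in_zone[i:(i + 2)]) >= 2) rule_b <- c(rule_b, i)
    }
  }

  # c) 7 consecutive data all above or all below the target value line
  side <- rle(sign(dev))
  side_end <- cumsum(side$lengths)
  side_start <- side_end - side$lengths + 1
  rule_c <- side_start[side$lengths >= 7 & side$values != 0]

  # d) 6 consecutive data steadily increasing or decreasing,
  #    i.e. 5 successive differences with the same nonzero sign
  step <- rle(sign(diff(x)))
  step_end <- cumsum(step$lengths)
  step_start <- step_end - step$lengths + 1
  rule_d <- step_start[step$lengths >= 5 & step$values != 0]

  list(a = rule_a, b = rule_b, c = rule_c, d = rule_d)
}
```

Applied to the vitamin C data in the next section, `check_control_rules(vitamin_c, mu = 95, sigma = 0.005, n = 25)` should flag only criterion c), with a run starting at datum #12.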

An Example of a Control Chart in R

Suppose that you are producing vitamin C capsules, and the target weight percentage of vitamin C in each capsule is 95%.  Based on past experience in making these capsules, you know that the standard deviation of the weight percentage is 0.005 (in percentage points).  I have simulated 25 data for this production process, and you can find this data set at the end of this blog post.  I have named the vector of this data set “vitamin_c”.

Notice my use of

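Here is the R script for plotting a control chart for this production process according to the specifications outlined in Harris (2003).  Note that I have labeled the 5 horizontal lines using abbreviations: UA (upper action line), UW (upper warning line), TV (target value), LW (lower warning line) and LA (lower action line).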

##### Plotting a Control Chart in Analytical Chemistry
##### By Eric Cai, The Chemical Statistician

# first, import the data vector from the bottom of this blog post
# assign it to the variable "vitamin_c"

# obtain the number of data in this vector
n = length(vitamin_c)

# create a vector of the order of the production
# this will be the horizontal axis in the control chart
ordering = 1:n

# the target weight percentage is 95%
mu = 95

# from past experience, you know that the standard deviation is 0.005
# treat this as the "true" standard deviation
sigma = 0.005

# set the 5 horizontal lines of the control chart
upper_action_line = mu + 3*sigma/sqrt(n)
upper_warning_line = mu + 2*sigma/sqrt(n)
target_value = mu
lower_warning_line = mu - 2*sigma/sqrt(n)
lower_action_line = mu - 3*sigma/sqrt(n)

# put all of the values of the 5 horizontal lines into 1 vector
control_lines = c(upper_action_line, upper_warning_line, target_value, lower_warning_line, lower_action_line)

# create a vector of labels for the 5 horizontal lines
control_labels = c('UA', 'UW', 'TV', 'LW', 'LA')

# export the control chart in PNG format to a folder of your choice
png('Write Your Working Directory Path Here/control chart for vitamin c production.png')

# note the use of the "yaxt = 'n'" option to suppress the default y-axis
# the y-axis limits extend one standard deviation (sigma) beyond the action lines
plot(ordering, vitamin_c, main = 'Control Chart for Vitamin C Production',
 xlab = 'Order of Production', ylab = 'Weight Percentage', yaxt = 'n',
 ylim = c(lower_action_line - sigma, upper_action_line + sigma))

# draw the 5 horizontal lines across the plot
abline(h = control_lines)

# write the labels for the 5 horizontal lines
axis(2, at = control_lines, labels = control_labels)
# close the PNG device so that the image file is written
dev.off()

Here is the resulting control chart.

Notice that Data #12-18 (7 consecutive points) all fall below the target value line.  Based on criterion c) above, that warrants shutting down the production process to examine for a possible malfunction.
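To confirm this run without reading it off the chart, you can run-length encode which side of the target value line each datum falls on, using base R's `rle()`.  This is a minimal sketch of my own that re-enters the 25 values from the Data section below so that it runs on its own; a datum exactly on the target would get side 0 and is ignored.

```r
# the 25 values from the "Data" section at the end of this post
vitamin_c <- c(94.9991591445192, 95.0013843593435, 94.9987445081374, 95.0000701427664,
               95.0017114408727, 94.9993970920185, 94.9995278336148, 94.9993646286875,
               94.9997142263651, 95.0001381082248, 95.0012276303438, 94.9991982205454,
               94.9989196074000, 94.9998424656439, 94.9989282399601, 94.9998610138594,
               94.9994026869053, 94.9978160332399, 95.0002408172559, 94.9997406445933,
               95.0009005119453, 95.0009418693939, 95.0014679619034, 95.0007067610896,
               95.0008190089303)
mu <- 95

# side of the target value line: -1 = below, +1 = above, 0 = exactly on it
runs <- rle(sign(vitamin_c - mu))
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# criterion c): 7 or more consecutive data on the same side of the target
long <- runs$lengths >= 7 & runs$values != 0
data.frame(start = run_start[long], end = run_end[long],
           side = ifelse(runs$values[long] < 0, "below", "above"))
```

It reports a single run, from datum #12 to datum #18, below the target value.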

Reference

Harris, Daniel C. “Quantitative Chemical Analysis” (2003).

Data

Weight Percentage of Vitamin C
94.9991591445192
95.0013843593435
94.9987445081374
95.0000701427664
95.0017114408727
94.9993970920185
94.9995278336148
94.9993646286875
94.9997142263651
95.0001381082248
95.0012276303438
94.9991982205454
94.9989196074000
94.9998424656439
94.9989282399601
94.9998610138594
94.9994026869053
94.9978160332399
95.0002408172559
94.9997406445933
95.0009005119453
95.0009418693939
95.0014679619034
95.0007067610896
95.0008190089303
