Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
R
is a great language for statistical process control largely because there are several packages which make it quite easy, including qcc, SixSigma, and a new one that I want to explore in this post called ggQC. I’ve been using qcc
for a while and I trust its calculations, so I’ll use that to validate ggQC
.
ggQC
was created by Keneth Grey and I came accross it when I read his excelent article on why we use r-bar and a constant to calculate the control limits on a control chart. The advantage of ggQC
is that it extends ggplot2, so that you get all of the flexibility that comes with that.
First, let’s load the tidyverse
, qcc
, and ggQC
(you’ll need to install these first if you haven’t already).
library(tidyverse) library(qcc) library(ggQC)
Now, let’s create an x-bar chart with qcc to use as a baseline.
# Generate sample data set.seed(20190117) example_df <- data_frame(values = rnorm(n=30*5, mean = 25, sd = .005), subgroup = rep(1:30, 5), n = rep(1:5, each = 30)) %>% add_row(values = rnorm(n=2*5, mean = 25 + .006, sd = .005), subgroup = rep(31:32, 5), n = rep(1:5, each = 2)) ## Warning: `data_frame()` is deprecated, use `tibble()`. ## This warning is displayed once per session. # Spread the data and plot the control chart qcc_chart <- example_df %>% spread(key = n, value = values) %>% select(-subgroup) %>% qcc(type = "xbar", data.name = "Example Data")
qcc_chart
You can see that as long as the data is formatted with each group on its own row, this package makes it very easy to generate a funcitonal control chart. However, customizing this chart isn’t as easy for those of us who primarily use ggplot2
instead of base plots.
Let’s take a look at the same thing with ggQC. First we’ll need to put the data in long format, then we’ll pass that to ggplot2
and add some ggQC
layers.
ggQC_example <- example_df %>% ggplot(aes(x = subgroup, y = values, group = 1)) + stat_summary(fun.y = mean, geom = "point") + stat_summary(fun.y = mean, geom = "line") + stat_QC(method = "xBar.rBar", auto.label = TRUE, label.digits = 4) + scale_x_continuous(expand = expand_scale(mult = c(.05, .15))) + # Pad the x-axis for the labels ylab("x-bar") + theme_bw() ggQC_example
You can see that both packages compute the UCL and LCL identically.
I would like for the two stat summaries to be plotted as part of stat_QC
, but I like that you get more control over them when you plot them on their own and that you see exacly what you’re plotting. While building the chart this way seems a little tedious, it makes it easy to read what’s happening and also easy to add additional layers, like overlaying the individual points:
ggQC_example + geom_point(alpha = .25)
I like showing the plot like this because it is more clear to an innocent bystander exactly what is being plotted.
One thing that I would like for ggQC to provide is an easy way to make a decent chart in just a few lines of code. For example, something like this:
tidy_df %>% ggplot(aes(x = Groups, y = Observations)) + stat_quick_QC(method = "xBar.rBar", auto.label = TRUE, label.digits = 4)
Another thing missing from the ggQC package is coloring the violations. It is supposed to color them with the stat_qc_violations()
layer, but I think that is a messy solution with the four facets. It is also buggy, as you can see by the two violations that it identifies being plotted in the wrong place on the x axis.
ggQC_example + stat_qc_violations(method = "xBar.rBar")
So, let’s try a work-around to color the violations. First we’ll get the violations from the ggQC QC_Violations
function and then we’ll add them to the ggQC_example
plot, colored red.
violations <- example_df %>% QC_Violations(value = "values", grouping = "subgroup", method = "xBar.rBar") %>% filter(Violation == TRUE) %>% select(data, Index) %>% unique() ggQC_example + geom_point(data = violations, aes(x = Index, y = data), color = "red")
You can see that ggQC
doesn’t use the same violation rules as qcc
. ggQC
lists its rules in the help documentation for QC_Violations()
, but I don’t see the rules documented anywhere for the qcc
package. There are multiple sents of rules that people use, but I like to refer to the NIST Engineering Statistics Handbook Western Electric rules since they’re published by NIST and are publically available. However, neither package uses these rules exactly, so lets do the same work-around with the qcc package for fun.
violations <- data_frame(Observations = qcc_chart$violations) %>% map(unlist) %>% as_data_frame() %>% left_join(as_data_frame(qcc_chart$statistics) %>% mutate(Observations = row_number()), by = "Observations") ## Warning: `as_data_frame()` is deprecated, use `as_tibble()` (but mind the new semantics). ## This warning is displayed once per session. ggQC_example + geom_point(data = violations, aes(x = Observations, y = value), color = "red")
Overall I like the ggQC package. It’s still an early version and has some things to work out, but I think that it’s the natural progression of control charting in R
.
This post was generated with version 0.0.31 of ggQC
.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.