Gosset part 2: small sample statistics

[This article was first published on Category R on Roel's R-tefacts, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A NICE ONELINER HERE?

This post is an explainer about the small sample experiment and determining the ideal sample size for inference.

Economic perspectives and business logic

Brewing beer at scale

One of the problems William S. Gosset worked on was determining the quality of Malt. To brew beer you need 3 ingredients, yeast, hops and a cereal grain. You start with extracting the starch from the grain into water. You then flavoring the resulting sweet liquid with hops and fermenting that mixture with yeast.

Gosset worked on all three ingredients, but in this post I will look into the cereal grain.

Beer, malts and beer technique

Now, beer brewing has been done since the 6th century BCE, and all the steps have their own specialised names1 , which are different in all languages. So I will be talking about malt, but remember that it is just a source of starch, a source of sugars (I’m sorry chemists/ biologist, I have to simplify this a bit).

Grain, Barley, transforming into malt, as seen here. Malt and grain, detail of individual sprouted malt

The source of starch in beer is barley, this grain is first dried, cleaned and then wetted. This starts the biological process of germination in the grain (it gets ready to start a new barley plant). In that process some enzymes are made that will break down starch to smaller sugars. Now the grains are ready to start a new plant and, now we sort of kill it by forced drying. The end result of half germinated barley is called malt or malted barley.

The amount of sugar in the mixture determines the maximum ammount of alcohol the yeast can create in next steps. Higher alcohol content keeps the beer for longer periods, but makes the taste different.

Malt and sugars

Guiness Malt in Gosset’s time, came from Irish and English Barley stock. Since the sugar content in malt can vary from batch to batch, brewers need to check the sugar content. However there are only so many brewers and there is checking every batch is not scalable. The Guinness brewery was one of the largest in the world and this would not do.

The sugar content of malt extract was measured by degrees saccharine per barrel of 168 poinds malt weight. In earlier experiments the brewers determined that an extract around 133 degrees saccharine gave the desired alcohol level.

Way more saccharine would affect the stability and life of the beer and increase the alcohol content of the beer, however that would also increase the tax, Guinness had to pay to British goverment. A lower level of alcohol would make the beer go bad earlier and crucially, costumers would go to a competiter.

So you better make sure the malt extract is close to 133 degrees saccharine. In Gosset’s view, 0.5 degrees was a difference or error in malt extract level which Guinness and its customers could accept.

“It might be maintained,” he said, that malt extract “should be [estimated] within .5 of the true result with a probability of 10 to 1”

However how can we be certain that a barrel has a sugar content of 133 degrees?

They could take samples, and average those, but how many samples would give us enough certainty (that is with odds of 10 to 1 that the sample average is within 0.5 degree of the real value)?

Simulation

Malt, green, a handful of green malt

Malt, green, a handful of green malt

Gosset and his co workers went over to simulation. THIS IS WEIRDLY WRITTEN They had a lot of samples for one barrel, and Gosset simulated what would happen if they drew and averaged multiple samples. And generalized this to what would happen if they sampled from the all of the barrels.

And that is what I will show in R code.

So we are talking about sampling Randy Marshal sampling beer and wine (South Park)

GIVEN A NORMAL BAG OF BARLEY/MALT EXTRACT WE WANT TO BE WITHIN .5 OF REAL VALUE 10 TO 1.

A/(A + B ) 10/(10+1) = 10/11 0.9090909

Simulating drawing from a barrel

library(tidyverse)

First the goal in clearer language. We want to know how many samples you have to draw to know what the real degree saccharine level of the barrel is. With a 10 to 1 odds of being within 0.5 degree.

Let’s first transform that odds to probabilities, because I don’t know what those mean. A 10 to 1 odds means ten successes and 1 failure, so it is really 10 out of 11, 10/11=0.909.

First we create a dataset.
Say we have a large amount of MALT extract We took many many many many samples from one barrel, and so we know what the saccharine level of the barrel is.

set.seed(7334)
degrees_sacharine = rnorm(3000,mean = 133, sd = 0.6) # this is really just a guess

Then I create some functions to take a sample, a function to determine if that value is within the range, and finally I combine them.

take_sample_average <- function(bag= degrees_sacharine, sample_size){
  mean(sample(bag, sample_size, replace= TRUE))
}
within_range <- function(value){
  ifelse(value > 132.5 & value < 133.5, TRUE, FALSE)
}
is_sample_within_range <- function(sample_size){
  take_sample_average(bag = degrees_sacharine, sample_size = sample_size) %>% 
    within_range()
}
# for example what if we take 3 samples?
take_sample_average(bag = degrees_sacharine, 3)
## [1] 132.8

For example: So now we take 2 samples out of the bag, and get (142.2745, 119.4484)/2 = 142.2745. Is this within the range of 132.5 and 133.5? no.

But how often, on average am I within the real value of the bag? We simulate taking 2 to 15 samples from the barrel and averaging per sample.

sampling_experiments <-  
  tibble(
  N = seq_len(2500)
) %>% 
  mutate( # there is probably a more concise way to do this
    sample_02 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 2)),
    sample_03 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 3)),
    sample_04 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 4)),
    sample_05 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 5)),
    sample_06 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 6)),
    sample_07 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 7)),
    sample_08 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 8)),
    sample_09 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 9)),
    sample_10 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 10)),
    sample_15 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 15)),
    sample_20 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 20)),
    )

So how many times are the samples within the range?

sampling_experiments %>% 
  summarize_all(.funs = ~sum(.)/n()) %>% # this doesn't read that well
  # So I reshape the data, remove the N column and remove the sample_ part
  gather(key = "sample_size", value = "prob_in_range", sample_02:sample_20) %>% 
  select(-N) %>% 
  mutate(sample_size = str_remove(sample_size,pattern =  "sample_")) 
## # A tibble: 11 x 2
##    sample_size prob_in_range
##    <chr>               <dbl>
##  1 02                  0.755
##  2 03                  0.854
##  3 04                  0.901
##  4 05                  0.932
##  5 06                  0.954
##  6 07                  0.966
##  7 08                  0.982
##  8 09                  0.986
##  9 10                  0.991
## 10 15                  0.999
## 11 20                  0.999

Gosset found in his experiments that he needed at least 4 samples for a estimation with an odds of at least 10 to 1, which is a probability of approximately 0.909.

In our case for our bag of estimations we would say at least 5 samples to get these odds or better.

Armed with this knowledge the Guinness brewery knew it could test the malt extract by taking 4 samples out of every batch to get an approximation of the true sugar content of a batch that would be correct in 10 out of 11 times.

That meant that the brewery could use this technique to check the bags of malt extract, in stead of a master brewer sampling, and feeling the bags. You can scale the number of tests, but not the amount of brewers/ checkers.

The brewers were happy, Gosset was too. But he realized there must be a system there, a way to determine

Image credits

State of the machine

At the moment of creation (when I knitted this document ) this was the state of my machine: click here to expand

sessioninfo::session_info()
## ─ Session info ──────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.1 (2019-07-05)
##  os       macOS Mojave 10.14.6        
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Europe/Amsterdam            
##  date     2019-10-11                  
## 
## ─ Packages ──────────────────────────────────────────────────────────────
##  package     * version date       lib source        
##  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
##  backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.0)
##  blogdown      0.16    2019-10-01 [1] CRAN (R 3.6.0)
##  bookdown      0.14    2019-10-01 [1] CRAN (R 3.6.0)
##  broom         0.5.2   2019-04-07 [1] CRAN (R 3.6.0)
##  cellranger    1.1.0   2016-07-27 [1] CRAN (R 3.6.0)
##  cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.0)
##  colorspace    1.4-1   2019-03-18 [1] CRAN (R 3.6.0)
##  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
##  digest        0.6.21  2019-09-20 [1] CRAN (R 3.6.0)
##  dplyr       * 0.8.3   2019-07-04 [1] CRAN (R 3.6.0)
##  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.0)
##  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)
##  fansi         0.4.0   2018-10-05 [1] CRAN (R 3.6.0)
##  forcats     * 0.4.0   2019-02-17 [1] CRAN (R 3.6.0)
##  generics      0.0.2   2018-11-29 [1] CRAN (R 3.6.0)
##  ggplot2     * 3.2.1   2019-08-10 [1] CRAN (R 3.6.0)
##  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)
##  gtable        0.3.0   2019-03-25 [1] CRAN (R 3.6.0)
##  haven         2.1.1   2019-07-04 [1] CRAN (R 3.6.0)
##  hms           0.5.1   2019-08-23 [1] CRAN (R 3.6.0)
##  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.0)
##  httr          1.4.1   2019-08-05 [1] CRAN (R 3.6.0)
##  jsonlite      1.6     2018-12-07 [1] CRAN (R 3.6.0)
##  knitr         1.25    2019-09-18 [1] CRAN (R 3.6.0)
##  lattice       0.20-38 2018-11-04 [1] CRAN (R 3.6.1)
##  lazyeval      0.2.2   2019-03-15 [1] CRAN (R 3.6.0)
##  lifecycle     0.1.0   2019-08-01 [1] CRAN (R 3.6.0)
##  lubridate     1.7.4   2018-04-11 [1] CRAN (R 3.6.0)
##  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
##  modelr        0.1.5   2019-08-08 [1] CRAN (R 3.6.0)
##  munsell       0.5.0   2018-06-12 [1] CRAN (R 3.6.0)
##  nlme          3.1-141 2019-08-01 [1] CRAN (R 3.6.0)
##  pillar        1.4.2   2019-06-29 [1] CRAN (R 3.6.0)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.0)
##  purrr       * 0.3.2   2019-03-15 [1] CRAN (R 3.6.0)
##  R6            2.4.0   2019-02-14 [1] CRAN (R 3.6.0)
##  Rcpp          1.0.2   2019-07-25 [1] CRAN (R 3.6.0)
##  readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.6.0)
##  readxl        1.3.1   2019-03-13 [1] CRAN (R 3.6.0)
##  rlang         0.4.0   2019-06-25 [1] CRAN (R 3.6.0)
##  rmarkdown     1.16    2019-10-01 [1] CRAN (R 3.6.0)
##  rstudioapi    0.10    2019-03-19 [1] CRAN (R 3.6.0)
##  rvest         0.3.4   2019-05-15 [1] CRAN (R 3.6.0)
##  scales        1.0.0   2018-08-09 [1] CRAN (R 3.6.0)
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
##  stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.0)
##  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
##  tibble      * 2.1.3   2019-06-06 [1] CRAN (R 3.6.0)
##  tidyr       * 1.0.0   2019-09-11 [1] CRAN (R 3.6.0)
##  tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.6.0)
##  tidyverse   * 1.2.1   2017-11-14 [1] CRAN (R 3.6.0)
##  utf8          1.1.4   2018-05-24 [1] CRAN (R 3.6.0)
##  vctrs         0.2.0   2019-07-05 [1] CRAN (R 3.6.0)
##  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
##  xfun          0.10    2019-10-01 [1] CRAN (R 3.6.0)
##  xml2          1.2.2   2019-08-09 [1] CRAN (R 3.6.0)
##  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.0)
##  zeallot       0.1.0   2018-01-28 [1] CRAN (R 3.6.0)
## 
## [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

  1. from the wikipedia article I found the words: wort=the sugary liquid, mashing = mixing malted barley with hot water, liquor = hot water, grist =crushed malt, sparging = washing of grains, lautering = seperation of wort with grain itself

To leave a comment for the author, please follow the link and comment on their blog: Category R on Roel's R-tefacts.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)