Signal Detection Theory vs. Logistic Regression

[This article was first published on R on I Should Be Writing: The Musical, and kindly contributed to R-bloggers.]

I recently came across a paper that explained the equality between the parameters of signal detection theory (SDT) and the parameters of logistic regression in which the state (“absent”/“present”) is used to predict the response (“yes”/“no”, but also applicable in scale-rating designs) (DeCarlo, 1998; DOI: 10.1037/1082-989X.3.2.186).

Here is a short simulation-proof for this equality.

Setup

For these simulations we will need the following packages:

# For plotting
library(ggplot2)

# For extracting SDT parameters
library(neuropsychology)

We will also need to make sure that, for the logistic regression analysis, our factors are coded with effects coding (sum-to-zero contrasts) rather than R’s default dummy (treatment) coding; otherwise the intercept’s meaning will not correspond to the criterion (aka the overall response bias):

options(contrasts = c('contr.sum', 'contr.poly'))
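
To see what this setting does, here is a small sketch (not part of the original analysis) that prints the sum-to-zero contrast matrix and the design matrix R builds for a logical predictor. The toy vector state_demo is a made-up example; the option is restored afterwards so the snippet leaves the session as it found it.

```r
# Sum-to-zero ("effects") coding for two levels:
# the first level is coded +1, the second level -1.
contr.sum(2)

# Temporarily set the option and build a design matrix for a logical predictor.
old_opts <- options(contrasts = c("contr.sum", "contr.poly"))
state_demo <- c(FALSE, FALSE, TRUE, TRUE)   # hypothetical toy data
mm <- model.matrix(~ state_demo)
options(old_opts)                           # restore the previous setting

# FALSE ("absent") rows get +1 and TRUE ("present") rows get -1,
# so the intercept sits midway between the two states.
mm
```

Because the two codes are symmetric around zero, the intercept describes the average of the two states rather than the “absent” state alone.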

The Simulations

n <- 100L
B <- 100L

We’ll run 100 simulations with 100 trials each.

Simulation Code

set.seed(1)

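SDT_params <- function(state, resp) {
  # 2x2 table: rows = true state (FALSE, TRUE), columns = response (FALSE, TRUE).
  # Fixing the levels keeps the table 2x2 even if one response never occurs.
  lv <- c(FALSE, TRUE)
  tab <- table(factor(state, levels = lv), factor(resp, levels = lv))
  
  sdt_res <- neuropsychology::dprime(
    n_hit  = tab[2,2],
    n_miss = tab[2,1],
    n_fa   = tab[1,2],
    n_cr   = tab[1,1]
  )
  
  # sensitivity (d') and criterion (c)
  c(sdt_res$dprime , sdt_res$c)
}

logistic_reg_params <- function(state, resp) {
  # state predicts resp; coefficients are returned as (intercept, slope)
  fit <- glm(resp ~ state, family = binomial())
  
  coef(fit)
}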

# initialize
res <- data.frame(d_    = numeric(B),
                  c_    = numeric(B),
                  int   = numeric(B),
                  slope = numeric(B))

# Loop
for (b in seq_len(B)) {
  true_sensitivity <- rexp(1,10) # random
  true_criterion <- runif(1,-1,1) # random
  
  # true state vector
  state_i <- rep(c(FALSE, TRUE), each = n/2)
  
  # response vector
  Xn <- rnorm(n/2) # noise dist
  Xs <- rnorm(n/2, mean = true_sensitivity) # signal + noise dist
  X <- c(Xn,Xs)
  resp_i <- X > true_criterion
  
  # SDT params
  res[b,1:2] <- SDT_params(state_i,resp_i)
  
  # logistic regression params
  res[b,3:4] <- logistic_reg_params(state_i,resp_i)
}
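
For readers without the neuropsychology package, the textbook equal-variance definitions of the two SDT parameters can be sketched in base R. The helper name sdt_from_counts is hypothetical, and this version applies no correction for hit or false-alarm rates of exactly 0 or 1; the package function may apply such a correction, so its values could differ slightly from these.

```r
# Hypothetical base-R helper: uncorrected d' and c from the four cell counts.
sdt_from_counts <- function(n_hit, n_miss, n_fa, n_cr) {
  hit_rate <- n_hit / (n_hit + n_miss)   # P("yes" | signal present)
  fa_rate  <- n_fa  / (n_fa  + n_cr)     # P("yes" | signal absent)

  z_hit <- qnorm(hit_rate)               # z-transformed rates
  z_fa  <- qnorm(fa_rate)

  c(dprime = z_hit - z_fa,               # sensitivity
    c      = -(z_hit + z_fa) / 2)        # criterion (response bias)
}

# Symmetric example: 80% hits and 20% false alarms give an unbiased observer.
sdt_out <- sdt_from_counts(n_hit = 40, n_miss = 10, n_fa = 10, n_cr = 40)
sdt_out
```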

Results

SDT parameters are on a standardized normal scale, meaning they are scaled to σ=1. However, the standard logistic distribution’s scale is σ=π/√3. Thus, to convert the logistic regression’s parameters to the SDT’s we need to scale both the intercept and the slope by √3/π to have them on the same scale as c and d. Additionally,

  1. The slope must also be scaled by −2: under the effects coding set above, R codes the first level of the state (“absent”) as +1 and the second (“present”) as −1, so the slope is minus half the difference between the two states.
  2. The intercept must also be scaled by −1 - see paper for the full rationale.

The red-dashed line represents the expected regression line predicting the SDT parameters from their logistic counterparts:

  • d = −2 × (√3/π) × Slope
  • c = −(√3/π) × Intercept

(The blue line is the empirical regression line.)
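
As a rough numerical check of this conversion, the sketch below simulates one large data set and compares the two sets of estimates. It is not the code behind the original plots: the sample size, the true d' of 0.5 and the threshold of 0.1 are arbitrary choices. With sum-to-zero coding, R codes FALSE as +1 and TRUE as −1, so the slope is multiplied by −2 × √3/π and the intercept by −√3/π. The match with the z-based estimates is only approximate, because a logistic curve with matched variance is not identical to a normal curve.

```r
set.seed(42)
n_trials <- 20000L
state <- rep(c(FALSE, TRUE), each = n_trials / 2)

# Equal-variance SDT observer: true d' = 0.5, decision threshold at 0.1
evidence <- rnorm(n_trials, mean = ifelse(state, 0.5, 0))
resp <- evidence > 0.1

# Classic SDT estimates from the hit and false-alarm rates
hit_rate <- mean(resp[state])
fa_rate  <- mean(resp[!state])
d_sdt <- qnorm(hit_rate) - qnorm(fa_rate)
c_sdt <- -(qnorm(hit_rate) + qnorm(fa_rate)) / 2

# Logistic regression with sum-to-zero coding (FALSE = +1, TRUE = -1)
old_opts <- options(contrasts = c("contr.sum", "contr.poly"))
fit <- glm(resp ~ state, family = binomial())
options(old_opts)
b <- unname(coef(fit))   # b[1] = intercept, b[2] = slope

# Rescale from logistic units to the standard-normal scale
k <- sqrt(3) / pi
d_logit <- -2 * k * b[2]
c_logit <- -k * b[1]

round(c(d_sdt = d_sdt, d_logit = d_logit, c_sdt = c_sdt, c_logit = c_logit), 3)
```

Because the model has one parameter per cell, the converted slope is exactly √3/π times the difference between the logits of the hit and false-alarm rates; only the final step from logits to z-scores is an approximation.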

Conclusions

I haven’t tested here how this equality can be extended to multi-level designs with generalized linear mixed models (GLMM), but I see no reason this wouldn’t be possible. One could model random effects per subject, and the moderating effect of some X on sensitivity could in theory be modeled by including an interaction between X and state; similarly, the moderating effect of X on the criterion could be modeled by including a main effect for X (moderating the intercept).
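
A minimal sketch of the fixed-effects part of this idea, using plain glm() and a hypothetical two-level moderator X whose groups differ in sensitivity (the group labels, sample size and true values are all made up for illustration). Random effects per subject are left out; a mixed model, for example lme4::glmer() with a by-subject term, would be one way to add them, which is untested here.

```r
set.seed(2024)
n_obs <- 40000L
state <- factor(rep(c("absent", "present"), times = n_obs / 2),
                levels = c("absent", "present"))
X <- factor(rep(c("a", "b"), each = n_obs / 2))   # hypothetical moderator

# Group "a" has true d' = 0.5, group "b" has true d' = 1.5; same threshold
true_d   <- ifelse(X == "a", 0.5, 1.5)
evidence <- rnorm(n_obs, mean = ifelse(state == "present", true_d, 0))
resp     <- evidence > 0.5

# Sum-to-zero coding for both factors: absent = +1, present = -1; a = +1, b = -1
fit <- glm(resp ~ state * X, family = binomial(),
           contrasts = list(state = "contr.sum", X = "contr.sum"))
b <- coef(fit)
k <- sqrt(3) / pi

# Sensitivity per group: the interaction shifts the state slope up or down
d_a <- -2 * k * (b[["state1"]] + b[["state1:X1"]])
d_b <- -2 * k * (b[["state1"]] - b[["state1:X1"]])

# Criterion per group: the main effect of X shifts the intercept
c_a <- -k * (b[["(Intercept)"]] + b[["X1"]])
c_b <- -k * (b[["(Intercept)"]] - b[["X1"]])

round(c(d_a = d_a, d_b = d_b, c_a = c_a, c_b = c_b), 3)
```

Here the interaction term carries the difference in sensitivity between the two groups, and the main effect of X carries the difference in criterion.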
