Practical Kullback-Leibler (KL) Divergence: Discrete Case
KL divergence (KullbackLeibler57), also called KL distance, is a non-symmetric measure of the difference between two probability distributions. It is closely related to mutual information and can be used to measure the association between two random variables.
Figure: Distance between two distributions. (Wikipedia)
In this short tutorial, I show how to compute KL divergence and mutual information for two categorical variables, interpreted as discrete random variables.
${\bf Definition}$: Kullback-Leibler (KL) Distance on Discrete Distributions
Given two discrete probability distributions ${\it P}(A)$ and ${\it Q}(B)$ with discrete random variates $A$ and $B$, having realisations $A=a_{j}$ and $B=b_{j}$ over $n$ singletons $j=1,…,n$, the KL divergence or distance $D_{KL}$ between $P$ and $Q$ is defined as follows:
$D_{KL} = D_{KL}\big({\it P}(A) || {\it Q}(B)\big)=\sum_{j=1}^{n} {\it P}(A=a_{j}) \log \Big( \cfrac{{\it P}(A=a_{j})}{{\it Q}(B=b_{j})} \Big)$
$\log$ is in base $e$.
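As a quick illustration, the sum above can be evaluated directly in R. The two distributions below are arbitrary values chosen only for demonstration; note that swapping $P$ and $Q$ gives a different result, since the measure is non-symmetric.
P <- c(0.5, 0.3, 0.2)             # illustrative distribution P over three singletons
Q <- c(0.4, 0.4, 0.2)             # illustrative distribution Q over the same singletons
kl <- sum(P * log(P / Q))         # D_KL(P || Q), natural log
kl_rev <- sum(Q * log(Q / P))     # D_KL(Q || P), generally different: KL is non-symmetric
kl
kl_rev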
${\bf Definition}$: Mutual Information on Discrete Random Variates
Two discrete random variables $X$ and $Y$, having realisations $X=x_{k}$ and $Y=y_{l}$, over $n$ and $m$ singletons, $k=1,…,n$ and $l=1,…,m$ respectively, are given. Mutual information, $I(X;Y)$, is defined as follows,
$I(X;Y) = \sum_{k=1}^{n} \sum_{l=1}^{m} {\it R}(X=x_{k}, Y=y_{l}) \log \Big(\cfrac{ {\it R}(X=x_{k}, Y=y_{l}) }{ {\it R}(X=x_{k}) {\it R}(Y=y_{l})} \Big)$
$\log$ is in base $e$ and ${\it R}$ denotes the joint and marginal probabilities.
${\bf Definition}$: Mutual Information on Discrete Random Variates with KL distance
Given two discrete random variables $X$ and $Y$, mutual information $I(X;Y)$ can equivalently be written as a KL distance between the joint distribution and the product of the marginals,
$I(X;Y) = D_{KL} \Big( {\it R}(X, Y) || {\it R}(X) {\it R}(Y) \Big)$
${\bf Problem: Example}$: Two discrete random variables $X$ and $Y$ are defined on the sample spaces $(a,b,c)$ and $(d,e)$ respectively. Write down the expression for the mutual information $I(X;Y)$, expanding the summation.
${\bf Solution: Example}$:
$I(X;Y) = {\it R}(X=a, Y=d) \log \Big(\cfrac{ {\it R}(X=a, Y=d) }{ {\it R}(X=a) {\it R}(Y=d)} \Big) +$
${\it R}(X=b, Y=d) \log \Big(\cfrac{ {\it R}(X=b, Y=d) }{ {\it R}(X=b) {\it R}(Y=d)} \Big)+$
${\it R}(X=c, Y=d) \log \Big(\cfrac{ {\it R}(X=c, Y=d) }{ {\it R}(X=c) {\it R}(Y=d)} \Big)+$
${\it R}(X=a, Y=e) \log \Big(\cfrac{ {\it R}(X=a, Y=e) }{ {\it R}(X=a) {\it R}(Y=e)} \Big)+$
${\it R}(X=b, Y=e) \log \Big(\cfrac{ {\it R}(X=b, Y=e) }{ {\it R}(X=b) {\it R}(Y=e)} \Big)+$
${\it R}(X=c, Y=e) \log \Big(\cfrac{ {\it R}(X=c, Y=e) }{ {\it R}(X=c) {\it R}(Y=e)} \Big)$
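For concreteness, the expanded expression can be evaluated numerically. Here is a minimal R sketch using a made-up joint probability table over $(a,b,c)\times(d,e)$; the values are illustrative only and do not come from any real data.
# Hypothetical joint probabilities R(X, Y) over (a,b,c) x (d,e); all six cells sum to 1
R_xy <- matrix(c(0.10, 0.15,
                 0.20, 0.25,
                 0.05, 0.25),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("a", "b", "c"), c("d", "e")))
R_x <- rowSums(R_xy)   # marginals R(X)
R_y <- colSums(R_xy)   # marginals R(Y)
# Mutual information: sum over all six cells of R(x,y) * log( R(x,y) / (R(x)R(y)) )
I_xy <- sum(R_xy * log(R_xy / outer(R_x, R_y)))
I_xy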
${\bf Definition}$: Two discrete random variables $X$ and $Y$, having realisations $X=x_{k}$ and $Y=y_{l}$, over $n$ and $m$ singletons, $k=1,…,n$ and $l=1,…,m$ respectively, are given. A two-way contingency table of the realisations of $X$ and $Y$ along the same measured data points, $N$ observations, can be constructed by counting co-occurrences, or cross-classifying the factors. Normalising the resulting frequency table produces the joint and marginal probabilities.
          | Y=y_{1}             | … | Y=y_{l}             | R(X)
X=x_{1}   | R(X=x_{1}, Y=y_{1}) | … | R(X=x_{1}, Y=y_{l}) | R(X=x_{1})
X=x_{2}   | R(X=x_{2}, Y=y_{1}) | … | R(X=x_{2}, Y=y_{l}) | R(X=x_{2})
…         | …                   | … | …                   | …
X=x_{k}   | R(X=x_{k}, Y=y_{1}) | … | R(X=x_{k}, Y=y_{l}) | R(X=x_{k})
R(Y)      | R(Y=y_{1})          | … | R(Y=y_{l})          |
Simulated example with R
set.seed(3951824)
# Table of counts (contingency)
X <- sample(letters[1:3], 100, replace = TRUE)
Y <- sample(letters[4:5], 100, replace = TRUE)
t  <- table(X, Y)
tp <- t/100          # proportions
tn <- tp/sum(tp)     # normalized, joints
p_x <- rowSums(tn)   # marginals
p_y <- colSums(tn)
P <- tn
Q <- p_x %o% p_y
# P(X, Y)   : bin frequency: P_i
# P(X) P(Y) : bin frequency, Q_i
mi <- sum(P*log(P/Q))
library(entropy)
mi.empirical(t) == mi
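A small caveat on the last line: an exact `==` comparison between two floating-point numbers can be fragile even when the quantities agree analytically. A more tolerant check (assuming the entropy package is installed, as above) is:
# Floating-point-safe comparison of the hand-computed MI with entropy::mi.empirical
all.equal(as.numeric(mi.empirical(t)), mi)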
References
(KullbackLeibler57) Kullback, S.; Leibler, R.A. (1951). "On information and sufficiency". Annals of Mathematical Statistics 22 (1): 79–86.