//

## Monday, September 23, 2019

### Mutual Information

Mutual Information
$$I(X ; Y)=D_{\mathrm{KL}}\left(P_{(X, Y)} \| P_{X} \otimes P_{Y}\right)$$

Intuitively, mutual information measures the information that $X$ and $Y$ share: It measures how much knowing one of these variables reduces uncertainty about the other.

\begin{aligned} \mathrm{I}(X ; Y) & \equiv \mathrm{H}(X)-\mathrm{H}(X | Y) \\ & \equiv \mathrm{H}(Y)-\mathrm{H}(Y | X) \\ & \equiv \mathrm{H}(X)+\mathrm{H}(Y)-\mathrm{H}(X, Y) \\ & \equiv \mathrm{H}(X, Y)-\mathrm{H}(X | Y)-\mathrm{H}(Y | X) \end{aligned}

where $H(Y|X)$ means Conditional Entropy, defined as:

$$\mathrm{H}(Y | X)=-\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)}$$

\begin{aligned} \mathrm{H}(Y | X) & \equiv \sum_{x \in \mathcal{X}} p(x) \mathrm{H}(Y | X=x) \\ &=-\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y | x) \log p(y | x) \\ &=-\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y | x) \\ &=-\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log p(y | x) \\ &=-\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)} \\ &=\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log \frac{p(x)}{p(x, y)} \end{aligned}