
MMSE Estimator

MMSE (Minimum Mean-Square Error) Estimator

When a set of measurement vectors is given as $Z_{k} = z_{k}$, the MMSE estimator is defined as the estimator that minimizes the conditional mean-square estimation error, as follows:

$J = \mathbb{E}[(X-\hat{x})^{T}(X-\hat{x}) \mid Z_{k} = z_{k}]$
$= \int_{-\infty}^{\infty}(x-g(z_{k}))^{T}(x-g(z_{k}))p_{X \mid Z_{k}}(x\mid z_{k})dx$


In short, the MMSE estimate can be written as:

$\hat{x}^{MMSE} = \arg\min_{\hat{x}}\mathbb{E}[(X-\hat{x})^{T}(X-\hat{x}) \mid Z_{k} = z_{k}]$


Since $\hat{x} = g(z_{k})$ is a constant value, $J = \mathbb{E}[X^{T}X-X^{T}\hat{x} - \hat{x}^{T}X + \hat{x}^{T}\hat{x} \mid Z_{k} = z_{k}] \ = \mathbb{E}[X^{T}X \mid Z_{k} = z_{k}] - 2\hat{x}^{T}\mathbb{E}[X \mid Z_{k}=z_{k}]+\hat{x}^{T}\hat{x}$.

Because $J$ is a quadratic function of $\hat{x}$ that is convex (opens upward), the minimum is attained where $\frac{dJ}{d\hat{x}} = 0$.

Therefore, $\frac{dJ}{d\hat{x}} = -2(\mathbb{E}[X\mid Z_{k}=z_{k}] - \hat{x}) = 0$. Note that $\frac{d(a^{T}\hat{x})}{d\hat{x}} = \frac{d(\hat{x}^{T}a)}{d\hat{x}} = a$ for a constant vector $a$.

$\therefore \hat{x}^{MMSE} = \mathbb{E}[X \mid Z_{k}=z_{k}]$ $= \int_{-\infty}^{\infty}xp_{X \mid Z_{k}}(x \mid z_{k})dx$ $= \frac{\int_{-\infty}^{\infty}xp_{Z_{k} \mid X}(z_{k} \mid x)p_{X}(x)dx}{\int_{-\infty}^{\infty}p_{Z_{k} \mid X}(z_{k} \mid x)p_{X}(x)dx}$
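
As a quick numerical illustration of the last expression, the sketch below evaluates the two integrals on a grid. The model is only an assumption chosen for illustration (scalar $X \sim N(0,1)$, $Z = X + V$ with $V \sim N(0, 0.5^{2})$, observed $z_{k} = 0.8$), not something specified in this post.

```python
import numpy as np

# Minimal sketch (illustrative assumptions): scalar X ~ N(0, 1) prior,
# Z = X + V with V ~ N(0, 0.5^2), and an observed measurement z_k = 0.8.
x = np.linspace(-6.0, 6.0, 4001)     # integration grid for x (uniform spacing)
z_k = 0.8

p_x = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)                                   # prior p_X(x)
p_z_given_x = np.exp(-0.5 * ((z_k - x) / 0.5)**2) / (0.5 * np.sqrt(2 * np.pi))   # likelihood

# Bayes form of the MMSE estimate: ratio of two integrals over x.
# The grid spacing dx cancels in the ratio, so plain sums suffice.
numerator = np.sum(x * p_z_given_x * p_x)
denominator = np.sum(p_z_given_x * p_x)
x_hat_mmse = numerator / denominator

# For this Gaussian case the closed form is z_k * P_XX / (P_XX + R) = 0.8 / 1.25 = 0.64.
print(x_hat_mmse)   # ~0.64
```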

Similarly, if the measurement vector is given as a random vector rather than a fixed value, the MMSE estimator is expressed as follows, and the estimate is itself a random vector.

$\hat{X}^{MMSE} = \mathbb{E}[X \mid Z_{k}]$

There are four kinds of MMSE estimators, covered below: the joint Gaussian MMSE estimator, the joint Gaussian MMSE estimator for linear measurements, the linear MMSE estimator, and the linear MMSE estimator for linear measurements.

The Performance of The MMSE Estimator



Mean of The MMSE Estimator

The estimation error $\tilde{X}$ is defined as follows:

$\tilde{X} = X - \hat{X}^{MMSE}$


Then, the average of $\tilde{X}$ is:

$\mathbb{E}[\tilde{X}] = \mathbb{E}[X-\hat{X}^{MMSE}]$
$= \mathbb{E}[X] - \mathbb{E}[\mathbb{E}[X \mid Z_{k}]]$
$= \mathbb{E}[X] - \mathbb{E}[X] = 0$

Because the mean of $\tilde{X}$ is 0, the expected value of the MMSE estimate equals the expected value of the unknown random vector $X$; that is, the MMSE estimator is an unbiased estimator.
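
A small Monte Carlo check of this unbiasedness property is sketched below. The scalar model ($X \sim N(0,1)$, $Z = 2X + V$, $V \sim N(0, 0.5^{2})$) is an illustrative assumption, chosen so that $\mathbb{E}[X \mid Z]$ has a simple closed form.

```python
import numpy as np

# Minimal Monte Carlo sketch that the MMSE estimation error has zero mean.
# Illustrative model: X ~ N(0, 1), Z = 2X + V, V ~ N(0, 0.5^2).
rng = np.random.default_rng(3)
n = 200_000

X = rng.normal(0.0, 1.0, n)
Z = 2.0 * X + rng.normal(0.0, 0.5, n)

# For this jointly Gaussian model, E[X | Z] = (P_XZ / P_ZZ) * Z = (2 / 4.25) * Z.
X_hat = (2.0 / 4.25) * Z
err = X - X_hat

print(np.mean(err))        # ~0, consistent with E[X - X_hat] = 0 (unbiasedness)
```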



Covariance of The Estimation Error

The covariance of the estimation error $\tilde{X}$ is given as the mean of the conditional covariance of $X$ conditioned on the set of measurement vectors $Z_{k}$, as follows:

$P_{\tilde{X}\tilde{X}} = \mathbb{E}[(\tilde{X}-\mathbb{E}[\tilde{X}])(\tilde{X}-\mathbb{E}[\tilde{X}])^{T}]$
$=\mathbb{E}[\tilde{X}\tilde{X}^{T}]$
$=\mathbb{E}[\mathbb{E}[\tilde{X}\tilde{X}^{T} \mid Z_{k}]]$
$=\mathbb{E}[P_{XX\mid Z_{k}}]$


If the measurement vector is given as $Z_{k} = z_{k}$, the covariance of the estimation error $\tilde{X}$ is as follows:

$P_{\tilde{X}\tilde{X}} = \mathbb{E}[\tilde{X}\tilde{X}^{T} \mid Z_{k}=z_{k}]$
$= \mathbb{E}[(X-\mathbb{E}[X \mid Z_{k}=z_{k}])(X-\mathbb{E}[X \mid Z_{k}=z_{k}])^{T} \mid Z_{k}=z_{k}]$
$=P_{XX \mid Z_{k}}$

Also, the estimation error is always orthogonal to any function $g(Z)$ of the measurement vector. Written as an equation:

$\mathbb{E}[(X - \hat{X}^{MMSE})g^{T}(Z)] = 0$

where $Z$ is a measurement vector.

To prove this,

$\mathbb{E}[(X - \hat{X}^{MMSE})g^{T}(Z)] = \mathbb{E}[Xg^{T}(Z)]-\mathbb{E}[\mathbb{E}[X \mid Z]g^{T}(Z)]$
$=\mathbb{E}[Xg^{T}(Z)]-\mathbb{E}[\mathbb{E}[Xg^{T}(Z) \mid Z]]$
$= \mathbb{E}[Xg^{T}(Z)] - \mathbb{E}[Xg^{T}(Z)] = 0$

In particular, letting $g(Z) = Z$ gives $\mathbb{E}[(X-\hat{X}^{MMSE})Z^{T}] = 0$.

According to the equation above, $\hat{X}_{i}^{MMSE}$ can be interpreted as the projection of $X_{i}$ onto the span formed by linear combinations of the measurement variables $Z_{i}$, and the estimation error is orthogonal to this span.
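
The orthogonality property can be checked numerically. The sketch below assumes, purely for illustration, a jointly Gaussian scalar model ($X \sim N(1, 4)$, $Z = X + V$, $V \sim N(0,1)$) so that $\mathbb{E}[X \mid Z]$ is available in closed form, and verifies that the error is uncorrelated with several functions of $Z$.

```python
import numpy as np

# Minimal Monte Carlo sketch of the orthogonality property (illustrative values).
# Assumed model: X ~ N(1, 4), Z = X + V with V ~ N(0, 1) independent of X,
# so E[X | Z] = mu_X + (P_XZ / P_ZZ) * (Z - mu_Z).
rng = np.random.default_rng(0)
n = 200_000

X = rng.normal(1.0, 2.0, n)
V = rng.normal(0.0, 1.0, n)
Z = X + V

P_XZ, P_ZZ = 4.0, 5.0                      # cov(X, Z) and var(Z) for this model
X_hat = 1.0 + (P_XZ / P_ZZ) * (Z - 1.0)    # conditional mean E[X | Z]
err = X - X_hat

# The error should be (numerically) orthogonal to any function of Z.
for g in (Z, Z**2, np.sin(Z)):
    print(np.mean(err * g))                # all close to 0 up to Monte Carlo noise
```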





Joint Gaussian MMSE Estimator

Let’s get the MMSE estimation value $\hat{X}^{MMSE}$ of random vector $X$ when the two random vectors $X$ and $Z$ have a joint Gaussian distribution.
If the two random vectors are joint Gaussian vectors, each random vector follows a Gaussian distribution. Let $X$ and $Z$ have the following probability density functions:

$X \sim N(\mu_{X}, P_{XX}),$
$Z \sim N(\mu_{Z}, P_{ZZ})$

Also, assume that the joint probability density function of two random vectors is given as:

$Y = \begin{bmatrix} X \\ Z \end{bmatrix} \sim N(\mu_{Y}, P_{YY})$

Then, $\mu_{Y}$ and $P_{YY}$ are:

$\mu_{Y}= \begin{bmatrix} \mu_{X} \\ \mu_{Z} \end{bmatrix}, P_{YY} = \begin{bmatrix} P_{XX} & P_{XZ} \\ P_{ZX} & P_{ZZ} \end{bmatrix}$

where $P$ represents the covariance matrix.

To obtain $\hat{X}^{MMSE}$, we need the conditional probability density function of $X$ given $Z = z$. The conditional probability density function $p_{X\mid Z}(x \mid z)$ is given as:

$p_{X \mid Z}(x \mid z) = \frac{p_{XZ}(x, z)}{p_{Z}(z)} = \frac{p_{Y}(y)}{p_{Z}(z)}$
$= \frac{\sqrt{(2 \pi)^{p}\det{P_{ZZ}}}}{\sqrt{(2 \pi)^{n+p}\det{P_{YY}}}}\exp\left(-\frac{1}{2}\left[(y-\mu_{Y})^{T}P^{-1}_{YY}(y-\mu_{Y})-(z-\mu_{Z})^{T}P^{-1}_{ZZ}(z-\mu_{Z})\right]\right)$
$= \frac{1}{\sqrt{(2 \pi)^{n}\det{P_{XX\mid Z}}}} \exp\left(-\frac{1}{2}(x-\mathbb{E}[X \mid Z=z])^{T}P_{XX \mid Z}^{-1}(x-\mathbb{E}[X \mid Z=z])\right)$

where

  • $\mathbb{E}[X \mid Z=z]$ : $\mu_{X} + P_{XZ}P_{ZZ}^{-1}(z-\mu_{Z})$

  • $P_{XX \mid Z}$ : $P_{XX} - P_{XZ} P_{ZZ}^{-1} P_{ZX}$

  • $n$, $p$ : the dimensions of $X$ and $Z$, respectively

Thus, the MMSE estimation value $\hat{X}^{MMSE}$ and the estimation error covariance $P_{\tilde{X}\tilde{X}}$ are:

$\hat{X}^{MMSE}(z) = \mu_{X} + P_{XZ}P_{ZZ}^{-1}(z-\mu_{Z})$
$P_{\tilde{X}\tilde{X}} = P_{XX \mid Z} = P_{XX}-P_{XZ}P_{ZZ}^{-1}P_{ZX}$
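
These two formulas translate directly into code. The sketch below is a minimal example; the means, covariances, and observed $z$ are hand-picked assumptions for illustration only.

```python
import numpy as np

# Minimal sketch of the joint Gaussian MMSE formulas above (illustrative values).
mu_X = np.array([1.0, 0.0])
mu_Z = np.array([2.0])
P_XX = np.array([[2.0, 0.3],
                 [0.3, 1.0]])
P_XZ = np.array([[0.5],
                 [0.2]])
P_ZZ = np.array([[1.5]])
z = np.array([2.7])                  # observed measurement

K = P_XZ @ np.linalg.inv(P_ZZ)       # gain P_XZ P_ZZ^{-1}
x_hat = mu_X + K @ (z - mu_Z)        # MMSE estimate
P_err = P_XX - K @ P_XZ.T            # P_XX - P_XZ P_ZZ^{-1} P_ZX

print(x_hat)
print(P_err)
```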





Joint Gaussian MMSE Estimator for Linear Measurements

Let an unknown random vector $X$ and a measurement vector $Z$ have a linear relationship as:

$Z = HX + V$

where $X$ and the measurement noise $V$ are given as Gaussian random vectors:

$X \sim N(\mu_{X}, P_{XX}),$
$V \sim N(0, R),$
$\mathbb{E}[(X-\mu_{X})V^{T}]=0$

and assume that the two random vectors are uncorrelated.

Since $X$ and $V$ are uncorrelated Gaussian random vectors, and $X$ and $Z$ have a linear relationship, $X$ and $Z$ have a joint Gaussian distribution. Thus, the estimate $\hat{X}^{MMSE}$ and the estimation error covariance $P_{\tilde{X}\tilde{X}}$ of the random vector $X$ conditioned on the random vector $Z$ are:

$\hat{X}^{MMSE}(z) = \mu_{X}+P_{XX}H^{T}(HP_{XX}H^{T} + R)^{-1}(z-H\mu_{X})$
$P_{\tilde{X}\tilde{X}} = (P_{XX}^{-1} + H^{T}R^{-1}H)^{-1}$

To prove this, we need to get $\mu_{Z}$ first.

$\mu_{Z} = \mathbb{E}[Z] = \mathbb{E}[HX + V] = H\mathbb{E}[X] = H\mu_{X}$

Second, get $P_{ZZ}$.

$P_{ZZ} = \mathbb{E}[(Z-\mu_{Z})(Z-\mu_{Z})^{T}]$
$= \mathbb{E}[(H(X-\mu_{X})+V)(H(X-\mu_{X})+V)^{T}]$
$= HP_{XX}H^{T} + \mathbb{E}[H(X-\mu_{X})V^{T}]+\mathbb{E}[V(X-\mu_{X})^{T}H^{T}]+ R$
$=HP_{XX}H^{T}+R$

Third, get the cross-covariance $P_{XZ}$.

$P_{XZ} = \mathbb{E}[(X-\mu_{X})(Z-\mu_{Z})^{T}]$
$= \mathbb{E}[(X-\mu_{X})(H(X-\mu_{X})+V)^{T}]$
$= P_{XX}H^{T} + \mathbb{E}[(X-\mu_{X})V^{T}]$
$= P_{XX}H^{T}$

Finally, to get $\hat{X}^{MMSE}(z)$ and $P_{\tilde{X}\tilde{X}}$, use the equations of joint Gaussian MMSE estimator above.

$\hat{X}^{MMSE}(z) = \mu_{X}+P_{XX}H^{T}(HP_{XX}H^{T} + R)^{-1}(z-H\mu_{X})$
$P_{\tilde{X}\tilde{X}} = P_{XX\mid Z}$
$= P_{XX}-P_{XX}H^{T}(HP_{XX}H^{T}+R)^{-1}HP_{XX}$
$= (P_{XX}^{-1} + H^{T}R^{-1}H)^{-1}$

where the last equality follows from the matrix inversion lemma.
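
A minimal numerical sketch of this linear-measurement case is given below. The matrices $P_{XX}$, $H$, $R$ and the observed $z$ are illustrative assumptions; the script also checks numerically that the two expressions for the error covariance coincide, as the matrix inversion lemma implies.

```python
import numpy as np

# Minimal sketch of the Gaussian MMSE update for Z = H X + V (illustrative values).
mu_X = np.array([0.0, 1.0])
P_XX = np.array([[1.0, 0.2],
                 [0.2, 2.0]])
H = np.array([[1.0, 0.0],
              [1.0, 1.0]])
R = 0.5 * np.eye(2)                  # measurement noise covariance
z = np.array([0.3, 1.4])             # observed measurement

S = H @ P_XX @ H.T + R               # H P_XX H^T + R
K = P_XX @ H.T @ np.linalg.inv(S)    # gain P_XX H^T (H P_XX H^T + R)^{-1}

x_hat = mu_X + K @ (z - H @ mu_X)
P_err_1 = P_XX - K @ H @ P_XX
P_err_2 = np.linalg.inv(np.linalg.inv(P_XX) + H.T @ np.linalg.inv(R) @ H)

print(x_hat)
print(np.allclose(P_err_1, P_err_2))  # True: the two covariance forms coincide
```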





Linear MMSE Estimator

When a measurement vector $z$ and an estimate $\hat{x}$ of an unknown random vector $X$ have a linear relationship given by $\hat{x}(z) = Az + b$, we refer to the estimator as a linear estimator.

If the measurement vector is not fixed as $Z=z$ and is instead given as a random vector, the linear estimator is expressed as $\hat{X}(Z) = AZ + b$.

The Linear Minimum Mean-Square Error (LMMSE) estimator is defined as an estimator that minimizes the following objective function.

$J = \mathbb{E}[(X-\hat{X}^{LMMSE})^{T}(X-\hat{X}^{LMMSE})]$
$= tr\mathbb{E}[(X-\hat{X}^{LMMSE})(X-\hat{X}^{LMMSE})^{T}]$
$= tr\mathbb{E}[(X - AZ -b)(X-AZ-b)^{T}]$
$= tr\mathbb{E}[(X-AZ-b-\mathbb{E}[X]+\mathbb{E}[X])(X-AZ-b-\mathbb{E}[X]+\mathbb{E}[X])^{T}]$
$= tr\{P_{XX}+A(P_{ZZ}+\mathbb{E}[Z](\mathbb{E}[Z])^{T})A^{T}+(\mathbb{E}[X]-b)(\mathbb{E}[X]-b)^{T}-2A\mathbb{E}[Z](\mathbb{E}[X]-b)^{T}-2AP_{ZX}\}$

where

  • $tr$ : trace (sum of diagonal components)

  • $A$ : a deterministic matrix, chosen to minimize $J$

  • $b$ : a deterministic vector, chosen to minimize $J$

The necessary conditions for minimizing $J$ are:

$\frac{\partial J}{\partial b} = -2(\mathbb{E}[X]-b)+2A\mathbb{E}[Z]=0$
$\frac{\partial J}{\partial A} = 2A(P_{ZZ}+\mathbb{E}[Z](\mathbb{E}[Z])^{T}) - 2P_{XZ}-2(\mathbb{E}[X]-b)(\mathbb{E}[Z])^{T}=0$

Solving the two equations above for $A$ and $b$ gives:

  • $A$ : $P_{XZ}P_{ZZ}^{-1}$

  • $b$ : $\mathbb{E}[X]-P_{XZ}P_{ZZ}^{-1}\mathbb{E}[Z]$

Then, the LMMSE estimator is given as:

$\hat{X}^{LMMSE}(Z) = \mathbb{E}[X] + P_{XZ}P_{ZZ}^{-1}(Z-\mathbb{E}[Z])$

If the measurement vector is given as $Z=z$, the LMMSE estimate is:

$\hat{X}^{LMMSE}(z) = \mathbb{E}[X] + P_{XZ}P_{ZZ}^{-1}(z-\mathbb{E}[Z])$

Additionally, $\mathbb{E}[\hat{X}^{LMMSE}(Z)]$ and $P_{\tilde{X}\tilde{X}}$ are:

$\mathbb{E}[\hat{X}^{LMMSE}(Z)] = \mathbb{E}[\mathbb{E}[X]] + P_{XZ}P_{ZZ}^{-1}(\mathbb{E}[Z]-\mathbb{E}[Z]) = \mathbb{E}[X]$
$P_{\tilde{X}\tilde{X}} = \mathbb{E}[(\tilde{X}-\mathbb{E}[\tilde{X}])(\tilde{X}-\mathbb{E}[\tilde{X}])^{T}]$
$=\mathbb{E}[\tilde{X}\tilde{X}^{T}] = \mathbb{E}[(X-\hat{X}^{LMMSE})(X-\hat{X}^{LMMSE})^{T}]$
$= P_{XX}-P_{XZ}P_{ZZ}^{-1}P_{ZX}$

Since $\mathbb{E}[\hat{X}^{LMMSE}(Z)] = \mathbb{E}[X]$, the LMMSE estimator is unbiased.
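
Because the LMMSE estimator uses only first and second moments, it applies even when $X$ and $Z$ are not jointly Gaussian. The Monte Carlo sketch below assumes, for illustration only, a scalar uniform $X$ with a nonlinear measurement $Z = X^{3} + V$, estimates the needed moments from samples, and forms $A$ and $b$ as derived above.

```python
import numpy as np

# Minimal Monte Carlo sketch of the LMMSE estimator built from sample moments.
# Illustrative assumption: X ~ U(-1, 1), Z = X^3 + V with V ~ N(0, 0.1^2),
# so X and Z are not jointly Gaussian, yet A = P_XZ P_ZZ^{-1} and
# b = E[X] - A E[Z] still give the best *linear* estimator.
rng = np.random.default_rng(1)
n = 500_000

X = rng.uniform(-1.0, 1.0, n)
V = rng.normal(0.0, 0.1, n)
Z = X**3 + V

m_X, m_Z = X.mean(), Z.mean()
P_XZ = np.mean((X - m_X) * (Z - m_Z))
P_ZZ = np.mean((Z - m_Z) ** 2)

A = P_XZ / P_ZZ
b = m_X - A * m_Z
X_hat = A * Z + b

print(np.mean(X_hat))                # ~E[X]: the LMMSE estimator is unbiased
print(np.mean((X - X_hat) ** 2))     # mean-square error of the linear estimator
```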





Linear MMSE Estimator for Linear Measurements

Let an unknown random vector $X$ and a measurement vector $Z$ be related by the linear equation $Z = HX + V$. Assume that $X$ and the measurement noise $V$ are random vectors with arbitrary probability distributions and are uncorrelated with each other as:

$X \sim (\mu_{X},P_{XX})$,
$V \sim (0, R)$,
$\mathbb{E}[(X-\mu_{X})V^{T}] = 0$

The estimate $\hat{X}^{LMMSE}$ and the estimation error covariance $P_{\tilde{X}\tilde{X}}$ of the random vector $X$ conditioned on the random vector $Z = z$ are:

$\hat{X}^{LMMSE}(z) = \mu_X + P_{XX}H^{T}(HP_{XX}H^{T} + R)^{-1}(z - H\mu_X)$
$P_{\tilde{X}\tilde{X}} = (P_{XX}^{-1}+H^{T}R^{-1}H)^{-1}$

To prove this, we need to get $\mu_{Z}$ first.

$\mu_{Z}=\mathbb{E}[Z] = \mathbb{E}[HX+V] = H\mathbb{E}[X] = H\mu_{X}$

Second, get $P_{ZZ}$.

$P_{ZZ} = \mathbb{E}[(Z-\mu_{Z})(Z-\mu_{Z})^{T}]$
$=\mathbb{E}[(H(X-\mu_{X})+V)(H(X-\mu_{X})+V)^{T}]$
$= HP_{XX}H^{T} + \mathbb{E}[H(X-\mu_{X})V^{T}]+ \mathbb{E}[V(X-\mu_{X})^{T}H^{T}]+R$
$=HP_{XX}H^{T} + R$

Third, get the cross-covariance $P_{XZ}$.

$P_{XZ} = \mathbb{E}[(X-\mu_{X})(Z-\mu_{Z})^{T}]$
$= \mathbb{E}[(X-\mu_{X})(H(X-\mu_{X})+V)^{T}]$
$= P_{XX}H^{T} + \mathbb{E}[(X-\mu_{X})V^{T}]$
$=P_{XX}H^{T}$

Finally, to get $\hat{X}^{LMMSE}(z)$ and $P_{\tilde{X}\tilde{X}}$, use the equations of the LMMSE estimator given above.

$\hat{X}^{LMMSE}(z) = \mu_{X}+P_{XX}H^{T}(HP_{XX}H^{T} +R)^{-1}(z-H\mu_{X})$
$P_{\tilde{X}\tilde{X}}=P_{XX}-P_{XX}H^{T}(HP_{XX}H^{T}+R)^{-1}HP_{XX}$
$=(P_{XX}^{-1}+H^{T}R^{-1}H)^{-1}$
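
As a final check, the sketch below applies exactly these formulas to a non-Gaussian example (the distributions, $H$, and $R$ are illustrative assumptions) and compares the empirical error covariance with the closed-form $P_{\tilde{X}\tilde{X}}$; they agree because the LMMSE result depends only on means and covariances.

```python
import numpy as np

# Minimal Monte Carlo sketch: the LMMSE formulas for Z = H X + V hold for
# non-Gaussian X and V, since only first and second moments are used.
rng = np.random.default_rng(2)
n = 500_000

mu_X = np.array([0.5, -0.5])
# Non-Gaussian X: independent uniform components shifted to have mean mu_X.
X = mu_X + rng.uniform(-1.0, 1.0, size=(n, 2))
P_XX = np.diag([1.0 / 3.0, 1.0 / 3.0])         # var of U(-1, 1) is 1/3

H = np.array([[1.0, 2.0]])
# Non-Gaussian noise: zero-mean Laplace with variance 2 * 0.5^2 = 0.5.
V = rng.laplace(0.0, 0.5, size=(n, 1))
R = np.array([[0.5]])
Z = X @ H.T + V

S = H @ P_XX @ H.T + R
K = P_XX @ H.T @ np.linalg.inv(S)              # gain P_XX H^T (H P_XX H^T + R)^{-1}
X_hat = mu_X + (Z - mu_X @ H.T) @ K.T          # LMMSE estimate for every sample

err = X - X_hat
P_emp = err.T @ err / n                        # empirical error covariance
P_formula = np.linalg.inv(np.linalg.inv(P_XX) + H.T @ np.linalg.inv(R) @ H)

print(P_emp)
print(P_formula)                               # the two should be close
```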
This post is licensed under CC BY 4.0 by the author.