# Conditioning multivariate Gaussian distributions

Last updated 2017-01-02 11:33:48 SGT

### Note to self:

Say we have $n$ jointly Gaussian random variables $\{X_1, X_2 \ldots X_j, X_{j+1} \ldots X_n\}$ with a covariance matrix $\Sigma$ (so that $\Sigma_{11} = \sigma_1^2$, $\Sigma_{12} = \sigma_1 \sigma_2 \rho_{12}$, and so on). Suppose we wish to condition this distribution on $X_i = x_i$ for $i \le j$. Then the conditional covariance matrix is obtained by a partial Cholesky decomposition on the first $j$ rows and columns of $\Sigma$, such that we have, in block diagonal form,

$$
\Sigma = L \begin{pmatrix} I & 0 \\ 0 & \Sigma' \end{pmatrix} L^\top,
$$

where $L$ is a lower-triangular matrix and $\Sigma'$ is the conditional covariance matrix of the remaining variables.

Let us work this out explicitly, representing $\Sigma$ in block matrix form as

$$
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad \Sigma_{21} = \Sigma_{12}^\top,
$$

where $\Sigma_{11}$ is the $j \times j$ block of the conditioned variables and we assume the diagonal submatrices to be invertible. After some algebra, we obtain

$$
L = \begin{pmatrix} L_{11} & 0 \\ \Sigma_{21} L_{11}^{-\top} & I \end{pmatrix},
$$

where $L_{11}$ is the Cholesky factor of $\Sigma_{11}$ (that is, $L_{11} L_{11}^\top = \Sigma_{11}$). We have that

$$
\Sigma' = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12},
$$

which is the Schur complement of $\Sigma_{11}$ in $\Sigma$.
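The factorisation lends itself to a quick numerical sanity check. Below is a minimal sketch (assuming NumPy; the $5 \times 5$ covariance and the choice $j = 2$ are arbitrary illustrations, not from the note) that builds the block lower-triangular factor explicitly and confirms that $\Sigma$ factors through $\mathrm{diag}(I, \Sigma')$ with $\Sigma'$ the Schur complement:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative 5x5 covariance; condition on the first j = 2 variables.
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + 5 * np.eye(5)   # symmetric positive definite by construction
j = 2

S11, S12 = Sigma[:j, :j], Sigma[:j, j:]
S21, S22 = Sigma[j:, :j], Sigma[j:, j:]

# Conditional covariance: the Schur complement of Sigma_11 in Sigma.
Sigma_cond = S22 - S21 @ np.linalg.solve(S11, S12)

# Cross-check the partial Cholesky factorisation Sigma = L diag(I, Sigma') L^T.
L11 = np.linalg.cholesky(S11)
L = np.block([[L11, np.zeros((j, 5 - j))],
              [S21 @ np.linalg.inv(L11).T, np.eye(5 - j)]])
D = np.block([[np.eye(j), np.zeros((j, 5 - j))],
              [np.zeros((5 - j, j)), Sigma_cond]])
assert np.allclose(L @ D @ L.T, Sigma)
```

Using `np.linalg.solve` rather than forming $\Sigma_{11}^{-1}$ directly is the usual numerically safer choice.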

Note that this expression does not depend on the mean values in any way. However, consider the column matrix

$$
X - M = \begin{pmatrix} x_{\{1\ldots j\}} - M_1 \\ X_{\{j+1\ldots n\}} - M_2 \end{pmatrix}
$$

in block matrix form, where $M_1$ and $M_2$ are the unconditional means of the two blocks. If $\Sigma^{-1}$ is positive definite, we can associate it with the inner product of some vector space. If we consider $L$ to be the Jacobian of a (linear) coordinate transformation on this vector space, then clearly in those coordinates we must have (after more algebra)

$$
L^{-1}(X - M) = \begin{pmatrix} L_{11}^{-1}\left(x_{\{1\ldots j\}} - M_1\right) \\ X_{\{j+1\ldots n\}} - M_2 - \Sigma_{21}\Sigma_{11}^{-1}\left(x_{\{1\ldots j\}} - M_1\right) \end{pmatrix}.
$$

We interpret the lower block as being the column matrix $X_{\{j+1\ldots n\}} - M_2'$, which immediately yields an expression for the conditioned means as

$$
M_2' = M_2 + \Sigma_{21}\Sigma_{11}^{-1}\left(x_{\{1\ldots j\}} - M_1\right).
$$
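The mean formula can be checked independently of the derivation: at $X_{\{j+1\ldots n\}} = M_2'$, the free block of the gradient of the Gaussian exponent, which is proportional to $\Sigma^{-1}(X - M)$, must vanish. A minimal sketch assuming NumPy, with arbitrary illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)   # arbitrary positive-definite covariance
M = rng.standard_normal(4)        # arbitrary unconditional means
j = 2
x1 = rng.standard_normal(j)       # observed values for the first j variables

S11, S21 = Sigma[:j, :j], Sigma[j:, :j]
M1, M2 = M[:j], M[j:]

# Conditional mean: M2' = M2 + Sigma_21 Sigma_11^{-1} (x1 - M1)
M2_cond = M2 + S21 @ np.linalg.solve(S11, x1 - M1)

# Since Sigma^{-1} is positive definite, the exponent is maximised over the
# free block at the conditional mean, so that block of Sigma^{-1} (x - M)
# must vanish there.
x = np.concatenate([x1, M2_cond])
grad = np.linalg.solve(Sigma, x - M)
assert np.allclose(grad[j:], 0)
```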

### Conditioning on one variable

In particular, consider the case of 3 jointly Gaussian random variables $\{X_0, X_1, X_2\}$ conditioned on $X_0 = x_0$. The above formulae yield

$$
\Sigma' = \begin{pmatrix} \sigma_1^2\left(1 - \rho_{01}^2\right) & \sigma_1\sigma_2\left(\rho_{12} - \rho_{01}\rho_{02}\right) \\ \sigma_1\sigma_2\left(\rho_{12} - \rho_{01}\rho_{02}\right) & \sigma_2^2\left(1 - \rho_{02}^2\right) \end{pmatrix}
$$

and

$$
M_i' = M_i + \rho_{0i}\,\frac{\sigma_i}{\sigma_0}\left(x_0 - M_0\right), \qquad i \in \{1, 2\}.
$$
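These closed forms can be checked against the general Schur-complement route. A quick sketch assuming NumPy (the particular $\sigma_i$ and $\rho_{ij}$ values are arbitrary illustrations, not from the note):

```python
import numpy as np

# Arbitrary illustrative parameters: sigma_0, sigma_1, sigma_2 and correlations.
s = np.array([1.5, 0.7, 2.0])
rho01, rho02, rho12 = 0.3, -0.4, 0.5
C = np.array([[1.0, rho01, rho02],
              [rho01, 1.0, rho12],
              [rho02, rho12, 1.0]])
Sigma = np.outer(s, s) * C             # Sigma_ij = sigma_i sigma_j rho_ij

# General route: Schur complement after conditioning on X_0.
Sigma_cond = Sigma[1:, 1:] - np.outer(Sigma[1:, 0], Sigma[0, 1:]) / Sigma[0, 0]

# Closed forms in terms of the sigmas and correlations.
expected = np.array([
    [s[1]**2 * (1 - rho01**2),          s[1]*s[2]*(rho12 - rho01*rho02)],
    [s[1]*s[2]*(rho12 - rho01*rho02),   s[2]**2 * (1 - rho02**2)],
])
assert np.allclose(Sigma_cond, expected)
```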

### More variables = more problems

These results must generalise (via the obvious symmetries) to an arbitrary number of jointly Gaussian variables conditioned on one value. Rather than perform the matrix inversion directly when more than one conditioned value is required, we might make the following observation: defining $\Sigma_{ij} = \sigma_i \sigma_j \rho_{ij}$ (where $\rho_{kk} = 1$), the above results imply that

$$
\Sigma'_{ij} = \Sigma_{ij} - \frac{\Sigma_{i0}\,\Sigma_{0j}}{\Sigma_{00}}, \qquad M_i' = M_i + \frac{\Sigma_{i0}}{\Sigma_{00}}\left(x_0 - M_0\right).
$$

At first blush, we might attempt to condition on more variables recursively in this fashion:

$$
\Sigma''_{ij} = \Sigma'_{ij} - \frac{\Sigma_{i1}\,\Sigma_{1j}}{\Sigma_{11}}, \qquad M_i'' = M_i' + \frac{\Sigma_{i1}}{\Sigma_{11}}\left(x_1 - M_1'\right),
$$

and so on. Unfortunately, strictly speaking this is not equivalent to the full matrix inversion: if the conditioned variables are dependent on each other, the order of conditioning matters in this procedure. For comparison, the correct expression is

$$
\Sigma''_{ij} = \Sigma_{ij} - \frac{\Sigma_{11}\,\Sigma_{i0}\,\Sigma_{0j} - \Sigma_{01}\left(\Sigma_{i0}\,\Sigma_{1j} + \Sigma_{i1}\,\Sigma_{0j}\right) + \Sigma_{00}\,\Sigma_{i1}\,\Sigma_{1j}}{\Sigma_{00}\,\Sigma_{11} - \Sigma_{01}^2},
$$

which evidently reduces to the above iff $\rho_{01} = 0$.
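The discrepancy is easy to exhibit numerically. The sketch below assumes NumPy, and assumes the naive shortcut amounts to subtracting each single-variable correction built from the original (un-updated) covariances; the correlation values are arbitrary illustrations. It agrees with the full inversion when $\rho_{01} = 0$ and disagrees otherwise:

```python
import numpy as np

def joint_cond_cov(Sigma, j):
    """Condition on the first j variables via the Schur complement."""
    return Sigma[j:, j:] - Sigma[j:, :j] @ np.linalg.solve(Sigma[:j, :j],
                                                           Sigma[:j, j:])

def naive_recursive_cov(Sigma):
    """Flawed shortcut: subtract the single-variable corrections for X_0 and
    X_1 separately, using the original covariances each time."""
    S = Sigma[2:, 2:].copy()
    for k in (0, 1):
        S -= np.outer(Sigma[2:, k], Sigma[k, 2:]) / Sigma[k, k]
    return S

def make_sigma(rho01):
    # Unit variances; rho_02 = 0.2 and rho_12 = 0.4 are arbitrary choices.
    C = np.array([[1.0, rho01, 0.2],
                  [rho01, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])
    return C

# With rho_01 = 0 the shortcut agrees with the full inversion...
assert np.allclose(naive_recursive_cov(make_sigma(0.0)),
                   joint_cond_cov(make_sigma(0.0), 2))
# ...but with dependent conditioned variables it does not.
assert not np.allclose(naive_recursive_cov(make_sigma(0.5)),
                       joint_cond_cov(make_sigma(0.5), 2))
```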

Q: How do we generalise this to a Gaussian process (with arbitrary number of conditional values) in a manageable way?