Conditioning multivariate Gaussian distributions
Last updated 2017-01-02 11:33:48 SGT
Note to self:
Say we have [n] jointly Gaussian random variables [\{X_1, X_2, \ldots, X_j, X_{j+1}, \ldots, X_n\}] with covariance matrix [\Sigma] (so that [\Sigma_{11} = \sigma_1^2], [\Sigma_{12} = \sigma_1 \sigma_2 \rho_{12}], etc.). Suppose we wish to condition this distribution on [X_i = x_i] for [i \le j]. Then the conditional covariance matrix is obtained by a partial Cholesky decomposition on the first [j] rows and columns of [\Sigma], such that we have, in block-diagonal form,
[\Sigma = L \begin{bmatrix} \mathbb{I}_j & 0 \\ 0 & \Sigma' \end{bmatrix} L^T,]where [L] is a lower-triangular matrix.
Let us work this out explicitly, representing [\Sigma] in block matrix form as
[\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix},]where we assume the diagonal submatrices to be invertible. After some algebra, we obtain
[\Sigma = \underbrace{\begin{bmatrix} L_{11} & 0 \\ \Sigma_{21} L_{11}^{-T} & \mathbb{I}_{n-j} \end{bmatrix}}_{=L} \begin{bmatrix} \mathbb{I}_j & 0 \\ 0 & \Sigma_{22} - \Sigma_{21}\Sigma^{-1}_{11}\Sigma_{12} \end{bmatrix} \begin{bmatrix} L_{11}^T & L_{11}^{-1} \Sigma_{12} \\ 0 & \mathbb{I}_{n-j} \end{bmatrix},]where [L_{11}] is the Cholesky decomposition of [\Sigma_{11}]. We have that
[\Sigma' = \Sigma_{22} - \Sigma_{21}\Sigma^{-1}_{11}\Sigma_{12},]which is the Schur complement of [\Sigma_{11}] in [\Sigma].
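As a numerical sanity check, here is a minimal NumPy sketch (the sizes and variable names are mine) that builds a random covariance matrix and verifies both the block decomposition and the Schur-complement expression:

```python
import numpy as np

rng = np.random.default_rng(0)
n, j = 5, 2

# Random symmetric positive-definite covariance matrix.
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)

S11, S12 = Sigma[:j, :j], Sigma[:j, j:]
S21, S22 = Sigma[j:, :j], Sigma[j:, j:]

L11 = np.linalg.cholesky(S11)                    # Cholesky factor of Sigma_11
Schur = S22 - S21 @ np.linalg.solve(S11, S12)    # Sigma' above

# Assemble L and the block-diagonal middle factor, then check Sigma = L D L^T.
L = np.block([[L11,                        np.zeros((j, n - j))],
              [S21 @ np.linalg.inv(L11).T, np.eye(n - j)       ]])
D = np.block([[np.eye(j),            np.zeros((j, n - j))],
              [np.zeros((n - j, j)), Schur               ]])
assert np.allclose(L @ D @ L.T, Sigma)
```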
Note that this expression does not depend on the means in any way. The conditioned means, however, do shift; to obtain them, consider the column matrix
[X = \begin{bmatrix}x_1 - \mu_1 \\ x_2 - \mu_2 \\ \vdots \\ x_j - \mu_j \\ X_{j+1} - \mu_{j+1} \\ \vdots \\ X_n - \mu_n \end{bmatrix} \equiv \begin{bmatrix}x - M_1 \\ X_{\{j+1 \ldots n\}} - M_2 \end{bmatrix},]in block matrix form. If [\Sigma^{-1}] is positive definite, we can associate it with the inner product on some vector space. If we consider [L] to be the Jacobian of a (linear) coordinate transformation on this vector space, then clearly in those coordinates we must have (after more algebra)
[X = L X' = L \begin{bmatrix} L_{11}^{-1}\left(x - M_1\right) \\ X_{\{j+1 \ldots n\}} - M_2 - \Sigma_{21}\Sigma_{11}^{-1}\left(x - M_1\right) \end{bmatrix}.]We interpret the lower block as the column matrix [X_{\{j+1 \ldots n\}} - M_2'], which immediately yields an expression for the conditioned means as
[M_2' = M_2 + \Sigma_{21}\Sigma_{11}^{-1}\left(x - M_1\right).]
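The two results combine into a single conditioning routine. A minimal sketch in NumPy, assuming nothing beyond the formulae above (the name condition and its signature are my own):

```python
import numpy as np

def condition(mu, Sigma, x, j):
    """Condition N(mu, Sigma) on its first j components taking the values x.

    (Sketch only: name and signature are illustrative.)
    """
    M1, M2 = mu[:j], mu[j:]
    S11, S12 = Sigma[:j, :j], Sigma[:j, j:]
    S21, S22 = Sigma[j:, :j], Sigma[j:, j:]
    w = np.linalg.solve(S11, x - M1)                    # Sigma_11^{-1} (x - M_1)
    mu_cond = M2 + S21 @ w                              # M_2' above
    Sigma_cond = S22 - S21 @ np.linalg.solve(S11, S12)  # Schur complement Sigma'
    return mu_cond, Sigma_cond
```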
Conditioning on one variable
In particular, consider the case of 3 jointly Gaussian random variables [\{X_0, X_1, X_2\}] conditioned on [X_0 = x_0]. The above formulae yield
[\Sigma' = \begin{bmatrix} \sigma_1^2 (1 - \rho_{10}^2) & \sigma_1 \sigma_2 (\rho_{12} - \rho_{10}\rho_{20}) \\ \sigma_1 \sigma_2 (\rho_{12} - \rho_{10}\rho_{20}) & \sigma_2^2 (1 - \rho_{20}^2) \end{bmatrix}]and
[\mu_i' = \mu_i + {\sigma_i \over \sigma_0} \rho_{0i} (x_0 - \mu_0).]
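To check these closed-form expressions against the general formulae, here is a small self-contained numerical sketch (all parameter values are made up):

```python
import numpy as np

s0, s1, s2 = 1.3, 0.7, 2.1           # standard deviations (made up)
r10, r20, r12 = 0.5, -0.3, 0.4       # correlations (made up)
mu = np.array([0.2, -1.0, 0.5])
x0 = 1.5

Sigma = np.array([[s0*s0,     s0*s1*r10, s0*s2*r20],
                  [s0*s1*r10, s1*s1,     s1*s2*r12],
                  [s0*s2*r20, s1*s2*r12, s2*s2    ]])

# General formulae with j = 1 (condition on X_0 alone).
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]
w = np.linalg.solve(S11, np.array([x0]) - mu[:1])
mu_cond = mu[1:] + S21 @ w
Sigma_cond = S22 - S21 @ np.linalg.solve(S11, S12)

# Closed-form 2x2 expressions from the text.
Sigma_expected = np.array([[s1*s1*(1 - r10**2),    s1*s2*(r12 - r10*r20)],
                           [s1*s2*(r12 - r10*r20), s2*s2*(1 - r20**2)  ]])
mu_expected = np.array([mu[1] + (s1/s0)*r10*(x0 - mu[0]),
                        mu[2] + (s2/s0)*r20*(x0 - mu[0])])

assert np.allclose(Sigma_cond, Sigma_expected)
assert np.allclose(mu_cond, mu_expected)
```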
More variables = more problems
These results must generalise (via the obvious symmetries) to an arbitrary number of jointly Gaussian variables conditioned on a single value. When more than one conditioned value is required, rather than perform the matrix inversion directly, we might try the following: defining [\Sigma_{i,j} = \sigma_i \sigma_j \rho_{ij}] (where [\rho_{kk} = 1]), the above results imply that
[\Sigma'_{i,j} = \sigma_i \sigma_j \left(\rho_{ij} - \rho_{0i} \rho_{0j}\right) \equiv \sigma_i \sigma_j \rho'_{ij}.]At first blush, we might attempt to condition on further variables recursively in this fashion:
[\begin{aligned}\Sigma''_{i,j} &= \sigma_i\sigma_j \left(\left(\rho_{ij} - \rho_{0i} \rho_{0j}\right) - \left(\rho_{1i} - \rho_{01} \rho_{0i}\right)\left(\rho_{1j} - \rho_{01} \rho_{0j}\right)\right) \\ &= \sigma_i\sigma_j \left(\rho_{ij} - \rho_{1i}\rho_{1j} + \rho_{01} \left(\rho_{0i}\rho_{1j} + \rho_{1i} \rho_{0j}\right) - \left(1 + \rho_{01}^2 \right) \rho_{0i} \rho_{0j}\right) \\ &= \sigma_i \sigma_j \left(\rho'_{ij} - \rho'_{1i} \rho'_{1j}\right) \\&\equiv \sigma_i \sigma_j \rho''_{ij} \end{aligned}]and so on. Unfortunately, strictly speaking this is inequivalent to the full matrix inversion: with this procedure, the order of conditioning matters whenever the conditioned variables are correlated with each other. The culprit is that the recursion treats the unnormalised [\rho'_{ij}] as though they were correlations, but conditioning on [X_0] also shrinks the variances to [\sigma_i^2 (1 - \rho_{0i}^2)], so the true conditional correlation is [\rho'_{ij} / \sqrt{(1 - \rho_{0i}^2)(1 - \rho_{0j}^2)}]. For comparison, the correct expression is
[\rho''_{ij} = \rho_{ij} - {\rho_{0i}\rho_{0j} - \rho_{01}\left(\rho_{0i}\rho_{1j} + \rho_{1i}\rho_{0j}\right) + \rho_{1i}\rho_{1j} \over 1 - \rho_{01}^2},]which evidently reduces to the above iff [\rho_{01} = 0].
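A quick numerical sketch (made-up correlations with [\rho_{01} \neq 0]) makes both failures concrete: the naive recursion is order-dependent, and neither order agrees with the exact expression:

```python
# Made-up correlations between {X_0, X_1} (conditioned) and {X_i, X_j}.
r01 = 0.5
r0i, r0j = 0.3, 0.2
r1i, r1j = 0.4, 0.2
rij = 0.6

def naive_step(rab, rka, rkb):
    """One step of the naive recursion: 'condition out' variable k."""
    return rab - rka * rkb

# Condition on X_0 first, then X_1.
a = naive_step(naive_step(rij, r0i, r0j),
               naive_step(r1i, r01, r0i),
               naive_step(r1j, r01, r0j))
# Condition on X_1 first, then X_0.
b = naive_step(naive_step(rij, r1i, r1j),
               naive_step(r0i, r01, r1i),
               naive_step(r0j, r01, r1j))
# Exact result from the full 2x2 block inversion.
exact = rij - (r0i*r0j - r01*(r0i*r1j + r1i*r0j) + r1i*r1j) / (1 - r01**2)

print(a, b, exact)   # 0.515, 0.51, 0.50666...: all three differ
```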
Q: How do we generalise this to a Gaussian process (with an arbitrary number of conditioned values) in a manageable way?