Orthogonal Polynomials & Polynomial Regression

Comments
Last updated 2019-01-07 13:23:28 SGT

Note: These started off as some scribbles I made while reading Farnir+ 2018, which could benefit substantially from some heavy editing.

Consider an inner product of functions on a subset of $\mathbb{R}$ as defined by integration against a measure $\mu$ , as

$\left<f, g\right> = \int_{\mathbb{R}} f^*(x) g(x) \mu(\mathrm d x).$

For convenience, we demand that

$\int_{\mathbb{R}}\mu(\mathrm d x) = 1,$

or equivalently $\sum_n w_n = 1$ for a discrete subset (i.e. $\mu$ corresponds to a collection of $\delta$ functions with appropriate normalised weights).

Now consider the set of functions $\left\{f_0(x) = 1, f_1(x) = x, f_2(x) = x^2, \ldots\right\}$ . Clearly, $\left<f_i, f_j\right> = \left<x^{i + j}\right>,$ i.e. the $i + j$ th moment of x with respect to $\mu$ intepreted as a probability distribution.

We can construct a vector space out of these functions, which are linearly independent to each other (modulo maximality if the subset is discrete — $n$ points restricts us to at most $n$ basis functions), and it might be interesting to see what happens if we attempted to construct an orthonormal (or at least orthogonal) basis out of these polynomial basis functions. Therefore we recall the following convenient facts about the Gram-Schmidt procedure:

Definition (Gram-Schmidt): Given an inner product and a set of basis vectors $\left\{\mathbf{v}_i\right\}$ , we can construct an orthogonal basis $\left\{\mathbf{u}_i\right\}$ iteratively as

$\mathbf{u}_i = \mathbf{v}_i - \sum_j^{i-1} { \left<\mathbf{v}_i, \mathbf{u}_j\right> \over \left<\mathbf{u}_j, \mathbf{u}_j\right>} \mathbf{u}_j.$

The basis set $\left\{\mathbf{u}_i\right\}$ constructed in this manner is orthogonal ( $\left<\mathbf{u}_i, \mathbf{u}_j\right> \propto \delta_{ij}$ ). They can be found in closed form from the following procedure:

Definition (Gram Determinants): The vectors $\mathbf{u}_i$ defined above satisfy

$\mathbf{u}_j = {1 \over \det G_{j-1}} \cdot \det \left[\begin{array}{c|c} G_{j-1} & \begin{array}{c} \mathbf{v}_0 \\ \mathbf{v}_1 \\ \vdots \end{array} \\\hline \begin{array}{cccc}\left<\mathbf{v}_0, \mathbf{v}_j\right> & \left<\mathbf{v}_1, \mathbf{v}_j\right> & \cdots & \left<\mathbf{v}_{j-1}, \mathbf{v}_j\right> \end{array} & \mathbf{v}_j\end{array}\right].$

Here $G_{j-1}$ is a $j\times j$ matrix with matrix elements given by

$(G_{j-1})_{mn} = \left<\mathbf{v}_m, \mathbf{v}_n\right>,$

and the determinant of the matrix in the numerator is to be taken formally (as the rightmost column is actually vector-valued). Inner products can be taken against entries of the rightmost column before computing the determinant. It follows from this definition that

$\left<\mathbf{u}_j, \mathbf{u}_j\right> = \left<\mathbf{u}_j, \mathbf{v}_j\right> = {\det G_j \over \det G_{j - 1}}.$

Remark: This is somewhat reminiscent of taking the determinant of the metric on subspaces to yield covariant volume forms. Can the matrix $G$ be interpreted as some kind of metric? With respect to what vector space?

With respect to our choice of initial basis $\left\{f_i\right\}$ , the Gram matrices can be instead written as

$G_j = \begin{bmatrix}\left<1\right> & \left<x\right> & \cdots & \left<x^j\right> \\ \left<x\right> & \left<x^2\right> & \cdots & \left<x^{j+1}\right> \\ \vdots & \vdots & \ddots & \vdots \\ \left<x^j\right> & \left<x^{j+1}\right> & \cdots & \left<x^{2j}\right>\end{bmatrix}.$

Example (Legendre Polynomials): Consider a measure that is unity on the interval $(-1, 1)$ , and zero elsewhere. Then clearly

$\left<f_i, f_j\right> = \left<x^{i + j}\right> = \left\{\begin{array}{l r}0 & \text{if $i + j$ odd} \\ {2 \over i + j + 1} & \text{if $i + j$ even}\end{array}\right..$

Evaluating this explicitly, we have

$\begin{aligned}u_0(x) &= 1 \\ u_1(x) &= x \\ u_2(x) &= x^2 - {1 \over 3} \\ u_3(x) &= x^3 - {3 \over 5} x \\ u_4(x) &= x^4-\frac{6 }{7}x^2+\frac{3}{35}\end{aligned}$

and so on. Upon closer inspection, we find that these are simply the Legendre polynomials, normalised so that the highest-order term has coefficient 1.

Example (Hermite polynomials): Consider the measure $\mu(x) = {1 \over \sqrt{\pi}} e^{-x^2}.$ With some algebra we can show that

$\left<f_i, f_j\right> = \left<x^{i + j}\right> = \left\{\begin{array}{l r}0 & \text{if $i + j$ odd} \\ \Gamma\left(i + j + 1 \over 2\right)/\Gamma\left(1 \over 2\right) & \text{if $i + j$ even}\end{array}\right..$

Once again, evaluating the above expressions explicitly gives us

$\begin{aligned}u_0(x) &= 1 \\ u_1(x) &= x \\ u_2(x) &= x^2 - {1 \over 2} \\ u_3(x) &= x^3 - {3 \over 2} x \\ u_4(x) &= x^4-3x^2+\frac{3}{4},\end{aligned}$

which we recognise to be the Hermite polynomials, subject to the same normalisation as previously.

Polynomial regression in 1D

Consider a set of observations $\{y_1, \ldots, y_n\}$ taken at corresponding points $\{x_1, \ldots, x_n\}$ with independent measurement errors $\{\sigma_1, \ldots, \sigma_n\}$ . This can be thought of as a "data function" $y$ defined on these discrete points $\{x_n\}$ . Define weights as

$w_i = {1 \over \sigma_i^2}\left/\sum_n {1 \over \sigma_n^2}\right..$

Clearly this permits us to define a measure with unit norm in the above sense, yielding an inner product which is proportional to the Mahalanobis distance on the domain $\{x_n\}$ . We can state the polynomial regression problem in the following manner: we wish to find the coefficients $\{a_m\}$ as

$\{a_m\} = \arg\max \left<\sum_m a_m f_m - y, \sum_m a_m f_m - y\right>.$

Differentiating and setting the gradient to zero, this yields $m+1$ independent constraints of the form

$\sum_l a_l \left<x^{l + k}\right> \equiv \sum_l (G_m)_{kl} a_l = \left<y, x^k\right>, k=0,\ldots,m$

Then for the $m$ th coefficient (i.e. the highest-order term in the fitting polynomial) in particular, Cramer's Rule gives us an estimator as

$\begin{aligned} a_m &= {1 \over \det G_m} \cdot \det \left[\begin{array}{c|c} G_{m-1} & \begin{array}{c} \left<y\right> \\ \left<x y\right> \\ \vdots \end{array} \\\hline \begin{array}{cccc}\left<x^m\right> & \left<x^{m+1}\right> & \cdots & \left<x^{2m - 1}\right> \end{array} & \left<x^m y\right>\end{array}\right] \\ &= {\det G_{m-1} \over \det G_m} \left<u_m, y\right> = \boxed{\frac{\left<u_m, y\right> }{ \left<u_m, u_m\right>}}.\end{aligned}$

That is to say, the inner product of the data function against each orthogonal basis function gives the leading-order coefficient of the best-fitting polynomial of the corresponding order, up to a constant factor that happens to be given by the norm of the basis function.