# Orthogonal Polynomials & Polynomial Regression

Last updated 2019-01-07 13:23:28 SGT

Note: These started off as some scribbles I made while reading Farnir+ 2018, which could benefit substantially from some heavy editing.

Consider an inner product of functions on a subset of $\mathbb{R}$ as defined by integration against a measure $\mu$, as

For convenience, we demand that

or equivalently $\sum_n w_n = 1$ for a discrete subset (i.e. $\mu$ corresponds to a collection of $\delta$ functions with appropriate normalised weights).

Now consider the set of functions $\left\{f_0(x) = 1, f_1(x) = x, f_2(x) = x^2, \ldots\right\}$. Clearly, $\left = \left,$ i.e. the $i + j$th moment of x with respect to $\mu$ intepreted as a probability distribution.

We can construct a vector space out of these functions, which are linearly independent to each other (modulo maximality if the subset is discrete — $n$ points restricts us to at most $n$ basis functions), and it might be interesting to see what happens if we attempted to construct an orthonormal (or at least orthogonal) basis out of these polynomial basis functions. Therefore we recall the following convenient facts about the Gram-Schmidt procedure:

Definition (Gram-Schmidt): Given an inner product and a set of basis vectors $\left\{\mathbf{v}_i\right\}$, we can construct an orthogonal basis $\left\{\mathbf{u}_i\right\}$ iteratively as

The basis set $\left\{\mathbf{u}_i\right\}$ constructed in this manner is orthogonal ($\left<\mathbf{u}_i, \mathbf{u}_j\right> \propto \delta_{ij}$). They can be found in closed form from the following procedure:

Definition (Gram Determinants): The vectors $\mathbf{u}_i$ defined above satisfy

Here $G_{j-1}$ is a $j\times j$ matrix with matrix elements given by

and the determinant of the matrix in the numerator is to be taken formally (as the rightmost column is actually vector-valued). Inner products can be taken against entries of the rightmost column before computing the determinant. It follows from this definition that

Remark: This is somewhat reminiscent of taking the determinant of the metric on subspaces to yield covariant volume forms. Can the matrix $G$ be interpreted as some kind of metric? With respect to what vector space?

With respect to our choice of initial basis $\left\{f_i\right\}$, the Gram matrices can be instead written as

Example (Legendre Polynomials): Consider a measure that is unity on the interval $(-1, 1)$, and zero elsewhere. Then clearly

Evaluating this explicitly, we have

and so on. Upon closer inspection, we find that these are simply the Legendre polynomials, normalised so that the highest-order term has coefficient 1.

Example (Hermite polynomials): Consider the measure $\mu(x) = {1 \over \sqrt{\pi}} e^{-x^2}.$ With some algebra we can show that

Once again, evaluating the above expressions explicitly gives us

which we recognise to be the Hermite polynomials, subject to the same normalisation as previously.

## Polynomial regression in 1D

Consider a set of observations $\{y_1, \ldots, y_n\}$ taken at corresponding points $\{x_1, \ldots, x_n\}$ with independent measurement errors $\{\sigma_1, \ldots, \sigma_n\}$. This can be thought of as a "data function" $y$ defined on these discrete points $\{x_n\}$. Define weights as

Clearly this permits us to define a measure with unit norm in the above sense, yielding an inner product which is proportional to the Mahalanobis distance on the domain $\{x_n\}$. We can state the polynomial regression problem in the following manner: we wish to find the coefficients $\{a_m\}$ as

Differentiating and setting the gradient to zero, this yields $m+1$ independent constraints of the form

Then for the $m$th coefficient (i.e. the highest-order term in the fitting polynomial) in particular, Cramer's Rule gives us an estimator as

That is to say, the inner product of the data function against each orthogonal basis function gives the leading-order coefficient of the best-fitting polynomial of the corresponding order, up to a constant factor that happens to be given by the norm of the basis function.