# Log-Sum-Exp trick

It is quite common to process small numbers in probability (no really, so small that double precision $10^{-308}$ may not be enough). They are encountered while modeling with Gaussians, which are so ubiquitous today. However, it is only so much that one Gaussian can do, and hence the obvious extension is to use a mixture of Gaussians, weighted differently with different means and covariances.

The probability function for a mixture of multi-variate Gaussians looks like
$p(x) = \sum_i w_i \frac{1}{\sqrt{(2\pi)^N |C_i|}} \exp[-\frac{1}{2} (x-\mu_i)^T C_i^{-1} (x-\mu_i)]$
where both $x$ and $\mu_i$ are vectors of dimension N, and i a count of the number of Gaussians in the mixture.

Everyone knows the above equation, but the problem comes when actually trying to compute it. So typically papers report this Log-Likelihood, which is basically the $\log(p(x))$. Consider an example of 2 gaussians. Then, you may need to evaluate something like $f(x) = \log(e^{-1000} + e^{-1001})$. Those are not made up. It is quite easy to obtain a score of 1000. Direct evaluation will usually return -Inf, but we clearly know that it cannot be true, and has to be around -999.something.

The simple thing to do is rewrite $f(x)$ as
$\log(e^{-1000}(e^0 + e^{-1})) = -1000 + \log(1 + e^{-1}) \approx -999.7$.

And that is the computational trick. Pull out the maximum, from the sum, leave the rest (which can be evaluated). Just FYI, the maximum value is exp(709) which lies just below the numerical Infinity.