ICA Objective Functions in One Place: Kurtosis, Negentropy, MI, and MLE
This note consolidates my 2023-2024 group-meeting derivations into a single map:
- data model,
- objective functions,
- and optimization choices.
Starting Point
Classical ICA assumes:
\[x = As,\quad y = Wx,\]
where \(x\) is the observed mixture, \(s\) is the vector of latent independent sources, \(A\) is the unknown mixing matrix, and \(W\) is the separating (unmixing) matrix we estimate.
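As a concrete sketch (all names and numbers here are illustrative), the model plus the standard whitening preprocessing step look like this in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent non-Gaussian (uniform, hence sub-Gaussian) unit-variance sources.
n = 10_000
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, n))

# Hypothetical mixing matrix A; x = A s is what we actually observe.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# Whitening: z = V x has identity sample covariance, so the remaining
# separating transform acting on z can be restricted to be orthogonal.
d, E = np.linalg.eigh(np.cov(x))
V = E @ np.diag(d ** -0.5) @ E.T
z = V @ x
```

Whitening reduces the search space from general invertible matrices to rotations, which is why most ICA algorithms assume it.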
Objective 1: Maximize Nongaussianity
Kurtosis
\[\mathrm{kurt}(y)=\mathbb{E}[y^4]-3(\mathbb{E}[y^2])^2.\]
For whitened data \(z\), maximizing \(|\mathrm{kurt}(w^\top z)|\) over unit-norm \(w\) gives a practical extraction direction.
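A quick check of the sign conventions (illustrative NumPy): excess kurtosis is near zero for a Gaussian, negative for a sub-Gaussian uniform, and positive for a super-Gaussian Laplace.

```python
import numpy as np

def kurt(y):
    """Excess kurtosis E[y^4] - 3 (E[y^2])^2; zero for a Gaussian."""
    y = np.asarray(y, dtype=float)
    return np.mean(y ** 4) - 3.0 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(1)
k_gauss = kurt(rng.standard_normal(200_000))     # ~ 0
k_unif = kurt(rng.uniform(-1.0, 1.0, 200_000))   # ~ -2/15 (sub-Gaussian)
k_lap = kurt(rng.laplace(size=200_000))          # ~ 12 (super-Gaussian, scale 1)
```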
Negentropy
\[J(y)=H(y_{\mathrm{gauss}})-H(y),\]
where \(y_{\mathrm{gauss}}\) is a Gaussian variable with the same variance as \(y\); negentropy is nonnegative and vanishes only when \(y\) itself is Gaussian. It is often approximated through contrast functions such as
\[J(y)\propto\left(\mathbb{E}[G(y)]-\mathbb{E}[G(v)]\right)^2,\]
where \(G\) is a nonquadratic function (e.g. \(G(u)=\log\cosh u\)) and \(v\) is a standard Gaussian variable. This approximation is the basis of the fixed-point FastICA family.
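The one-unit fixed-point update can be sketched as follows (illustrative NumPy; `z` is assumed whitened with shape `(dims, samples)`, and the contrast is \(G(u)=\log\cosh u\), so \(g=\tanh\)):

```python
import numpy as np

def fastica_one_unit(z, n_iter=200, tol=1e-10, seed=0):
    """One-unit FastICA fixed-point iteration on whitened data z (d, n).

    Update rule: w <- E[z g(w^T z)] - E[g'(w^T z)] w, then renormalize,
    with g(u) = tanh(u), the derivative of the log cosh contrast.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wz = w @ z
        g, g_prime = np.tanh(wz), 1.0 - np.tanh(wz) ** 2
        w_new = (z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        converged = abs(w_new @ w) > 1.0 - tol  # w and w_new aligned (up to sign)
        w = w_new
        if converged:
            break
    return w
```

The sign-insensitive convergence test reflects that \(w\) and \(-w\) extract the same source.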
Objective 2: Minimize Mutual Information
For outputs \(y_i\):
\[I(y_1,\dots,y_m)=\sum_i H(y_i)-H(y).\]
Mutual information is nonnegative and equals zero exactly when the \(y_i\) are independent, so minimizing it drives the outputs toward independence. This gives the information-theoretic view behind many ICA updates.
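One step worth making explicit: for an invertible linear map \(y = Wx\), the joint entropy transforms as \(H(y) = H(x) + \log|\det W|\), and \(H(x)\) does not depend on \(W\). Substituting into the definition above,
\[I(y_1,\dots,y_m)=\sum_i H(y_i)-\log|\det W|-H(x),\]
so minimizing mutual information amounts to minimizing the sum of marginal entropies plus a log-determinant volume term. If \(W\) is additionally constrained to keep the outputs white, \(\log|\det W|\) is constant too, and minimizing MI reduces to minimizing each \(H(y_i)\), i.e. maximizing marginal nongaussianity. This ties Objective 2 back to Objective 1.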
Objective 3: Maximum Likelihood Estimation
Assuming parametric models \(p_i\) for the source densities, maximize the per-sample log-likelihood:
\[\log L(W)=\log|\det W|+\sum_i \log p_i(u_i),\quad u=Wx.\]
Extended Infomax can be read as a likelihood-based ICA method whose nonlinearities adapt to sub- versus super-Gaussian sources.
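A minimal likelihood-based sketch (illustrative NumPy; assumes a unit Laplace source model \(p_i(u)=\tfrac12 e^{-|u|}\), whose score function is \(\varphi(u)=\operatorname{sign}(u)\)):

```python
import numpy as np

def loglik(W, X):
    """Average log-likelihood log|det W| + sum_i E[log p_i(u_i)], u = W x,
    under a unit Laplace source model: log p(u) = -|u| - log 2."""
    U = W @ X
    _, logdet = np.linalg.slogdet(W)
    return logdet + (-np.abs(U) - np.log(2.0)).sum(axis=0).mean()

def natural_gradient_step(W, X, lr=0.05):
    """Natural-gradient ascent: W <- W + lr (I - E[phi(u) u^T]) W, where
    phi(u) = -d/du log p(u) = sign(u) for the Laplace model."""
    U = W @ X
    n = X.shape[1]
    return W + lr * (np.eye(W.shape[0]) - (np.sign(U) @ U.T) / n) @ W
```

The natural gradient premultiplies the plain gradient \(W^{-\top}-\mathbb{E}[\varphi(u)x^\top]\) by \(W^\top W\), which removes the matrix inverse from the update and tends to be better conditioned.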
Algorithm Layer
A useful mental model:
- Objective decides what independence means.
- Algorithm decides how fast and how stably we can optimize it.
Common choices:
- gradient/natural-gradient updates,
- FastICA fixed-point updates,
- deflationary vs symmetric orthogonalization for multi-component extraction.
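The two multi-component strategies differ mainly in how estimation error propagates; a minimal sketch of each (illustrative NumPy):

```python
import numpy as np

def symmetric_orthogonalize(W):
    """Symmetric decorrelation W <- (W W^T)^{-1/2} W: all rows are
    re-orthogonalized together, so no component is privileged."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def deflate(w, W_found):
    """Deflationary (Gram-Schmidt) step: project the current direction w
    against already-extracted rows, then renormalize. Simpler, but errors
    in early components leak into later ones."""
    for v in W_found:
        w = w - (w @ v) * v
    return w / np.linalg.norm(w)
```

Deflation extracts components one at a time and accumulates error in extraction order; symmetric orthogonalization spreads error evenly across components, which is usually preferable when all of them matter.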
Practical Takeaway
In high-dimensional settings, algorithmic convenience can dominate statistical quality. FastICA is often a strong baseline, but the surrogate contrast function and orthogonalization strategy should be treated as design choices, not defaults.