ICA Objective Functions in One Place: Kurtosis, Negentropy, MI, and MLE


This note consolidates my 2023-2024 group-meeting derivations into a single map:

  • data model,
  • objective functions,
  • and optimization choices.

Starting Point

Classical ICA assumes:

\[x = As,\quad y = Wx\]

where \(x\) is the observed mixture, \(s\) is the vector of latent, mutually independent sources, \(A\) is the unknown mixing matrix, and \(W\) is the separating matrix we estimate.

Objective 1: Maximize Nongaussianity

Kurtosis

\[\mathrm{kurt}(y)=\mathbb{E}[y^4]-3(\mathbb{E}[y^2])^2.\]
For whitened data \(z\), maximizing \(|\mathrm{kurt}(w^\top z)|\) over unit-norm \(w\) gives a practical extraction direction; the absolute value matters because sub-Gaussian sources have negative kurtosis.
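As a quick numerical check of this contrast, here is a minimal sketch (toy data, not a full extraction algorithm): excess kurtosis is clearly positive for a super-Gaussian projection and near zero for a Gaussian one.

```python
import numpy as np

def kurtosis(y):
    # Excess kurtosis: E[y^4] - 3 (E[y^2])^2; zero for a Gaussian.
    return np.mean(y**4) - 3.0 * np.mean(y**2)**2

rng = np.random.default_rng(0)
# Toy whitened data: one super-Gaussian (Laplace) source, one Gaussian.
s = np.vstack([rng.laplace(size=10_000), rng.normal(size=10_000)])
s /= s.std(axis=1, keepdims=True)  # unit variance per row

print(kurtosis(np.array([1.0, 0.0]) @ s))  # clearly positive (super-Gaussian)
print(kurtosis(np.array([0.0, 1.0]) @ s))  # near zero (Gaussian)
```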

Negentropy

\[J(y)=H(y_{\mathrm{gauss}})-H(y),\]

often approximated by contrast functions such as:

\[J(y)\propto\left(\mathbb{E}[G(y)]-\mathbb{E}[G(v)]\right)^2,\]

where \(v\) is a standard Gaussian variable and \(G\) is a nonquadratic function (e.g. \(G(y)=\log\cosh y\)).

This is the basis of the fixed-point FastICA family.
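A minimal sketch of the one-unit fixed-point update for the log-cosh contrast (so \(g=\tanh\)), run on toy whitened data; `fastica_one_unit` and the orthogonal mixing below are illustrative assumptions, not a reference FastICA implementation.

```python
import numpy as np

def fastica_one_unit(z, n_iter=200, tol=1e-8, seed=0):
    # One-unit fixed point on whitened data z (d x n) with g = tanh,
    # the derivative of the log-cosh contrast G.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z
        # w <- E[z g(y)] - E[g'(y)] w, then renormalize.
        w_new = (z * np.tanh(y)).mean(axis=1) - (1.0 - np.tanh(y)**2).mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:  # sign-insensitive convergence test
            return w_new
        w = w_new
    return w

rng = np.random.default_rng(1)
s = np.vstack([rng.laplace(size=20_000), rng.normal(size=20_000)])
s /= s.std(axis=1, keepdims=True)
theta = 0.7  # an orthogonal mixing keeps the mixtures white
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
w = fastica_one_unit(A @ s)
print(abs(w @ A[:, 0]))  # close to 1: w aligns with the Laplace column
```

The update converges, up to sign, to the direction of the only non-Gaussian source; the Gaussian direction is an unstable fixed point of this iteration.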

Objective 2: Minimize Mutual Information

For outputs \(y_i\):

\[I(y_1,\dots,y_m)=\sum_i H(y_i)-H(y).\]

The outputs are independent exactly when their mutual information is zero, so minimizing \(I\) drives \(W\) toward a separating solution. This is the information-theoretic view behind many ICA updates.

Objective 3: Maximum Likelihood Estimation

Given a model \(p_i\) for each source density, maximize the per-sample log-likelihood:

\[\log L(W)=\log|\det W|+\sum_i \log p_i(u_i),\quad u=Wx.\]

Extended Infomax can be read as a likelihood-based ICA method with adaptive nonlinearities.
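A minimal natural-gradient sketch of this likelihood ascent, assuming super-Gaussian sources modeled as \(p_i(u)\propto 1/\cosh u\) so the score function is \(\tanh\); `ml_ica`, the learning rate, and the toy mixing matrix are all illustrative choices.

```python
import numpy as np

def ml_ica(x, lr=0.1, n_epochs=300, seed=0):
    # Natural-gradient ascent on log L(W): dW = (I - E[phi(u) u^T]) W,
    # with score phi(u) = tanh(u), i.e. sources modeled as p(u) ~ 1/cosh(u).
    rng = np.random.default_rng(seed)
    d, n = x.shape
    W = np.eye(d) + 0.01 * rng.normal(size=(d, d))
    for _ in range(n_epochs):
        u = W @ x
        W += lr * (np.eye(d) - (np.tanh(u) @ u.T) / n) @ W
    return W

rng = np.random.default_rng(2)
s = rng.laplace(size=(2, 20_000))  # two super-Gaussian sources
s /= s.std(axis=1, keepdims=True)
A = np.array([[2.0, 1.0], [1.0, 1.0]])
W = ml_ica(A @ s)
P = np.abs(W @ A)  # should approach a scaled permutation matrix
```

The natural gradient avoids inverting \(W^\top\) at each step, which is one reason it is the common choice for this objective.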

Algorithm Layer

A useful mental model:

  • Objective decides what independence means.
  • Algorithm decides how quickly and stably we can optimize it.

Common choices:

  • gradient/natural-gradient updates,
  • FastICA fixed-point updates,
  • deflationary vs symmetric orthogonalization for multi-component extraction.
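The symmetric option in the last bullet can be sketched in a few lines: decorrelate all rows at once via \(W \leftarrow (WW^\top)^{-1/2}W\), computed here with an eigendecomposition (the helper name is mine).

```python
import numpy as np

def symmetric_orthogonalize(W):
    # Symmetric decorrelation W <- (W W^T)^{-1/2} W: all rows treated
    # equally, unlike deflation, which freezes earlier rows one by one.
    vals, vecs = np.linalg.eigh(W @ W.T)  # W W^T is symmetric positive definite
    return vecs @ np.diag(vals**-0.5) @ vecs.T @ W

W = np.random.default_rng(0).normal(size=(3, 3))
W_orth = symmetric_orthogonalize(W)
print(np.allclose(W_orth @ W_orth.T, np.eye(3)))  # True
```

Deflation, by contrast, extracts components one at a time and re-projects each new row against the previous ones, so estimation errors accumulate across components.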

Practical Takeaway

In high-dimensional settings, algorithmic convenience can dominate statistical quality. FastICA is often a strong baseline, but the surrogate contrast function and orthogonalization strategy should be treated as design choices, not defaults.