ICA Objective Functions in One Place: Kurtosis, Negentropy, MI, and MLE
This note consolidates my 2023-2024 group-meeting derivations into a single map:
- data model,
- objective functions,
- and optimization choices.
Starting Point
Classical ICA assumes:
\[x = As,\quad y = Wx,\]
where \(x\) is the observed mixture, \(s\) is the vector of latent independent sources, \(A\) is the unknown mixing matrix, and \(W\) is the separating (unmixing) matrix we estimate.
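As a concrete sketch (all names and numbers here are illustrative), the model plus the standard whitening preprocessing step look like this in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent non-Gaussian (uniform, hence sub-Gaussian) unit-variance sources.
n = 10_000
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, n))

# Hypothetical mixing matrix A; x = A s is what we actually observe.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# Whitening: z = V x has identity sample covariance, so the remaining
# separating transform acting on z can be restricted to be orthogonal.
d, E = np.linalg.eigh(np.cov(x))
V = E @ np.diag(d ** -0.5) @ E.T
z = V @ x
```

Whitening reduces the search space from general invertible matrices to rotations, which is why most ICA algorithms assume it.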
Objective 1: Maximize Nongaussianity
Kurtosis
\[\mathrm{kurt}(y)=\mathbb{E}[y^4]-3(\mathbb{E}[y^2])^2.\]
For whitened data \(z\), maximizing \(|\mathrm{kurt}(w^\top z)|\) over unit-norm \(w\) gives a practical extraction direction.
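A quick check of the sign conventions (illustrative NumPy): excess kurtosis is near zero for a Gaussian, negative for a sub-Gaussian uniform, and positive for a super-Gaussian Laplace.

```python
import numpy as np

def kurt(y):
    """Excess kurtosis E[y^4] - 3 (E[y^2])^2; zero for a Gaussian."""
    y = np.asarray(y, dtype=float)
    return np.mean(y ** 4) - 3.0 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(1)
k_gauss = kurt(rng.standard_normal(200_000))     # ~ 0
k_unif = kurt(rng.uniform(-1.0, 1.0, 200_000))   # ~ -2/15 (sub-Gaussian)
k_lap = kurt(rng.laplace(size=200_000))          # ~ 12 (super-Gaussian, scale 1)
```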
Negentropy
\[J(y)=H(y_{\mathrm{gauss}})-H(y),\]
where \(y_{\mathrm{gauss}}\) is a Gaussian variable with the same variance as \(y\); negentropy is nonnegative and vanishes only when \(y\) itself is Gaussian. It is often approximated through contrast functions such as
\[J(y)\propto\left(\mathbb{E}[G(y)]-\mathbb{E}[G(v)]\right)^2,\]
where \(G\) is a nonquadratic function (e.g. \(G(u)=\log\cosh u\)) and \(v\) is a standard Gaussian variable. This approximation is the basis of the fixed-point FastICA family.
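The one-unit fixed-point update can be sketched as follows (illustrative NumPy; `z` is assumed whitened with shape `(dims, samples)`, and the contrast is \(G(u)=\log\cosh u\), so \(g=\tanh\)):

```python
import numpy as np

def fastica_one_unit(z, n_iter=200, tol=1e-10, seed=0):
    """One-unit FastICA fixed-point iteration on whitened data z (d, n).

    Update rule: w <- E[z g(w^T z)] - E[g'(w^T z)] w, then renormalize,
    with g(u) = tanh(u), the derivative of the log cosh contrast.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wz = w @ z
        g, g_prime = np.tanh(wz), 1.0 - np.tanh(wz) ** 2
        w_new = (z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        converged = abs(w_new @ w) > 1.0 - tol  # w and w_new aligned (up to sign)
        w = w_new
        if converged:
            break
    return w
```

The sign-insensitive convergence test reflects that \(w\) and \(-w\) extract the same source.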
Objective 2: Minimize Mutual Information
For outputs \(y_i\):
\[I(y_1,\dots,y_m)=\sum_i H(y_i)-H(y).\]
Mutual information is nonnegative and equals zero exactly when the \(y_i\) are independent, so minimizing it drives the outputs toward independence. This gives the information-theoretic view behind many ICA updates.
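One step worth making explicit: for an invertible linear map \(y = Wx\), the joint entropy transforms as \(H(y) = H(x) + \log|\det W|\), and \(H(x)\) does not depend on \(W\). Substituting into the definition above,
\[I(y_1,\dots,y_m)=\sum_i H(y_i)-\log|\det W|-H(x),\]
so minimizing mutual information amounts to minimizing the sum of marginal entropies plus a log-determinant volume term. If \(W\) is additionally constrained to keep the outputs white, \(\log|\det W|\) is constant too, and minimizing MI reduces to minimizing each \(H(y_i)\), i.e. maximizing marginal nongaussianity. This ties Objective 2 back to Objective 1.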
Objective 3: Maximum Likelihood Estimation
Assuming parametric models \(p_i\) for the source densities, maximize the per-sample log-likelihood:
\[\log L(W)=\log|\det W|+\sum_i \log p_i(u_i),\quad u=Wx.\]
Extended Infomax can be read as a likelihood-based ICA method whose nonlinearities adapt to sub- versus super-Gaussian sources.
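A minimal likelihood-based sketch (illustrative NumPy; assumes a unit Laplace source model \(p_i(u)=\tfrac12 e^{-|u|}\), whose score function is \(\varphi(u)=\operatorname{sign}(u)\)):

```python
import numpy as np

def loglik(W, X):
    """Average log-likelihood log|det W| + sum_i E[log p_i(u_i)], u = W x,
    under a unit Laplace source model: log p(u) = -|u| - log 2."""
    U = W @ X
    _, logdet = np.linalg.slogdet(W)
    return logdet + (-np.abs(U) - np.log(2.0)).sum(axis=0).mean()

def natural_gradient_step(W, X, lr=0.05):
    """Natural-gradient ascent: W <- W + lr (I - E[phi(u) u^T]) W, where
    phi(u) = -d/du log p(u) = sign(u) for the Laplace model."""
    U = W @ X
    n = X.shape[1]
    return W + lr * (np.eye(W.shape[0]) - (np.sign(U) @ U.T) / n) @ W
```

The natural gradient premultiplies the plain gradient \(W^{-\top}-\mathbb{E}[\varphi(u)x^\top]\) by \(W^\top W\), which removes the matrix inverse from the update and tends to be better conditioned.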
Algorithm Layer
A useful mental model:
- Objective decides what independence means.
- Algorithm decides how fast and how stably we can optimize it.
Common choices:
- gradient/natural-gradient updates,
- FastICA fixed-point updates,
- deflationary vs symmetric orthogonalization for multi-component extraction.
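The two multi-component strategies differ mainly in how estimation error propagates; a minimal sketch of each (illustrative NumPy):

```python
import numpy as np

def symmetric_orthogonalize(W):
    """Symmetric decorrelation W <- (W W^T)^{-1/2} W: all rows are
    re-orthogonalized together, so no component is privileged."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def deflate(w, W_found):
    """Deflationary (Gram-Schmidt) step: project the current direction w
    against already-extracted rows, then renormalize. Simpler, but errors
    in early components leak into later ones."""
    for v in W_found:
        w = w - (w @ v) * v
    return w / np.linalg.norm(w)
```

Deflation extracts components one at a time and accumulates error in extraction order; symmetric orthogonalization spreads error evenly across components, which is usually preferable when all of them matter.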
Practical Takeaway
In high-dimensional settings, algorithmic convenience can dominate statistical quality. FastICA is often a strong baseline, but the surrogate contrast function and orthogonalization strategy should be treated as design choices, not defaults.