Covariance Features and Tangent-Space Mapping


Published: February 26, 2024

An EEG trial is a matrix.

The channels move together. That co-movement often matters more than the value of one channel at one time point.

Covariance features start from that fact.

Instead of asking what each channel did alone, they ask how channels varied together during the trial.

From Trial to Covariance

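For one trial with \(C\) channels and \(T\) time samples:

\[X \in \mathbb{R}^{C \times T}.\]

If each row of \(X\) has been centered (its mean over time subtracted), the sample covariance matrix is:

\[\Sigma = \frac{1}{T-1}XX^\top.\]

\(\Sigma\) is a symmetric \(C \times C\) matrix.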

The diagonal entries are channel variances. The off-diagonal entries are channel relationships.
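Here is a minimal NumPy sketch of the formula above. It assumes a trial stored as a `(C, T)` array with channels in rows. The random trial and the function name `trial_covariance` are hypothetical. The comparison with `np.cov` checks that the explicit centering and the \(1/(T-1)\) factor match NumPy's default estimator.

```python
import numpy as np

def trial_covariance(x: np.ndarray) -> np.ndarray:
    """Sample covariance of one trial x with shape (C, T)."""
    # Center each channel by subtracting its mean over time.
    xc = x - x.mean(axis=1, keepdims=True)
    t = x.shape[1]
    # Sigma = X X^T / (T - 1) on the centered trial.
    return xc @ xc.T / (t - 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 250))  # hypothetical trial: 8 channels, 250 samples
sigma = trial_covariance(x)

print(sigma.shape)                                          # (8, 8)
print(np.allclose(sigma, np.cov(x)))                        # True
print(np.allclose(np.diag(sigma), x.var(axis=1, ddof=1)))   # True: diagonal = channel variances
```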

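That is already useful. Motor imagery often changes rhythm power (for example, in the mu and beta bands) and channel coupling. After band-pass filtering to the rhythm of interest, a covariance matrix catches both: power on the diagonal, coupling off the diagonal.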

The Matrix Is Not Just a Vector

It is tempting to flatten \(\Sigma\) into a vector and send it to a classifier.

That loses structure.

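A well-behaved covariance matrix is symmetric positive definite (SPD). In practice that requires more samples than channels (\(T > C\)) and no exactly redundant channels. SPD matrices do not form a vector space. They fill an open convex cone, and under the affine-invariant metric commonly used for EEG, that cone is a curved space rather than ordinary Euclidean space.

The arithmetic average of two covariance matrices is still SPD, because the cone is convex. But the straight line between them is not always the right geometry for the problem. For example, the Euclidean average can have a larger determinant than either input, an inflation often called the swelling effect.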

This matters because distance is part of learning. A classifier, a clustering method, or an adaptation method needs to know what “close” means.
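To make the straight-line problem concrete, the sketch below compares two midpoints of a pair of SPD matrices. One is the Euclidean average. The other is the midpoint of the affine-invariant geodesic, \(A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}\). The two diagonal matrices are made-up examples, and `spd_fun` and `geodesic_midpoint` are illustrative helpers, not a library API. The determinant is the squared volume of the variability ellipsoid, up to a constant.

```python
import numpy as np

def spd_fun(m, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fun(w)) @ v.T

def geodesic_midpoint(a, b):
    """Midpoint of the affine-invariant geodesic between SPD matrices a and b."""
    a_half = spd_fun(a, np.sqrt)
    a_ihalf = spd_fun(a, lambda w: 1.0 / np.sqrt(w))
    return a_half @ spd_fun(a_ihalf @ b @ a_ihalf, np.sqrt) @ a_half

a = np.diag([1.0, 100.0])
b = np.diag([100.0, 1.0])
euclid = (a + b) / 2            # straight-line midpoint
geo = geodesic_midpoint(a, b)   # curved-geometry midpoint

for name, m in [("a", a), ("b", b), ("euclid", euclid), ("geo", geo)]:
    print(name, round(float(np.linalg.det(m)), 6))
# a 100.0
# b 100.0
# euclid 2550.25
# geo 100.0
```

The geodesic midpoint keeps the determinant at the inputs' value, while the Euclidean midpoint inflates it about 25-fold.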

The SPD View

A covariance matrix belongs to the SPD cone:

\[\Sigma \in \mathcal{S}_{++}^{C}.\]

The eigenvalues are positive. The matrix describes an ellipsoid of trial variability.

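Two covariance matrices can differ in scale, rotation, or both. A good geometry respects that. Under the affine-invariant metric, for example, the distance between two covariance matrices does not change when both are transformed by the same invertible channel mixing \(W\), that is, \(\Sigma \mapsto W\Sigma W^\top\).

This is one reason Riemannian methods are common in EEG.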

They compare covariance matrices as covariance matrices.
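One common way to compare covariance matrices as covariance matrices is the affine-invariant Riemannian distance:

\[\delta_R(A,B) = \lVert\log(A^{-1/2}BA^{-1/2})\rVert_F = \sqrt{\textstyle\sum_i \log^2\lambda_i},\]

where \(\lambda_i\) are the eigenvalues of \(A^{-1}B\).

The sketch below assumes this metric; the post does not name one. It checks two properties numerically:

- The distance is unchanged when both matrices are mixed by the same invertible \(W\).
- Scaling a matrix by 2 moves it \(\sqrt{C}\log 2\) away.

The helper names and random test matrices are hypothetical.

```python
import numpy as np

def spd_fun(m, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fun(w)) @ v.T

def airm_distance(a, b):
    """Affine-invariant Riemannian distance between SPD matrices a and b."""
    a_ihalf = spd_fun(a, lambda w: 1.0 / np.sqrt(w))
    # The eigenvalues of A^{-1/2} B A^{-1/2} are the eigenvalues of A^{-1} B.
    lam = np.linalg.eigvalsh(a_ihalf @ b @ a_ihalf)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

rng = np.random.default_rng(1)

def random_spd(c):
    """Hypothetical well-conditioned SPD test matrix."""
    m = rng.standard_normal((c, 2 * c))
    return m @ m.T / (2 * c) + 0.1 * np.eye(c)

c = 5
a, b = random_spd(c), random_spd(c)
# Upper-triangular with a nonzero diagonal, so W is guaranteed invertible.
w = np.triu(rng.standard_normal((c, c)), k=1) + np.diag(np.arange(1.0, c + 1))

d = airm_distance(a, b)
d_mixed = airm_distance(w @ a @ w.T, w @ b @ w.T)
print(np.isclose(d, d_mixed))                                        # True
print(np.isclose(airm_distance(a, 2 * a), np.sqrt(c) * np.log(2)))   # True
```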

The Tangent-Space Trick

Standard classifiers want vectors.

Tangent-space mapping gives them vectors without ignoring the geometry.

The idea is simple.

Pick a reference covariance matrix:

\[\bar{\Sigma}.\]

This is often the Riemannian (geometric) mean of the training covariances: the SPD matrix that minimizes the sum of squared Riemannian distances to them.
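Below is a minimal sketch of one standard way to compute this mean, assuming the affine-invariant metric. The loop works as follows:

1. Start from the Euclidean mean.
2. Map every training covariance to the tangent space at the current estimate.
3. Average there and map the average back.
4. Stop when the average tangent step is negligible.

The function name, tolerance, and iteration cap are illustrative choices, not values from the post.

```python
import numpy as np

def spd_fun(m, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fun(w)) @ v.T

def riemannian_mean(covs, tol=1e-10, max_iter=100):
    """Fixed-point estimate of the affine-invariant (Karcher) mean of SPD matrices."""
    m = np.mean(covs, axis=0)  # Euclidean mean as the starting point
    for _ in range(max_iter):
        m_half = spd_fun(m, np.sqrt)
        m_ihalf = spd_fun(m, lambda w: 1.0 / np.sqrt(w))
        # Average the covariances in the tangent space at the current estimate.
        step = np.mean([spd_fun(m_ihalf @ c @ m_ihalf, np.log) for c in covs], axis=0)
        # Map the average back onto the SPD cone and re-symmetrize.
        m = m_half @ spd_fun(step, np.exp) @ m_half
        m = (m + m.T) / 2
        if np.linalg.norm(step) < tol:
            break
    return m

# For commuting (diagonal) inputs the mean is the elementwise geometric mean.
covs = np.array([np.diag([1.0, 100.0]), np.diag([100.0, 1.0])])
print(np.allclose(riemannian_mean(covs), np.diag([10.0, 10.0])))   # True
```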

Then map each covariance matrix to the tangent space at that reference:

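\[S_i = \log_{\bar{\Sigma}}(\Sigma_i) = \bar{\Sigma}^{1/2}\log\left(\bar{\Sigma}^{-1/2}\,\Sigma_i\,\bar{\Sigma}^{-1/2}\right)\bar{\Sigma}^{1/2},\]

where \(\log\) on the right is the ordinary matrix logarithm. This is the log map of the affine-invariant metric.

Now \(S_i\) is a symmetric matrix in a flat local space.

In practice, EEG pipelines usually drop the outer \(\bar{\Sigma}^{1/2}\) factors and work with the whitened middle term \(\log(\bar{\Sigma}^{-1/2}\Sigma_i\bar{\Sigma}^{-1/2})\). Vectorize its upper triangle, including the diagonal. Scale the off-diagonal entries by \(\sqrt{2}\) so the vector's Euclidean norm equals the Riemannian distance from \(\Sigma_i\) to \(\bar{\Sigma}\). Use those values as features.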
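The mapping and vectorization steps can be sketched in NumPy as follows. The sketch assumes the whitened form \(\log(\bar{\Sigma}^{-1/2}\Sigma_i\bar{\Sigma}^{-1/2})\) common in EEG pipelines and a \(\sqrt{2}\) weight on off-diagonal entries. `tangent_vector` is a hypothetical helper name, and the reference is a fixed random SPD matrix rather than a fitted mean. The checks confirm that the vector's norm equals the affine-invariant distance to the reference, and that the reference itself maps to the origin.

```python
import numpy as np

def spd_fun(m, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fun(w)) @ v.T

def tangent_vector(cov, ref):
    """Whitened log map of cov at ref, flattened to C(C+1)/2 features."""
    ref_ihalf = spd_fun(ref, lambda w: 1.0 / np.sqrt(w))
    s = spd_fun(ref_ihalf @ cov @ ref_ihalf, np.log)  # symmetric C x C
    rows, cols = np.triu_indices(cov.shape[0])        # upper triangle incl. diagonal
    # sqrt(2) on off-diagonal entries keeps ||vector|| == ||s||_F.
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * s[rows, cols]

rng = np.random.default_rng(3)
c = 4
z = rng.standard_normal((c, 50))
ref = z @ z.T / 50             # hypothetical reference covariance
cov = ref + 0.2 * np.eye(c)    # hypothetical trial covariance near the reference

v = tangent_vector(cov, ref)
print(v.shape)                 # (10,) since 4 * 5 / 2 = 10
ref_ihalf = spd_fun(ref, lambda w: 1.0 / np.sqrt(w))
dist = np.sqrt(np.sum(np.log(np.linalg.eigvalsh(ref_ihalf @ cov @ ref_ihalf)) ** 2))
print(np.isclose(np.linalg.norm(v), dist))          # True
print(np.allclose(tangent_vector(ref, ref), 0.0))   # True: the reference maps to the origin
```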

Why the Reference Matters

The reference matrix is the anchor.

If it is fitted on training data only, the feature space is fixed before any test trial is seen, and test accuracy remains an honest estimate.

If it is fitted using test trials, the test session has already shaped the feature space.

This is the same leakage problem as fitting a whitening matrix or CSP filters on data that includes test trials. The geometry does not remove the need for clean splitting.
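Here is a minimal sketch of leakage-free fitting, written in the fit/transform style of scikit-learn transformers but using only NumPy. The class `TangentSpaceFeatures` and the synthetic covariances are hypothetical. The reference is estimated in `fit` from training covariances only, and `transform` merely applies it.

```python
import numpy as np

def spd_fun(m, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fun(w)) @ v.T

def riemannian_mean(covs, tol=1e-10, max_iter=100):
    """Fixed-point estimate of the affine-invariant mean of SPD matrices."""
    m = np.mean(covs, axis=0)
    for _ in range(max_iter):
        m_half, m_ihalf = spd_fun(m, np.sqrt), spd_fun(m, lambda w: w ** -0.5)
        step = np.mean([spd_fun(m_ihalf @ c @ m_ihalf, np.log) for c in covs], axis=0)
        m = m_half @ spd_fun(step, np.exp) @ m_half
        m = (m + m.T) / 2
        if np.linalg.norm(step) < tol:
            break
    return m

class TangentSpaceFeatures:
    """Hypothetical fit/transform wrapper: the reference is learned in fit() only."""

    def fit(self, train_covs):
        self.ref_ = riemannian_mean(train_covs)
        self._ref_ihalf = spd_fun(self.ref_, lambda w: w ** -0.5)
        return self

    def transform(self, covs):
        rows, cols = np.triu_indices(self.ref_.shape[0])
        weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
        out = []
        for cov in covs:
            # Whitened log map at the stored reference, then weighted upper triangle.
            s = spd_fun(self._ref_ihalf @ cov @ self._ref_ihalf, np.log)
            out.append(weights * s[rows, cols])
        return np.array(out)

rng = np.random.default_rng(4)

def fake_covs(n, c=4, t=200):
    """Hypothetical covariances of white-noise trials."""
    return np.array([np.cov(x) for x in rng.standard_normal((n, c, t))])

train_covs, test_covs = fake_covs(40), fake_covs(10)
ts = TangentSpaceFeatures().fit(train_covs)   # reference sees training trials only
x_train = ts.transform(train_covs)
x_test = ts.transform(test_covs)              # test trials are mapped, never fitted
print(x_train.shape, x_test.shape)            # (40, 10) (10, 10)
```

Because the reference is the Riemannian mean of the training covariances, the training features are centered near zero. The test features are not forced to be, and that gap is part of what an honest evaluation measures.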

In a cross-session setting, the reference can also become part of the adaptation question.

Do we anchor to the source?

Do we anchor to the target?

Do we use a pooled reference from allowed calibration data?

Each choice changes what the classifier sees.
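One concrete version of "anchor to the target" is often called re-centering, or Riemannian alignment. Each session gets its own reference, estimated from that session's unlabeled covariances, so each session is mapped relative to its own mean.

The sketch below simulates a session shift as a fixed channel mixing applied to the target session. This is an assumption for illustration only; real shifts are messier. It then compares a source-only anchor with a per-session anchor. All names and numbers are hypothetical.

```python
import numpy as np

def spd_fun(m, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fun(w)) @ v.T

def riemannian_mean(covs, tol=1e-10, max_iter=100):
    """Fixed-point estimate of the affine-invariant mean of SPD matrices."""
    m = np.mean(covs, axis=0)
    for _ in range(max_iter):
        m_half, m_ihalf = spd_fun(m, np.sqrt), spd_fun(m, lambda w: w ** -0.5)
        step = np.mean([spd_fun(m_ihalf @ c @ m_ihalf, np.log) for c in covs], axis=0)
        m = m_half @ spd_fun(step, np.exp) @ m_half
        m = (m + m.T) / 2
        if np.linalg.norm(step) < tol:
            break
    return m

def tangent_features(covs, ref):
    """Whitened log map at ref, upper triangle with sqrt(2) off-diagonal weights."""
    ref_ihalf = spd_fun(ref, lambda w: w ** -0.5)
    rows, cols = np.triu_indices(ref.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return np.array([weights * spd_fun(ref_ihalf @ c @ ref_ihalf, np.log)[rows, cols] for c in covs])

rng = np.random.default_rng(5)
n_ch = 4

def session_covs(n, mix):
    """Hypothetical session: n white-noise trials passed through a channel mixing."""
    trials = mix @ rng.standard_normal((n, n_ch, 300))
    return np.array([np.cov(x) for x in trials])

source = session_covs(30, np.eye(n_ch))
# Hypothetical session shift: an invertible (upper-triangular) channel mixing.
shift = np.diag([1.0, 2.0, 0.5, 3.0]) + 0.3 * np.triu(np.ones((n_ch, n_ch)), k=1)
target = session_covs(30, shift)

src_ref = riemannian_mean(source)   # anchor to the source
tgt_ref = riemannian_mean(target)   # anchor to the target: unlabeled covariances only

# Norm of the average target feature vector under each anchor.
drift_source_anchor = np.linalg.norm(tangent_features(target, src_ref).mean(axis=0))
drift_own_anchor = np.linalg.norm(tangent_features(target, tgt_ref).mean(axis=0))
print(drift_source_anchor > 1.0, drift_own_anchor < 1e-6)   # True True
```

Re-centering aligns the session means. It does not, by itself, align anything else about the two distributions.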

What the Classifier Sees

After tangent-space mapping, a trial becomes a vector:

trial -> covariance -> tangent matrix -> vector

The vector contains variance and covariance information in a form that LDA, logistic regression, SVM, or another standard classifier can use. For \(C\) channels it has \(C(C+1)/2\) entries, for example 253 for 22 channels.

This is why tangent-space features are practical. They keep much of the covariance structure but still fit into normal machine-learning tools.

Where Whitening Fits

Whitening and tangent-space mapping are related.

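Whitening with \(W = \bar{\Sigma}^{-1/2}\) changes coordinates so that the reference covariance becomes exactly the identity, \(W\bar{\Sigma}W^\top = I\). Trial covariances close to the reference land close to the identity.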

Tangent-space mapping describes each covariance by how it moves away from a reference covariance.

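Both start from the same object: covariance. In the whitened form used for EEG features, tangent-space mapping is whitening by the reference followed by a matrix logarithm, \(\log(W\Sigma_i W^\top)\) with \(W = \bar{\Sigma}^{-1/2}\).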

Whitening is often used as preprocessing. Tangent-space mapping is often used as feature extraction.

The boundary is not strict. In EEG pipelines, preprocessing and feature extraction often share the same geometry.
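Here is a short NumPy check of this relationship. It assumes \(W = \bar{\Sigma}^{-1/2}\) as the whitening matrix and, for brevity, a Euclidean-mean reference; any SPD reference works the same way. The check shows three things:

- Whitening maps the reference to the identity.
- Whitening the raw trial and then taking its covariance gives \(W\Sigma W^\top\).
- The matrix logarithm of that whitened covariance is the symmetric tangent matrix used as features.

Names and data are hypothetical.

```python
import numpy as np

def spd_fun(m, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fun(w)) @ v.T

rng = np.random.default_rng(6)
c, t = 4, 300
mix = np.eye(c) + 0.5 * np.tril(np.ones((c, c)), k=-1)   # hypothetical channel correlation
trials = mix @ rng.standard_normal((20, c, t))             # 20 hypothetical trials
covs = np.array([np.cov(x) for x in trials])

ref = covs.mean(axis=0)                     # reference (Euclidean mean for brevity)
w = spd_fun(ref, lambda v: v ** -0.5)       # whitening matrix W = ref^{-1/2}
print(np.allclose(w @ ref @ w.T, np.eye(c)))       # True: reference -> identity

x = trials[0]
white_cov = np.cov(w @ x)                   # covariance of the whitened trial
print(np.allclose(white_cov, w @ covs[0] @ w.T))   # True

tangent = spd_fun(white_cov, np.log)        # log map at the identity
print(np.allclose(tangent, tangent.T))             # True: symmetric feature matrix
```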

What Can Go Wrong

Three things cause trouble.

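First, covariance estimates can be noisy. Short trials and many channels make this worse. When \(T \le C\), the sample covariance is singular and its matrix logarithm is undefined. Regularization helps, for example shrinkage toward a scaled identity as in the Ledoit-Wolf estimator.

Second, the reference matrix can be unstable. If the training set is small, the estimated mean depends heavily on which trials were included, and every feature vector moves with it.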

Third, session shift can change the covariance geometry. A tangent space fitted on one session may not describe another session well.

These are not abstract problems. They show up as reduced transfer accuracy.
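The first problem, and a shrinkage fix for it, can be sketched as follows. The sketch assumes a fixed shrinkage weight `alpha`; the Ledoit-Wolf estimator instead chooses the weight from the data. With fewer samples than channels, the sample covariance is singular. Blending it with a scaled identity restores positive definiteness and keeps the total variance (the trace). `shrink` and the numbers used are hypothetical.

```python
import numpy as np

def shrink(sigma, alpha):
    """Blend a covariance with a scaled identity: (1 - alpha) * Sigma + alpha * mu * I."""
    c = sigma.shape[0]
    mu = np.trace(sigma) / c          # average channel variance
    return (1.0 - alpha) * sigma + alpha * mu * np.eye(c)

rng = np.random.default_rng(7)
x = rng.standard_normal((16, 10))     # hypothetical short trial: 16 channels, 10 samples
sigma = np.cov(x)                     # rank at most 9 < 16, so singular
reg = shrink(sigma, alpha=0.1)        # alpha is a fixed, illustrative choice

eig_raw = np.linalg.eigvalsh(sigma)
eig_reg = np.linalg.eigvalsh(reg)
print(eig_raw.min() < 1e-10 * eig_raw.max())        # True: numerically singular
print(eig_reg.min() > 0)                            # True: now positive definite
print(np.isclose(np.trace(reg), np.trace(sigma)))   # True: total variance preserved
```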

The Practical Reading

Covariance features are useful because EEG is multichannel.

Tangent-space mapping is useful because covariance matrices are not ordinary vectors.

The full pipeline is:

trial matrix
-> covariance matrix
-> geometry-aware map
-> feature vector
-> classifier
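The pipeline can be sketched end to end on synthetic data. Everything here is an assumption for illustration:

- The two-class data: class 1 has three times the amplitude on channel 0.
- The helper names.
- The classifier: a nearest-class-mean rule in the tangent space, standing in for LDA or logistic regression so the example needs only NumPy.

```python
import numpy as np

def spd_fun(m, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fun(w)) @ v.T

def riemannian_mean(covs, tol=1e-10, max_iter=100):
    """Fixed-point estimate of the affine-invariant mean of SPD matrices."""
    m = np.mean(covs, axis=0)
    for _ in range(max_iter):
        m_half, m_ihalf = spd_fun(m, np.sqrt), spd_fun(m, lambda w: w ** -0.5)
        step = np.mean([spd_fun(m_ihalf @ c @ m_ihalf, np.log) for c in covs], axis=0)
        m = m_half @ spd_fun(step, np.exp) @ m_half
        m = (m + m.T) / 2
        if np.linalg.norm(step) < tol:
            break
    return m

def tangent_features(covs, ref):
    """Whitened log map at ref, upper triangle with sqrt(2) off-diagonal weights."""
    ref_ihalf = spd_fun(ref, lambda w: w ** -0.5)
    rows, cols = np.triu_indices(ref.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return np.array([weights * spd_fun(ref_ihalf @ c @ ref_ihalf, np.log)[rows, cols] for c in covs])

def make_trials(n, label, rng, c=4, t=250):
    """Hypothetical synthetic trials: class 1 has 3x the amplitude on channel 0."""
    x = rng.standard_normal((n, c, t))
    if label == 1:
        x[:, 0, :] *= 3.0
    return x

rng = np.random.default_rng(8)
x_train = np.concatenate([make_trials(30, 0, rng), make_trials(30, 1, rng)])
y_train = np.repeat([0, 1], 30)
x_test = np.concatenate([make_trials(10, 0, rng), make_trials(10, 1, rng)])
y_test = np.repeat([0, 1], 10)

# trial matrix -> covariance matrix
cov_train = np.array([np.cov(x) for x in x_train])
cov_test = np.array([np.cov(x) for x in x_test])
# geometry-aware map -> feature vector (reference fitted on training trials only)
ref = riemannian_mean(cov_train)
f_train, f_test = tangent_features(cov_train, ref), tangent_features(cov_test, ref)
# classifier: nearest class mean, a stand-in for LDA or logistic regression
centroids = np.array([f_train[y_train == k].mean(axis=0) for k in (0, 1)])
dists = np.linalg.norm(f_test[:, None, :] - centroids[None, :, :], axis=2)
y_pred = dists.argmin(axis=1)

print(f_train.shape)               # (60, 10)
print((y_pred == y_test).mean())   # test accuracy on the synthetic data
```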

Once this pipeline is clear, domain adaptation becomes easier to explain.

Many adaptation methods are really asking this question:

How do we make the source covariance geometry look more like the target covariance geometry without using target labels?


Yiming Shen
