Covariance Features and Tangent-Space Mapping
An EEG trial is a matrix: channels by time samples.
The channels move together. That co-movement often matters more than the value of one channel at one time point.
Covariance features start from that fact.
Instead of asking what each channel did alone, they ask how channels varied together during the trial.
From Trial to Covariance
For one trial:
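\[X \in \mathbb{R}^{C \times T},\]
where \(C\) is the number of channels and \(T\) is the number of time samples. After centering each channel (subtracting its mean over time), the covariance matrix is:
\[\Sigma = \frac{1}{T-1}XX^\top.\]
\(\Sigma\) is a \(C \times C\) matrix.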
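The diagonal entries are channel variances. The off-diagonal entries are covariances between pairs of channels.
That is already useful. Motor imagery often changes rhythm power and channel coupling. For a band-pass filtered trial, a channel's variance is its power in that band, so the diagonal tracks rhythm power and the off-diagonal tracks coupling. A covariance matrix catches both.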
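The formula above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not a recommended estimator. The function name `trial_covariance` and the trial size are assumptions made for the example.

```python
import numpy as np

def trial_covariance(X):
    """Sample covariance of one trial X with shape (channels, time)."""
    Xc = X - X.mean(axis=1, keepdims=True)  # remove each channel's mean over time
    return Xc @ Xc.T / (X.shape[1] - 1)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 500))  # synthetic stand-in for a trial: C=4, T=500
Sigma = trial_covariance(X)

print(Sigma.shape)                  # C x C
print(np.allclose(Sigma, Sigma.T))  # symmetric
```

The result agrees with `np.cov(X)`, which treats rows as variables and uses the same \(T-1\) normalization by default.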
The Matrix Is Not Just a Vector
It is tempting to flatten \(\Sigma\) into a vector and send it to a classifier.
That loses structure.
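Covariance matrices are always symmetric and positive semidefinite. They are symmetric positive definite (SPD) when they have full rank, which requires more time samples than channels and no linearly dependent channels. SPD matrices live in a curved space, not ordinary Euclidean space.
The average of two covariance matrices is still a covariance matrix. But the straight line between them is not always the right geometry for the problem. For example, the arithmetic mean of two SPD matrices can have a larger determinant than either one.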
This matters because distance is part of learning. A classifier, a clustering method, or an adaptation method needs to know what “close” means.
The SPD View
A covariance matrix belongs to the SPD cone:
\[\Sigma \in \mathcal{S}_{++}^{C}.\]
The eigenvalues are positive. The matrix describes an ellipsoid of trial variability.
Two covariance matrices can differ in scale, rotation, or both. A good geometry respects that.
That is the reason Riemannian methods are common in EEG.
They compare covariance matrices as covariance matrices.
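One common choice of distance in this setting is the affine-invariant Riemannian distance. The post does not name a specific metric, so treat this choice as an assumption. The sketch below computes it from the eigenvalues of \(A^{-1/2} B A^{-1/2}\) and checks two properties on synthetic matrices. The helper names `airm_distance` and `random_spd` are made up for the example.

```python
import numpy as np

def airm_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices A and B."""
    w, V = np.linalg.eigh(A)
    A_isqrt = (V / np.sqrt(w)) @ V.T                  # A^{-1/2}
    lam = np.linalg.eigvalsh(A_isqrt @ B @ A_isqrt)   # eigenvalues of the whitened B
    return np.sqrt(np.sum(np.log(lam) ** 2))

def random_spd(rng, n):
    G = rng.standard_normal((n, n))
    return G @ G.T + np.eye(n)                        # SPD by construction

rng = np.random.default_rng(0)
A, B = random_spd(rng, 3), random_spd(rng, 3)
W = 2.0 * np.eye(3) + 0.3 * rng.standard_normal((3, 3))  # hypothetical channel mixing

d_before = airm_distance(A, B)
d_after = airm_distance(W @ A @ W.T, W @ B @ W.T)     # same linear mixing applied to both
d_scale = airm_distance(A, 2.0 * A)                   # pure scale change
```

Here `d_before` and `d_after` agree: the distance does not change when both matrices go through the same invertible mixing. A pure scale change by a factor of 2 gives \(\sqrt{C}\,\log 2\) regardless of what `A` looks like. The Euclidean distance between flattened matrices has neither property.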
The Tangent-Space Trick
Standard classifiers want vectors.
Tangent-space mapping gives them vectors without ignoring the geometry.
The idea is simple.
Pick a reference covariance matrix:
\[\bar{\Sigma}.\]
This is often the Riemannian mean of training covariances.
Then map each covariance matrix to the tangent space at that reference:
\[S_i = \log_{\bar{\Sigma}}(\Sigma_i).\]
Now \(S_i\) is a symmetric matrix in a flat local space.
Vectorize the upper triangle, including the diagonal. Use those \(C(C+1)/2\) values as features.
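Here is a minimal sketch of the mapping, under two assumptions that the text above does not spell out. First, the tangent matrix is written in whitened coordinates, \(S_i = \log(\bar{\Sigma}^{-1/2}\,\Sigma_i\,\bar{\Sigma}^{-1/2})\), which is a common convention. Second, off-diagonal entries are scaled by \(\sqrt{2}\) so that the vector norm equals the Frobenius norm of \(S_i\). The function name `tangent_vector` is hypothetical.

```python
import numpy as np

def tangent_vector(Sigma, ref):
    """Map SPD matrix Sigma to a feature vector in the tangent space at ref."""
    w, V = np.linalg.eigh(ref)
    ref_isqrt = (V / np.sqrt(w)) @ V.T        # ref^{-1/2}
    M = ref_isqrt @ Sigma @ ref_isqrt         # Sigma in coordinates whitened by ref
    M = (M + M.T) / 2                         # remove round-off asymmetry
    lam, U = np.linalg.eigh(M)
    S = (U * np.log(lam)) @ U.T               # matrix logarithm, symmetric
    iu = np.triu_indices_from(S)              # upper triangle including diagonal
    weights = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return weights * S[iu]

rng = np.random.default_rng(0)
G1, G2 = rng.standard_normal((2, 4, 200))
ref = G1 @ G1.T / 200                         # synthetic reference covariance
Sigma = G2 @ G2.T / 200                       # synthetic trial covariance

v = tangent_vector(Sigma, ref)
print(v.shape)                                # C(C+1)/2 entries
```

With this weighting, the length of `v` equals the affine-invariant distance from the reference to `Sigma`, and the reference itself maps to the zero vector.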
Why the Reference Matters
The reference matrix is the anchor.
If it is fitted on training data only, the feature space is honest.
If it is fitted using test trials, the test session has already shaped the feature space.
This is the same leakage problem as whitening and CSP. The geometry does not remove the need for clean splitting.
In a cross-session setting, the reference can also become part of the adaptation question.
Do we anchor to the source?
Do we anchor to the target?
Do we use a pooled reference from allowed calibration data?
Each choice changes what the classifier sees.
What the Classifier Sees
After tangent-space mapping, a trial becomes a vector:
trial -> covariance -> tangent matrix -> vector
The vector contains variance and covariance information in a form that LDA, logistic regression, SVM, or another standard classifier can use.
This is why tangent-space features are practical. They keep much of the covariance structure but still fit into normal machine-learning tools.
Where Whitening Fits
Whitening and tangent-space mapping are related.
Whitening changes coordinates so that a reference covariance becomes close to the identity.
Tangent-space mapping describes each covariance by how it moves away from a reference covariance.
Both start from the same object: covariance.
Whitening is often used as preprocessing. Tangent-space mapping is often used as feature extraction.
The boundary is not strict. In EEG pipelines, preprocessing and feature extraction often share the same geometry.
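The link can be seen in a short sketch. Whitening a trial with \(W = \bar{\Sigma}^{-1/2}\) and then taking the matrix logarithm of its covariance gives the tangent matrix at the reference, in whitened coordinates. The mixing matrix and data below are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 2.0 * np.eye(3)  # hypothetical mixing matrix
ref = A @ A.T                                      # reference covariance
X = A @ rng.standard_normal((3, 1000))             # one synthetic trial drawn from it

w, V = np.linalg.eigh(ref)
W = (V / np.sqrt(w)) @ V.T                         # symmetric whitening matrix ref^{-1/2}

X_white = W @ X                                    # whitening as preprocessing
Sigma_white = np.cov(X_white)                      # equals W @ np.cov(X) @ W.T

lam, U = np.linalg.eigh(Sigma_white)
S = (U * np.log(lam)) @ U.T                        # tangent matrix at the reference

print(np.allclose(W @ ref @ W.T, np.eye(3)))       # the reference becomes the identity
```

The whitened trial covariance is close to the identity but not equal to it. The tangent matrix `S` records exactly that remaining deviation.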
What Can Go Wrong
Three things cause trouble.
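First, covariance estimates can be noisy. Short trials and many channels make this worse, and with fewer time samples than channels the estimate is not even positive definite. Regularization, such as shrinkage toward a scaled identity, helps.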
Second, the reference matrix can be unstable. If the training set is small, the reference shifts with the sample, and the tangent space moves with it.
Third, session shift can change the covariance geometry. A tangent space fitted on one session may not describe another session well.
These are not abstract problems. They show up as reduced transfer accuracy.
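The first problem is easy to reproduce. The sketch below builds a covariance from a trial with more channels than time samples, then applies a simple shrinkage toward a scaled identity. The shrinkage weight `alpha=0.1` and the function name are assumptions for the example, not tuned or recommended values.

```python
import numpy as np

def shrink_covariance(Sigma, alpha=0.1):
    """Blend Sigma with a scaled identity that has the same trace."""
    C = Sigma.shape[0]
    target = np.trace(Sigma) / C * np.eye(C)
    return (1.0 - alpha) * Sigma + alpha * target

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))          # 8 channels, only 5 time samples
raw = np.cov(X)                          # rank-deficient: not positive definite
reg = shrink_covariance(raw)

print(np.linalg.eigvalsh(raw).min())     # about zero, up to round-off
print(np.linalg.eigvalsh(reg).min())     # strictly positive
```

The matrix logarithm used in tangent-space mapping needs strictly positive eigenvalues, so the raw estimate here could not be mapped at all. The shrunk one can, at the cost of some bias toward the identity.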
The Practical Reading
Covariance features are useful because EEG is multichannel.
Tangent-space mapping is useful because covariance matrices are not ordinary vectors.
The full pipeline is:
trial matrix
-> covariance matrix
-> geometry-aware map
-> feature vector
-> classifier
Once this pipeline is clear, domain adaptation becomes easier to explain.
Many adaptation methods are really asking this question:
How do we make the source covariance geometry look more like the target covariance geometry without using target labels?
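One label-free way to approach that question is recentering: whiten each session's covariances by that session's own mean covariance. The sketch below is an illustration on synthetic data, not a method this post prescribes. It uses the arithmetic mean to keep the code short; the Riemannian mean is the more usual reference. The mixing matrix `A` is a hypothetical stand-in for session shift.

```python
import numpy as np

def recenter(covs):
    """Whiten a session's covariances by the session's own arithmetic mean."""
    M = covs.mean(axis=0)
    w, V = np.linalg.eigh(M)
    M_isqrt = (V / np.sqrt(w)) @ V.T     # M^{-1/2}
    return M_isqrt @ covs @ M_isqrt      # broadcasts over the trial axis

rng = np.random.default_rng(0)
G = rng.standard_normal((40, 3, 50))
source = G[:20] @ G[:20].transpose(0, 2, 1) / 50

# Hypothetical session shift: the target session sees a different channel mixing.
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 0.5]])
target = A @ (G[20:] @ G[20:].transpose(0, 2, 1) / 50) @ A.T

source_rc = recenter(source)             # no labels used
target_rc = recenter(target)             # no labels used
```

Before recentering, the two sessions have different mean covariances. Afterward, both are centered on the identity, so a tangent space anchored at the identity describes both sessions.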
