CSP and FBCSP for Motor-Imagery EEG
Motor-imagery EEG is not read channel by channel.
A left-hand trial and a right-hand trial may differ in a pattern spread across several electrodes. One channel may be weak. Another may be noisy. The useful signal is often a contrast across channels.
CSP starts there.
It asks a simple question:
Can I build a new channel that has high variance for one class and low variance for the other?
The Trial
Start with one filtered EEG trial:
\[X \in \mathbb{R}^{C \times T}.\]
\(C\) is the number of channels. \(T\) is the number of time samples.
For motor imagery, the classes may be left hand and right hand. CSP does not look for a waveform. It looks for variance differences.
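That is reasonable for motor imagery because event-related desynchronization and synchronization change the power of sensorimotor rhythms. For a band-passed, zero-mean signal, variance is power, so a power change shows up as a variance change.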
The Spatial Filter
CSP learns a matrix:
\[W \in \mathbb{R}^{C \times K}.\]
Each column of \(W\) is a spatial filter. Applying the filter gives new signals:
\[Z = W^\top X.\]
The rows of \(Z\) are not original electrodes. They are weighted mixtures of electrodes.
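A minimal NumPy sketch of that projection, just to make the shapes concrete. The sizes and the random `W` are placeholders for illustration; a real `W` would come from CSP.

```python
import numpy as np

rng = np.random.default_rng(0)

C, T, K = 8, 500, 4               # channels, time samples, spatial filters
X = rng.standard_normal((C, T))   # stand-in for one band-passed trial, shape (C, T)
W = rng.standard_normal((C, K))   # placeholder filters; CSP would learn these

Z = W.T @ X                       # K virtual channels, shape (K, T)

# Row j of Z is the electrode mixture weighted by column j of W.
z0 = sum(W[c, 0] * X[c] for c in range(C))
print(Z.shape, np.allclose(Z[0], z0))
```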
One filter may emphasize activity over motor cortex. Another may subtract activity from nearby channels. The point is not to make a pretty channel. The point is to make a discriminative channel.
What CSP Optimizes
CSP uses class covariance matrices, one per class, each averaged over the training trials of that class.
Call them \(\Sigma_1\) and \(\Sigma_2\).
For a filter \(w\), the projected variance in class 1 is:
\[w^\top \Sigma_1 w.\]
The projected variance in class 2 is:
\[w^\top \Sigma_2 w.\]
CSP looks for filters that push the ratio of the two to an extreme:
\[\frac{w^\top \Sigma_1 w}{w^\top \Sigma_2 w}.\]
Filters from one end of the solution make the ratio large and favor class 1. Filters from the other end make it small and favor class 2.
That is why CSP features usually come in pairs. Keep filters from both ends, instead of keeping only the largest ones.
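One common way to get those filters is to whiten \(\Sigma_1 + \Sigma_2\) and then diagonalize the whitened \(\Sigma_1\). The sketch below does that with NumPy only. The function names, the trace normalization, and the synthetic data are assumptions for illustration, and it assumes \(\Sigma_1 + \Sigma_2\) has full rank.

```python
import numpy as np

def class_covariance(trials):
    """Average trace-normalized covariance over trials of shape (n, C, T)."""
    covs = []
    for X in trials:
        X = X - X.mean(axis=1, keepdims=True)   # remove each channel's mean
        S = X @ X.T
        covs.append(S / np.trace(S))
    return np.mean(covs, axis=0)

def fit_csp(trials_1, trials_2, n_pairs=1):
    """Return (W, lam): filters from both ends and their eigenvalues."""
    S1 = class_covariance(trials_1)
    S2 = class_covariance(trials_2)
    # Whiten the composite covariance: P.T @ (S1 + S2) @ P = I.
    d, U = np.linalg.eigh(S1 + S2)
    P = U / np.sqrt(d)
    # Eigenvalues of the whitened S1 lie in [0, 1], sorted ascending.
    lam, V = np.linalg.eigh(P.T @ S1 @ P)
    W_all = P @ V
    # Low end favors class 2, high end favors class 1. Keep both.
    keep = np.r_[np.arange(n_pairs), np.arange(len(lam) - n_pairs, len(lam))]
    return W_all[:, keep], lam[keep]

# Synthetic check: source 0 is strong in class 1, source 1 in class 2.
rng = np.random.default_rng(0)
n, C, T = 30, 6, 400
A = rng.standard_normal((C, C))                 # mixes sources into electrodes

def make_trials(strong_source):
    out = []
    for _ in range(n):
        s = rng.standard_normal((C, T))
        s[strong_source] *= 3.0
        out.append(A @ s)
    return np.stack(out)

trials_1, trials_2 = make_trials(0), make_trials(1)
W, lam = fit_csp(trials_1, trials_2, n_pairs=1)

# For each kept filter, w' S1 w = lam and w' S2 w = 1 - lam.
print("variance ratios:", lam / (1.0 - lam))
```

With this scaling, each filter's class-1 variance is `lam` and its class-2 variance is `1 - lam`, so eigenvalues near 0 or 1 mark the useful filters and eigenvalues near 0.5 mark filters that do not separate the classes.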
The Feature
After filtering, CSP usually keeps log-variance features:
\[f_j = \log \left(\frac{\operatorname{var}(Z_j)}{\sum_{\ell=1}^{K}\operatorname{var}(Z_\ell)}\right).\]
The classifier sees \(f\), not the whole time series.
This turns a trial into a short vector. A linear classifier such as LDA often works well after this step because CSP already did much of the class separation.
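The feature formula is short enough to write out directly. This is a sketch with a hypothetical function name and a tiny hand-made `Z` so the numbers can be checked by eye.

```python
import numpy as np

def log_variance_features(Z):
    """Z: CSP-filtered trial of shape (K, T). Returns a vector of length K."""
    var = Z.var(axis=1)              # variance of each virtual channel
    return np.log(var / var.sum())   # normalize across the K channels, then log

# Two virtual channels with variances 1 and 9.
Z = np.array([[1.0, -1.0, 1.0, -1.0],
              [3.0, -3.0, 3.0, -3.0]])
f = log_variance_features(Z)
print(f)   # log(0.1) and log(0.9)
```

The normalization makes the feature depend on how variance is shared among the filters, not on the overall amplitude of the trial.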
Why Filtering Comes First
Motor-imagery information sits in frequency bands. The usual targets are the mu rhythm (roughly 8-13 Hz) and the beta rhythm (roughly 13-30 Hz).
But the best band can move by subject, session, and task.
If CSP is fitted on a poor band, it learns spatial filters for weak signal. The classifier then receives clean-looking but useless features.
This is the reason for FBCSP.
Filter-Bank CSP
FBCSP repeats CSP across several frequency bands.
The pipeline is:
raw EEG trial
-> bandpass filter 1
-> CSP features for band 1
-> bandpass filter 2
-> CSP features for band 2
-> ...
-> concatenate features
-> select useful features
-> train classifier
The filter bank spreads the bet.
One subject may carry signal around 8-12 Hz. Another may show stronger separation around 18-26 Hz. FBCSP lets the data choose useful bands through the training procedure.
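The first half of that pipeline, up to the concatenated feature vector, can be sketched as follows. It assumes SciPy's Butterworth filter for the band-pass step, two illustrative bands, and hypothetical helper names. Feature selection and the classifier are left out, and the data are synthetic.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(trials, low, high, fs, order=4):
    """Zero-phase Butterworth band-pass along the time axis."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, trials, axis=-1)

def fit_csp(trials, y, n_pairs=1):
    """CSP filters from both ends; assumes labels 0/1 and zero-mean trials."""
    covs = np.array([X @ X.T / np.trace(X @ X.T) for X in trials])
    S1, S2 = covs[y == 0].mean(axis=0), covs[y == 1].mean(axis=0)
    d, U = np.linalg.eigh(S1 + S2)
    P = U / np.sqrt(d)                       # whitens S1 + S2
    _, V = np.linalg.eigh(P.T @ S1 @ P)      # ascending eigenvalues
    W = P @ V
    return np.hstack([W[:, :n_pairs], W[:, -n_pairs:]])

def csp_features(trials, W):
    """Normalized log-variance of W.T @ X for every trial."""
    Z = np.einsum("ck,nct->nkt", W, trials)
    var = Z.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))

def fit_fbcsp(trials, y, bands, fs):
    """One CSP filter matrix per band."""
    return [fit_csp(bandpass(trials, lo, hi, fs), y) for lo, hi in bands]

def fbcsp_features(trials, filters, bands, fs):
    """Concatenate the per-band CSP features into one vector per trial."""
    feats = [csp_features(bandpass(trials, lo, hi, fs), W)
             for (lo, hi), W in zip(bands, filters)]
    return np.hstack(feats)

# Synthetic data: a 10 Hz rhythm on channel 0 (class 0) or channel 1 (class 1).
rng = np.random.default_rng(0)
fs, n, C, T = 250, 40, 4, 500
t = np.arange(T) / fs
y = np.arange(n) % 2
trials = rng.standard_normal((n, C, T))
for i in range(n):
    phase = rng.uniform(0, 2 * np.pi)
    trials[i, y[i]] += 3.0 * np.sin(2 * np.pi * 10 * t + phase)

bands = [(8, 12), (18, 26)]
filters = fit_fbcsp(trials, y, bands, fs)
F = fbcsp_features(trials, filters, bands, fs)
print(F.shape)   # (trials, bands * filters per band)
```

In this toy data only the 8-12 Hz band carries class information, so a selection step run on training data would be expected to prefer its features over the 18-26 Hz ones.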
Where Leakage Appears
CSP is supervised. It uses labels.
FBCSP often adds feature selection. That also uses labels.
So the safe order is strict:
- split train and test,
- fit band filters if they are learned,
- fit CSP on training trials only,
- select features using training folds only,
- apply the fitted pipeline to test trials.
If CSP or feature selection is fitted on test trials, the result is not a test result.
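The order above can be sketched as a cross-validation loop in which every supervised step is refitted inside the fold. This is an illustration on synthetic data, with hypothetical helper names and a small two-class LDA written in NumPy (with a tiny ridge term added for numerical stability). Band filtering and feature selection are omitted to keep it short.

```python
import numpy as np

def fit_csp(trials, y, n_pairs=2):
    """CSP filters from both ends; assumes labels 0/1 and zero-mean trials."""
    covs = np.array([X @ X.T / np.trace(X @ X.T) for X in trials])
    S1, S2 = covs[y == 0].mean(axis=0), covs[y == 1].mean(axis=0)
    d, U = np.linalg.eigh(S1 + S2)
    P = U / np.sqrt(d)
    _, V = np.linalg.eigh(P.T @ S1 @ P)
    W = P @ V
    return np.hstack([W[:, :n_pairs], W[:, -n_pairs:]])

def csp_features(trials, W):
    Z = np.einsum("ck,nct->nkt", W, trials)
    var = Z.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))

def fit_lda(F, y, ridge=1e-6):
    """Two-class LDA: weight vector and bias."""
    m0, m1 = F[y == 0].mean(axis=0), F[y == 1].mean(axis=0)
    Sw = np.cov(F[y == 0], rowvar=False) + np.cov(F[y == 1], rowvar=False)
    w = np.linalg.solve(Sw + ridge * np.eye(F.shape[1]), m1 - m0)
    return w, -w @ (m0 + m1) / 2

def predict_lda(F, w, b):
    return (F @ w + b > 0).astype(int)

def cross_validate(trials, y, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # CSP and LDA see training trials and training labels only.
        W = fit_csp(trials[train_idx], y[train_idx])
        w, b = fit_lda(csp_features(trials[train_idx], W), y[train_idx])
        # Test trials pass through the already-fitted pipeline.
        pred = predict_lda(csp_features(trials[test_idx], W), w, b)
        scores.append(np.mean(pred == y[test_idx]))
    return float(np.mean(scores))

# Synthetic trials: the strong source depends on the class label.
rng = np.random.default_rng(0)
n, C, T = 40, 6, 300
A = rng.standard_normal((C, C))
y = np.arange(n) % 2
trials = np.empty((n, C, T))
for i in range(n):
    s = rng.standard_normal((C, T))
    s[y[i]] *= 3.0
    trials[i] = A @ s

acc = cross_validate(trials, y)
print("held-out accuracy:", acc)
```

The leaky version would call `fit_csp` once on all trials before the loop. The code would look almost the same, which is why the mistake is easy to make.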
What CSP Does Not Solve
CSP does not remove all session shift.
It can be sensitive to covariance changes. A filter fitted on one session may not remain optimal in the next session. Electrode impedance, fatigue, attention, and small cap shifts can move the covariance structure.
That does not make CSP weak. It means CSP is a feature extractor, not a deployment policy.
For cross-session work, CSP often needs help from whitening, covariance alignment, domain adaptation, or source-session selection.
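As one concrete example of covariance alignment, each session can be whitened by its own mean trial covariance before CSP is fitted or applied. This form is sometimes called Euclidean alignment. The sketch below is an illustration on synthetic sessions with hypothetical names, not a recommendation of one method over the others listed.

```python
import numpy as np

def mean_cov(trials):
    """Mean spatial covariance over trials of shape (n, C, T)."""
    return np.mean([X @ X.T / X.shape[1] for X in trials], axis=0)

def align_session(trials):
    """Multiply every trial by R^(-1/2), where R is the session's mean covariance."""
    d, U = np.linalg.eigh(mean_cov(trials))
    R_inv_sqrt = (U / np.sqrt(d)) @ U.T          # U diag(d^-1/2) U.T
    return np.einsum("ij,njt->nit", R_inv_sqrt, trials)

# Two synthetic sessions with different channel mixing (a stand-in for cap shift).
rng = np.random.default_rng(0)
n, C, T = 20, 5, 300
A_a = rng.standard_normal((C, C))
A_b = rng.standard_normal((C, C))
session_a = np.stack([A_a @ rng.standard_normal((C, T)) for _ in range(n)])
session_b = np.stack([A_b @ rng.standard_normal((C, T)) for _ in range(n)])

aligned_a = align_session(session_a)
aligned_b = align_session(session_b)

# After alignment both sessions have the identity as mean covariance.
print(np.allclose(mean_cov(aligned_a), np.eye(C)),
      np.allclose(mean_cov(aligned_b), np.eye(C)))
```

The alignment uses no labels, but it does need a batch of trials from the new session to estimate that session's mean covariance.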
The Useful Mental Model
CSP builds discriminative sensors.
FBCSP builds discriminative sensors in several frequency bands.
The classifier is the last step. The main work happens before it.
For motor-imagery EEG, this is the clean pipeline:
frequency structure -> spatial contrast -> log-variance feature -> classifier
Once that is clear, the later questions become easier.
Which bands are stable?
Which spatial filters transfer across sessions?
Which source sessions should be trusted?
Those questions are the bridge from feature extraction to domain adaptation.
