Train/Test-Safe EEG Feature Extraction

EEG feature extraction looks simple from far away.

Take a trial. Compute a feature vector. Train a classifier.

That picture misses the hard part. Many EEG features are not just formulas. They are fitted objects. They learn something from the data before they transform a trial.

That is where leakage starts.

Start With One Trial

A motor-imagery EEG trial is usually a matrix:

\[X \in \mathbb{R}^{C \times T}.\]

Here \(C\) is the number of channels and \(T\) is the number of time points.

A classifier does not want this whole matrix. It wants a shorter vector:

\[x \in \mathbb{R}^{p}.\]

Feature extraction is the step that turns \(X\) into \(x\).

For simple bandpower, this may mean:

  1. filter the trial,
  2. compute power in each channel,
  3. take a log transform.
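Here is a minimal sketch of those three steps in NumPy. The function name `log_bandpower`, the 250 Hz sampling rate, and the 8 to 12 Hz band are my assumptions for illustration. The FFT mask is a crude stand-in for a properly designed band-pass filter.

```python
import numpy as np

def log_bandpower(trial, fs, band):
    """Log band power per channel for one trial of shape (C, T)."""
    low, high = band
    # 1. Filter: zero every FFT bin outside the band (a brick-wall approximation).
    spectrum = np.fft.rfft(trial, axis=1)
    freqs = np.fft.rfftfreq(trial.shape[1], d=1.0 / fs)
    spectrum[:, (freqs < low) | (freqs > high)] = 0.0
    filtered = np.fft.irfft(spectrum, n=trial.shape[1], axis=1)
    # 2. Power: variance over time in each channel.
    power = filtered.var(axis=1)
    # 3. Log transform.
    return np.log(power)

fs = 250.0
t = np.arange(500) / fs
# Synthetic two-channel trial: a strong 10 Hz rhythm, and a weak one mixed with 30 Hz.
trial = np.stack([
    2.0 * np.sin(2 * np.pi * 10 * t),
    0.5 * np.sin(2 * np.pi * 10 * t) + 3.0 * np.sin(2 * np.pi * 30 * t),
])
x = log_bandpower(trial, fs, band=(8.0, 12.0))  # one value per channel
```

Nothing in this function is fitted. The same trial always gives the same vector, no matter which other trials exist.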

That sounds fixed. But most useful EEG pipelines are not fixed.

They estimate parameters from data.

The Feature Extractor Has Memory

Whitening learns a mean and a covariance matrix.

CSP learns spatial filters from labeled training trials.

Tangent-space features learn a reference covariance matrix.

Feature selection learns which columns to keep.

Scaling learns a mean and a standard deviation.

These learned values are memory. If the extractor sees the test set while learning them, the test set has already entered the training process.

The classifier may still look honest. The split may still exist. The result is still contaminated.
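To make the idea of memory concrete, here is a small whitening sketch. The class name `Whitener`, the trailing-underscore attributes, and the choice to pool all time samples of the training trials are my assumptions, not a fixed recipe.

```python
import numpy as np

class Whitener:
    """Channel whitening that learns in fit and only applies in transform."""

    def fit(self, trials):
        # trials: (n_trials, C, T). Pool every time sample of the given trials.
        samples = np.concatenate(list(trials), axis=1)        # (C, n_trials * T)
        self.mean_ = samples.mean(axis=1, keepdims=True)      # learned mean
        evals, evecs = np.linalg.eigh(np.cov(samples))        # learned covariance
        self.whitening_ = evecs @ np.diag(evals ** -0.5) @ evecs.T
        return self

    def transform(self, trials):
        # Uses the stored mean and matrix; nothing is re-estimated here.
        return np.stack([self.whitening_ @ (X - self.mean_) for X in trials])

rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
trials = np.stack([mixing @ rng.standard_normal((3, 200)) for _ in range(40)])
train, test = trials[:30], trials[30:]

whitener = Whitener().fit(train)            # memory comes from train only
memory_before = whitener.whitening_.copy()
test_white = whitener.transform(test)       # test is transformed, not learned from
memory_unchanged = np.array_equal(memory_before, whitener.whitening_)
```

The two attributes `mean_` and `whitening_` are the memory. If `fit` had received all 40 trials, the 10 test trials would be inside both.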

The Rule

Split first. Fit second. Transform third.

For one train/test split:

  1. split trials into train and test,
  2. fit every preprocessing and feature step on train only,
  3. transform train with the fitted objects,
  4. transform test with the same fitted objects,
  5. train the classifier on transformed train,
  6. evaluate on transformed test.
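The six steps can be sketched on a toy feature matrix. Everything here is an assumption for illustration: the synthetic data, the 70/30 split, the scaling step standing in for "every preprocessing and feature step", and the nearest-centroid classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature matrix: 100 trials x 4 features, two balanced classes.
y = np.repeat([0, 1], 50)
F = rng.standard_normal((100, 4)) + 1.5 * y[:, None]

# 1. Split trials into train and test.
order = rng.permutation(len(y))
train_idx, test_idx = order[:70], order[70:]

# 2. Fit the scaling step on train only.
mean = F[train_idx].mean(axis=0)
std = F[train_idx].std(axis=0)

# 3. Transform train with the fitted values.
F_train = (F[train_idx] - mean) / std
# 4. Transform test with the same fitted values.
F_test = (F[test_idx] - mean) / std

# 5. Train the classifier (one centroid per class) on transformed train.
centroids = np.stack([F_train[y[train_idx] == k].mean(axis=0) for k in (0, 1)])

# 6. Evaluate on transformed test.
dist = np.linalg.norm(F_test[:, None, :] - centroids[None, :, :], axis=2)
accuracy = np.mean(dist.argmin(axis=1) == y[test_idx])
```

Note that `F_test` is not re-centered on its own mean. It inherits whatever the training trials decided.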

The test data can be transformed. It cannot be used to choose the transform.

That distinction is the whole point.

A Safe Pipeline

Write the pipeline as two verbs:

fit(train_trials, train_labels)
transform(new_trials)

The fit step is allowed to use labels if the method needs labels. CSP needs labels. Feature selection often needs labels. Class-aware filtering needs labels.

The transform step should not change the learned parameters. It should only apply them.

A safe CSP example looks like this:

fit CSP filters on training trials
apply those filters to training trials
apply those same filters to test trials
train classifier on training features
predict test features
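The safe version above could look roughly like this in NumPy. This is a sketch of two-class CSP under my own assumptions: trace-normalized covariances, whitening followed by an eigendecomposition, log-variance features, and a nearest-centroid classifier. The class name `CSP` and the `n_pairs` parameter are hypothetical, not taken from a specific library.

```python
import numpy as np

class CSP:
    """Two-class common spatial patterns with separate fit and transform."""

    def __init__(self, n_pairs=1):
        self.n_pairs = n_pairs

    def fit(self, trials, labels):
        # trials: (n_trials, C, T); labels must contain exactly two classes.
        cov_a, cov_b = (
            np.mean([X @ X.T / np.trace(X @ X.T) for X in trials[labels == k]], axis=0)
            for k in np.unique(labels)
        )
        # Whiten the composite covariance, then diagonalize class A in that space.
        evals, evecs = np.linalg.eigh(cov_a + cov_b)
        whiten = np.diag(evals ** -0.5) @ evecs.T
        _, rotation = np.linalg.eigh(whiten @ cov_a @ whiten.T)
        filters = rotation.T @ whiten          # rows are spatial filters
        # eigh sorts ascending, so keep filters from both ends of the spectrum.
        keep = np.r_[0:self.n_pairs, -self.n_pairs:0]
        self.filters_ = filters[keep]
        return self

    def transform(self, trials):
        # Log-variance of each filtered signal; the filters are not updated.
        return np.stack([np.log((self.filters_ @ X).var(axis=1)) for X in trials])

rng = np.random.default_rng(0)

def simulate(n, scale):
    # Synthetic trials whose channels have the given standard deviations.
    return rng.standard_normal((n, 4, 256)) * np.asarray(scale)[None, :, None]

trials = np.concatenate([simulate(40, [3, 1, 1, 1]), simulate(40, [1, 3, 1, 1])])
labels = np.repeat([0, 1], 40)
order = rng.permutation(80)
train, test = order[:60], order[60:]

csp = CSP(n_pairs=1).fit(trials[train], labels[train])   # training trials only
f_train = csp.transform(trials[train])
f_test = csp.transform(trials[test])                     # same filters, no refit

centroids = np.stack([f_train[labels[train] == k].mean(axis=0) for k in (0, 1)])
pred = np.linalg.norm(f_test[:, None] - centroids[None], axis=2).argmin(axis=1)
accuracy = np.mean(pred == labels[test])
```

The only place labels enter is `fit`, and the only trials `fit` sees are the training ones.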

An unsafe version looks like this:

fit CSP filters on all trials
split the resulting features
train and test classifier

The second version lets the test trials and their labels shape the spatial filters. The reported accuracy is then optimistically biased, sometimes by a wide margin.
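The effect is easy to reproduce on pure noise. The sketch below swaps CSP for a simpler label-driven step, picking the 10 columns with the largest class-mean difference, because the leak works the same way. The sizes, the helper names `top_k` and `centroid_accuracy`, and the 20 repetitions are my assumptions.

```python
import numpy as np

def top_k(F, y, k):
    # Label-driven selection: columns with the largest class-mean difference.
    diff = np.abs(F[y == 0].mean(axis=0) - F[y == 1].mean(axis=0))
    return np.argsort(diff)[-k:]

def centroid_accuracy(F_tr, y_tr, F_te, y_te):
    centroids = np.stack([F_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
    pred = np.linalg.norm(F_te[:, None] - centroids[None], axis=2).argmin(axis=1)
    return np.mean(pred == y_te)

rng = np.random.default_rng(0)
train, test = np.arange(30), np.arange(30, 40)
leaky, safe = [], []
for _ in range(20):
    F = rng.standard_normal((40, 1000))            # noise: no real class signal
    y = rng.permutation(np.repeat([0, 1], 20))

    cols = top_k(F, y, 10)                         # unsafe: fitted on all trials
    leaky.append(centroid_accuracy(F[train][:, cols], y[train],
                                   F[test][:, cols], y[test]))

    cols = top_k(F[train], y[train], 10)           # safe: fitted on train only
    safe.append(centroid_accuracy(F[train][:, cols], y[train],
                                  F[test][:, cols], y[test]))

leaky_mean, safe_mean = np.mean(leaky), np.mean(safe)
```

There is nothing to learn in this data, so an honest estimate should sit near chance. The leaky estimate does not, even though the classifier itself never saw a test trial.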

Cross-Validation Needs the Same Discipline

Cross-validation does not fix leakage by itself.

Each fold needs its own fitted feature extractor.

For five folds, you fit the feature extractor five times. Each time, the fold held out for validation stays outside the fit.

Nested cross-validation adds one more layer. The inner loop chooses settings. The outer loop estimates performance. Feature extraction must be refit in both: on each inner training fold while settings are compared, and on each outer training fold once they are chosen.

If a parameter is chosen by validation accuracy, it belongs inside the inner loop.

If a transformation is learned from data, it belongs inside the training part of that fold.
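A plain (non-nested) version of that discipline can be sketched as a loop that builds a fresh extractor in every fold. The names `Scaler`, `cross_validate`, and `make_extractor` are hypothetical, the folds are unstratified random splits, and the nearest-centroid scorer is only a placeholder classifier.

```python
import numpy as np

class Scaler:
    def fit(self, F, y=None):
        self.mean_, self.std_ = F.mean(axis=0), F.std(axis=0)
        return self

    def transform(self, F):
        return (F - self.mean_) / self.std_

def nearest_centroid_accuracy(F_tr, y_tr, F_val, y_val):
    classes = np.unique(y_tr)
    centroids = np.stack([F_tr[y_tr == k].mean(axis=0) for k in classes])
    nearest = np.linalg.norm(F_val[:, None] - centroids[None], axis=2).argmin(axis=1)
    return np.mean(classes[nearest] == y_val)

def cross_validate(F, y, make_extractor, n_folds=5, seed=0):
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    scores, extractors = [], []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        # A new extractor per fold, fitted without the held-out trials.
        extractor = make_extractor().fit(F[train_idx], y[train_idx])
        scores.append(nearest_centroid_accuracy(
            extractor.transform(F[train_idx]), y[train_idx],
            extractor.transform(F[val_idx]), y[val_idx],
        ))
        extractors.append(extractor)
    return np.array(scores), extractors

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
F = rng.standard_normal((100, 4)) + 1.5 * y[:, None]
scores, extractors = cross_validate(F, y, make_extractor=Scaler)
```

Five folds give five fitted extractors with five slightly different means. That is expected. A single extractor shared across folds would be the warning sign.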

What Counts as a Learned Parameter?

This list catches most mistakes:

  • channel mean,
  • feature mean and standard deviation,
  • covariance matrix,
  • whitening matrix,
  • CSP filter,
  • reference covariance for tangent-space mapping,
  • selected frequency band,
  • selected feature column,
  • PCA or ICA loading,
  • domain-adaptation map,
  • source-session weight.

If the value is estimated from data, it has to be fitted inside the training split.

Why EEG Makes This Easy to Get Wrong

EEG datasets are small.

The same subject often appears in several sessions.

Trials from one recording block are correlated.

Feature extractors can be high variance. A CSP filter fitted on all trials may learn tiny details of the held-out trials. A covariance reference fitted on all sessions may quietly pull the test session closer to the train sessions.

The model then looks better than it is.

The real deployment setting is colder. The future session does not help build the past pipeline.

The Clean Mental Model

Treat the feature extractor like part of the model.

The classifier is not the only learned object. The whole chain is learned:

preprocessing -> feature extraction -> adaptation -> classifier

When the chain is evaluated, the whole chain must be trained only on the allowed training data.
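One way to make that concrete is a small wrapper that fits every step in order on the same training data. This is a sketch under my own assumptions: the `Chain` class and the four step classes are hypothetical, and a feature scaler stands in for the adaptation slot.

```python
import numpy as np

class Demean:                       # preprocessing: learns a per-channel mean
    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=(0, 2), keepdims=True)
        return self
    def transform(self, X):
        return X - self.mean_

class LogVariance:                  # feature extraction: nothing to learn
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return np.log(X.var(axis=2))

class Scaler:                       # stand-in for the adaptation step
    def fit(self, F, y=None):
        self.mean_, self.std_ = F.mean(axis=0), F.std(axis=0)
        return self
    def transform(self, F):
        return (F - self.mean_) / self.std_

class NearestCentroid:              # classifier
    def fit(self, F, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([F[y == k].mean(axis=0) for k in self.classes_])
        return self
    def predict(self, F):
        dist = np.linalg.norm(F[:, None] - self.centroids_[None], axis=2)
        return self.classes_[dist.argmin(axis=1)]

class Chain:
    def __init__(self, steps, classifier):
        self.steps, self.classifier = steps, classifier

    def fit(self, X, y):
        # Each step is fitted on the output of the previous fitted step.
        for step in self.steps:
            X = step.fit(X, y).transform(X)
        self.classifier.fit(X, y)
        return self

    def predict(self, X):
        # Apply-only path: no step is refitted here.
        for step in self.steps:
            X = step.transform(X)
        return self.classifier.predict(X)

rng = np.random.default_rng(0)
scales = np.array([[3, 1, 1, 1], [1, 3, 1, 1]])
labels = np.repeat([0, 1], 40)
trials = rng.standard_normal((80, 4, 256)) * scales[labels][:, :, None]
order = rng.permutation(80)
train, test = order[:60], order[60:]

chain = Chain([Demean(), LogVariance(), Scaler()], NearestCentroid())
chain.fit(trials[train], labels[train])
accuracy = np.mean(chain.predict(trials[test]) == labels[test])
```

Because `fit` is the only method that learns, passing it the training trials is enough to keep the whole chain clean.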

This is why I prefer feature extraction APIs with an explicit fit/transform (train/apply) structure. Such an API forces the code to say when it is learning and when it is only applying.

Practical Check

Before trusting an EEG result, I ask four questions.

First: were trials or sessions split before feature fitting?

Second: were labels from the validation or test fold used in CSP, feature selection, or tuning?

Third: was any covariance, mean, scaling value, or reference geometry estimated from the held-out fold?

Fourth: if domain adaptation used target data, was that target data allowed by the deployment setting?

If the answer is unclear, the result is unclear.

Clean feature extraction is not extra polish. It is the base contract of the experiment.