Kaggle Silver Medal: HMS - Harmful Brain Activity Classification
Competition Goal: To detect and classify six patterns of harmful brain activity (Seizure, LPD, GPD, LRDA, GRDA, Other) in critically ill patients using EEG recordings. This work contributes to automating neurocritical care diagnostics.
🏆 Achievement
- Rank: 98th out of 2,767 teams (Silver Medal).
- Role: Solo Competitor / Lead Data Scientist.
💡 Technical Approach
My solution treated the multi-channel EEG time series as a computer vision problem: signals were converted into spectrograms and fed to modern CNNs.
1. Data Preprocessing & Feature Engineering
- Spectrogram Conversion: Converted raw EEG signals (10-20 system) into Log-Mel Spectrograms to capture time-frequency features.
- Montage Engineering: Utilized “double banana” and other clinical montages to highlight spatial differences between brain hemispheres.
- Signal Cleaning: Applied bandpass filters to remove noise and power-line interference (50/60Hz).
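The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the exact competition pipeline: the sampling rate, the four electrode pairs in `BIPOLAR_PAIRS`, and the 0.5–40 Hz band are assumptions, and a plain log-power STFT stands in for the log-mel transform.

```python
import numpy as np
from scipy import signal

FS = 200  # sampling rate in Hz (assumed here)
# A few "double banana" bipolar pairs (illustrative subset of the full montage)
BIPOLAR_PAIRS = [("Fp1", "F7"), ("F7", "T3"), ("T3", "T5"), ("T5", "O1")]

def bandpass(x, low=0.5, high=40.0, fs=FS, order=4):
    """Zero-phase Butterworth bandpass to suppress drift and line noise."""
    b, a = signal.butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return signal.filtfilt(b, a, x, axis=-1)

def bipolar_montage(eeg, channel_names):
    """Stack channel differences (e.g. Fp1-F7) to highlight spatial gradients."""
    idx = {name: i for i, name in enumerate(channel_names)}
    return np.stack([eeg[idx[a]] - eeg[idx[b]] for a, b in BIPOLAR_PAIRS])

def log_spectrogram(x, fs=FS, nperseg=256):
    """Log-power spectrogram of one montage channel (time-frequency image)."""
    _, _, sxx = signal.spectrogram(x, fs=fs, nperseg=nperseg)
    return np.log(sxx + 1e-8)
```

Each montage channel yields one time-frequency image; stacking them produces the multi-channel input for the CNN.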
2. Model Architecture
I employed an ensemble of 2D Convolutional Neural Networks, specifically EfficientNet (B0-B2) variants, pre-trained on ImageNet.
- Backbone: EfficientNet (B0-B2) for efficient feature extraction.
- Input: Stacked spectrograms representing different spatial montages.
- Pooling: GeM (Generalized Mean) Pooling to capture salient features across time.
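GeM pooling interpolates between average pooling (p = 1) and max pooling (p → ∞), biasing the pooled descriptor toward salient activations. A minimal NumPy sketch of the operation; the actual model used a learnable pooling layer inside the network, and `p = 3` here is just a common default:

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized Mean pooling over the spatial axes of a (C, H, W) map.

    p = 1 recovers average pooling; large p approaches max pooling.
    """
    x = np.clip(feature_map, eps, None)          # keep values positive before x**p
    return np.mean(x ** p, axis=(-2, -1)) ** (1.0 / p)
```

With p = 3, strong activations dominate the mean, so the pooled feature emphasizes the most salient time-frequency regions rather than averaging them away.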
3. Inference & Post-Processing
To stabilize predictions under the competition's KL-Divergence metric, I implemented a Weighted Ensemble strategy combining predictions from multiple model checkpoints trained on different folds.
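The competition scored each sample's predicted class distribution against a soft label distribution using KL divergence; a small sketch of that metric (the `eps` clipping value is my assumption):

```python
import numpy as np

def kl_divergence(y_true, y_pred, eps=1e-15):
    """Mean KL(y_true || y_pred) over samples; each row is a probability vector."""
    y_true = np.clip(y_true, eps, 1.0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(np.sum(y_true * np.log(y_true / y_pred), axis=1)))
```

Because KL divergence heavily penalizes confident wrong predictions, averaging several models' probability outputs tends to lower the score more reliably than any single checkpoint.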
💻 Code Snippet: Weighted Ensemble Inference
The following snippet demonstrates the inference pipeline: each model's predictions are weighted by its validation performance, and the weighted probabilities are summed into the final vote distribution.
```python
import numpy as np
import torch
from efficientnet_pytorch import EfficientNet

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# --- Ensemble Configuration ---
# Weights derived from Nelder-Mead optimization on OOF (Out-of-Fold) data
model_weights = {
    'model_v1': 0.28111,
    'model_v2': 0.23014,
    'model_v3': 0.31241,
    'model_v4': 0.17634,
}

def inference_ensemble(test_loader):
    final_preds = []
    # Iterate through each model in the ensemble
    for model_name, weight in model_weights.items():
        # Load the architecture and the fold's checkpoint
        model = EfficientNet.from_name('efficientnet-b0')
        checkpoint = torch.load(f"./models/{model_name}.pth", map_location=DEVICE)
        model.load_state_dict(checkpoint)
        model.to(DEVICE)
        model.eval()

        # Batch prediction
        fold_preds = []
        with torch.no_grad():
            for batch in test_loader:
                images = batch['image'].to(DEVICE)
                outputs = model(images)
                # Softmax for a probability distribution over the six classes
                probs = torch.softmax(outputs, dim=1)
                fold_preds.append(probs.cpu().numpy())

        # Apply this model's ensemble weight
        final_preds.append(np.concatenate(fold_preds) * weight)

    # Sum weighted predictions into the final distribution
    # Columns: [seizure_vote, lpd_vote, gpd_vote, lrda_vote, grda_vote, other_vote]
    return np.sum(final_preds, axis=0)
```
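The ensemble weights themselves came from Nelder-Mead optimization on OOF predictions. A sketch of how such a search can be run with `scipy.optimize.minimize`; `optimize_weights` and its simplex projection are illustrative, not my exact competition code:

```python
import numpy as np
from scipy.optimize import minimize

def kl(y_true, y_pred, eps=1e-15):
    """Mean KL(y_true || y_pred) over samples (the quantity being minimized)."""
    y_true = np.clip(y_true, eps, 1.0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(np.sum(y_true * np.log(y_true / y_pred), axis=1)))

def optimize_weights(oof_preds, oof_targets):
    """oof_preds: list of (N, C) probability arrays, one per model."""
    n = len(oof_preds)

    def objective(w):
        w = np.abs(w) / np.abs(w).sum()          # project onto the simplex
        blend = sum(wi * p for wi, p in zip(w, oof_preds))
        blend /= blend.sum(axis=1, keepdims=True)
        return kl(oof_targets, blend)

    res = minimize(objective, x0=np.full(n, 1.0 / n), method="Nelder-Mead")
    return np.abs(res.x) / np.abs(res.x).sum()
```

Optimizing the blend weights directly against the OOF KL divergence, rather than weighting models equally, is what produced the uneven weights in the configuration above.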
Result: Top 5% on the Private Leaderboard (KL-Divergence metric).