CV
Summary
Ph.D. Candidate in Computational Science specializing in Biosignal Data Engineering, Statistical Modeling, and Automated Analytics Pipelines. Expert in architecting canonical data representations for high-dimensional signals and building versioned, reproducible data products using Python, R, and SQL. Passionate about bridging rigorous mathematical theory (such as Riemannian geometry and domain adaptation) with scalable, production-grade infrastructure.
Education
- Ph.D. in Computational Science (Data Analytics Track), University of Massachusetts Boston, Expected May 2026
- M.S. in Analytics and Modelling, Valparaiso University, May 2018
- B.Eng. in Electrical and Electronics Engineering, Chongqing University, China, Jun 2014
Research & Professional Experience
- Doctoral Researcher (Biosignal Analytics & Infrastructure)
University of Massachusetts Boston, Sep 2019 - Present
- Multi-Center Data Harmonization: Unified heterogeneous benchmarks (PhysioNet, BNCI) into a canonical data pipeline. Resolved schema inconsistencies across datasets, enabling large-scale cross-study retrospective analysis.
- Scalable Pipeline Architecture: Optimized parallel workflows on Linux HPC clusters (Slurm). Refactored legacy code into efficient R/Python scripts, reducing cross-validation latency for TB-scale data from days to hours.
- Data Quality & Robustness: Developed a geometric framework to decouple biological signal drift from system noise. Constructed a “Robustness Atlas” to monitor pipeline sensitivity, serving as a QA layer for analytics outputs.
- Product-Oriented Model Selection: Operationalized a “Linear-First” deployment rule. Demonstrated that simple, robust linear baselines matched deep learning accuracy with 60% less compute, prioritizing maintainability over complexity.
- Instructor of Record (Math Department)
University of Massachusetts Boston, Sep 2023 - Sep 2025
- Courses Taught: College Algebra (Math 115), Pre-Calculus (Math 130), and Calculus I (Math 140).
- Role: Designed curriculum, delivered lectures, and managed grading for classes of 30+ students. Synthesized complex quantitative concepts into accessible insights for diverse stakeholders.
- Data Engineer
China Mobile IoT Company Limited, Chongqing, China, Jul 2014 - Aug 2016
- High-Volume SQL Analytics: Designed and maintained SQL transformations for high-frequency time-series data from millions of IoT sensors.
- System Reliability: Built dashboards to visualize network anomalies and data flow health, supporting operational decision-making.
Open Source Software & Tooling
- Lead Developer: DA4BCI (R Package)
- Data Harmonization: Architected a modular R package to standardize Transfer Learning workflows. Implemented canonical transformations (Riemannian Alignment, Optimal Transport) to mitigate distributional shifts.
- Reproducibility: Built automated testing suites and versioned documentation (CRAN-style) to ensure consistent metric generation.
- Developer: EEG-Feature-Engineering (Pipeline)
- Automated Processing: Engineered a scalable pipeline to transform raw biosignals into structured feature sets, mapping complex SPD manifolds into vector spaces for SQL-compatible storage.
- Reliability: Decoupled computation logic from inference APIs to prevent data leakage in production environments.
- Developer: eegwhiten (Data Quality Utility)
- Developed specialized whitening transforms to normalize high-dimensional covariance matrices, ensuring data quality and statistical validity before modeling.
Technical Skills
- Data Engineering & Pipelines: Automated ETL Workflows, CI/CD Concepts, Data Modeling, Version Control (Git), Docker, Linux/HPC (Slurm).
- Languages: SQL (Complex Transformations), Python (Pandas, NumPy, PyTorch), R (Package Development, Tidyverse), Bash/Shell.
- Biosignal Analytics: EEG Feature Extraction, Canonical Data Representation, Artifact Rejection, Clinical Metadata Integration.
- Mathematics & Modeling: Signal Processing (DSP), Covariance Matrix Estimation, Convex Optimization, Riemannian Geometry, Bayesian Inference.
Awards & Community
- Silver Medal (Top 5%), Kaggle (HMS - Harmful Brain Activity Classification), 2024
- Doctoral Fellowship, University of Massachusetts Boston, 2019 - 2025
- Volunteer Data Scientist, Statistics Without Borders, 2020 - 2021
- SAS Certified, Applied Econometrics and Data Science, 2018
Selected Manuscripts & Presentations
- Shen, Y., et al. Drift-Feature-Performance Decomposition via Structured Geometric Modeling. (Under Review).
- Shen, Y., et al. Confidence-Gated Adaptation for Cross-Session BCI. (Under Review).
- Invited Talk: Benchmarking Classification Pipelines Within and Cross Sessions on the PhysioNet EEG Dataset. MIND Seminar, Inria, France. (Jun 2025).