Tensor Multiple Canonical Correlation Analysis (TMCCA): A Framework for Multi-View High-Dimensional Data Integration

Note: This manuscript is currently under preparation/review. Only the abstract and high-level methodological framework are presented here to protect novel contributions.

Abstract

Canonical Correlation Analysis (CCA) and its multi-view extension, Multiple CCA (MCCA), are fundamental tools for analyzing associations between datasets. However, traditional vector-based MCCA methods suffer from the “curse of dimensionality” and lose structural information when applied to higher-order tensor data, such as time-frequency-space EEG or fMRI data.

This research introduces Tensor Multiple Canonical Correlation Analysis (TMCCA), a generalized framework designed to handle multi-view datasets whose views are tensors of arbitrary, and possibly different, orders. By operating directly on the tensor manifold, TMCCA aims to identify shared latent structures across multi-view data while preserving the intrinsic geometry of the input signals.

Problem Statement

In neuroimaging and healthcare data science, data often comes in the form of high-dimensional tensors (e.g., $\text{Channels} \times \text{Time} \times \text{Frequency}$). Traditional approaches require reshaping these tensors into vectors (vectorization), which leads to:

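Explosion of Parameters: The number of weights per canonical direction grows with the product of the mode dimensions, increasing the risk of overfitting in small-sample scenarios.
Loss of Spatial/Temporal Structure: Flattening ignores the natural correlations between neighboring entries along each mode (e.g., adjacent channels or time points).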
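
To make the parameter count concrete, the following minimal sketch compares the number of weights in one canonical direction for a flattened tensor against a rank-1 (outer-product) weight tensor. The tensor sizes and the function names `vectorized_param_count` and `rank1_param_count` are illustrative assumptions, not values or code from the manuscript.

```python
import numpy as np

def vectorized_param_count(shape):
    """Weights in one canonical direction after flattening the tensor."""
    return int(np.prod(shape))

def rank1_param_count(shape):
    """Weights when the direction is an outer product of one vector per mode."""
    return int(np.sum(shape))

shape = (64, 256, 40)  # hypothetical Channels x Time x Frequency sizes
n_flat = vectorized_param_count(shape)   # 655360
n_rank1 = rank1_param_count(shape)       # 360

# A rank-1 weight tensor yields the same projection score as its flattened
# form, so the saving comes from the structural constraint on the weights.
rng = np.random.default_rng(0)
small = (4, 5, 3)
X = rng.standard_normal(small)
a, b, c = (rng.standard_normal(d) for d in small)
W = np.einsum("i,j,k->ijk", a, b, c)                      # rank-1 weight tensor
score_flat = X.ravel() @ W.ravel()                        # vectorized projection
score_multilinear = np.einsum("ijk,i,j,k->", X, a, b, c)  # mode-wise projection
```

With these assumed sizes, the flattened direction has 655,360 free weights while the rank-1 direction has 360, and both forms of the projection give the same score.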

Proposed Framework

The proposed TMCCA method addresses these challenges through:

Tensor Decomposition: Utilizing tensor algebra (e.g., Tucker or CP decomposition) to define canonical variables, significantly reducing the number of parameters to be estimated.
Iterative Optimization: Employing an alternating least squares (ALS) algorithm to efficiently solve the high-dimensional optimization problem.
Regularization: Incorporating sparsity constraints to enhance the interpretability of the resulting spatial/temporal patterns.
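
The three ingredients above can be illustrated with a small sketch. Since the manuscript's algorithm is not published here, this is not the author's method: it assumes rank-1 (CP-style) weights per view, a sum-of-other-views regression target, a ridge term, and an optional soft-thresholding step as a stand-in for the sparsity constraint. All names (`project`, `soft_threshold`, `rank1_mcca_als`, `make_view`) and the synthetic data are hypothetical.

```python
import numpy as np

def project(X, factors, skip=None):
    """Contract samples X of shape (N, I1, ..., Id) with each mode vector except `skip`."""
    out = X
    # Go from the last mode to the first so remaining axis positions stay valid.
    for mode in reversed(range(len(factors))):
        if mode != skip:
            out = np.tensordot(out, factors[mode], axes=([mode + 1], [0]))
    return out  # shape (N,) when skip is None, else (N, I_skip)

def soft_threshold(w, lam):
    """Proximal step for an L1 penalty; lam = 0 leaves w unchanged."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def rank1_mcca_als(views, n_iter=30, ridge=1e-3, lam=0.0, seed=0):
    """Alternating least-squares sketch: one rank-1 weight tensor per view."""
    rng = np.random.default_rng(seed)
    views = [X - X.mean(axis=0) for X in views]  # center over samples
    factors = [[rng.standard_normal(d) for d in X.shape[1:]] for X in views]
    variates = []
    for X, f in zip(views, factors):
        v = project(X, f)
        variates.append(v / np.linalg.norm(v))
    for _ in range(n_iter):
        for k, X in enumerate(views):
            # Regression target: sum of the other views' current variates.
            target = sum(v for j, v in enumerate(variates) if j != k)
            for mode in range(X.ndim - 1):
                Z = project(X, factors[k], skip=mode)        # (N, I_mode)
                G = Z.T @ Z + ridge * np.eye(Z.shape[1])     # ridge-regularized Gram
                w = soft_threshold(np.linalg.solve(G, Z.T @ target), lam)
                # Rescale so this view's canonical variate has unit norm.
                factors[k][mode] = w / (np.linalg.norm(Z @ w) + 1e-12)
            variates[k] = project(X, factors[k])
    return factors, np.stack(variates)

# Synthetic check: two views of different order sharing one latent signal.
rng = np.random.default_rng(1)
n = 300
s = rng.standard_normal(n)

def make_view(shape, noise=0.2):
    modes = [rng.standard_normal(d) for d in shape]
    modes = [m / np.linalg.norm(m) for m in modes]
    pattern = modes[0]
    for m in modes[1:]:
        pattern = np.multiply.outer(pattern, m)
    return np.multiply.outer(s, pattern) + noise * rng.standard_normal((n, *shape))

views = [make_view((6, 5)), make_view((4, 3, 2))]
factors, variates = rank1_mcca_als(views)
corr = variates @ variates.T  # variates are centered with unit norm
```

Because each variate is centered and scaled to unit norm, `corr` holds the pairwise sample correlations between views. The example deliberately mixes a second-order and a third-order view to show that the mode-wise updates do not require the views to share an order.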

Applications

Multi-Subject EEG Analysis: Extracting common neural signatures across subjects while accounting for individual variability.
Multi-Modal Fusion: Integrating EEG (temporal resolution) and fMRI (spatial resolution) data without degrading them into flat vectors.

For inquiries regarding this research or potential collaboration, please contact the author directly.


Yiming Shen
