Tensor Multiple Canonical Correlation Analysis (TMCCA): A Framework for Multi-View High-Dimensional Data Integration

Note: This manuscript is currently under preparation/review. Only the abstract and high-level methodological framework are presented here to protect novel contributions.

Abstract

Canonical Correlation Analysis (CCA) and its multi-view extension (MCCA) are fundamental tools for analyzing associations between two or more datasets. However, traditional vector-based MCCA methods suffer from the “curse of dimensionality” and lose structural information when applied to higher-order tensor data, such as time-frequency-space EEG or fMRI data.

This research introduces Tensor Multiple Canonical Correlation Analysis (TMCCA), a generalized framework designed to handle views of arbitrary tensor order, including collections in which the order differs from view to view. By operating directly on the tensor manifold, TMCCA aims to identify shared latent structures across multi-view data while preserving the intrinsic geometry of the input signals.

Problem Statement

In neuroimaging and healthcare data science, data often comes in the form of high-dimensional tensors (e.g., $\text{Channels} \times \text{Time} \times \text{Frequency}$). Traditional approaches require reshaping these tensors into vectors (vectorization), which leads to:

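  1. Explosion of Parameters: Each canonical weight vector needs one entry per tensor element, so the parameter count equals the product of the mode dimensions, increasing the risk of overfitting in small-sample scenarios.
  2. Loss of Spatial/Temporal Structure: Flattening discards the mode structure, ignoring the natural correlations between neighboring entries along each mode (e.g., adjacent channels or time points).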
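
To make the parameter explosion concrete, the short sketch below counts the weights needed per canonical vector for one view. The mode sizes (64 channels, 500 time points, 40 frequency bins) are hypothetical values chosen for illustration, and the rank-1 weight tensor is an assumed parameterization, not necessarily the one used in the manuscript.

```python
import numpy as np

# Hypothetical mode sizes for one view: Channels x Time x Frequency.
shape = (64, 500, 40)

# Vectorized CCA/MCCA: one weight per tensor entry.
n_vectorized = int(np.prod(shape))   # 64 * 500 * 40

# Rank-1 weight tensor (assumption): one weight vector per mode.
n_rank1 = int(np.sum(shape))         # 64 + 500 + 40

print(n_vectorized, n_rank1)         # 1280000 604

# Both parameterizations produce one score per sample. They agree when the
# vectorized weights are the outer product of the mode-wise factors.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 8, 6, 4))        # 10 samples of an 8 x 6 x 4 tensor
a, b, c = (rng.standard_normal(d) for d in X.shape[1:])
W = np.einsum("i,j,k->ijk", a, b, c)          # rank-1 weight tensor
scores_vectorized = X.reshape(10, -1) @ W.ravel()
scores_modewise = np.einsum("nijk,i,j,k->n", X, a, b, c)
```

The mode-wise projection never forms the full weight tensor, which is what keeps the parameter count at the sum rather than the product of the mode sizes.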

Proposed Framework

The proposed TMCCA method addresses these challenges through:

  • Tensor Decomposition: Utilizing tensor algebra (e.g., Tucker or CP decomposition) to define canonical variables, so that for a fixed rank the number of parameters to be estimated grows with the sum of the mode dimensions rather than their product.
  • Iterative Optimization: Employing an alternating least squares (ALS) algorithm to efficiently solve the high-dimensional optimization problem.
  • Regularization: Incorporating sparsity constraints to enhance the interpretability of the resulting spatial/temporal patterns.

Applications

  • Multi-Subject EEG Analysis: Extracting common neural signatures across subjects while accounting for individual variability.
  • Multi-Modal Fusion: Integrating EEG (temporal resolution) and fMRI (spatial resolution) data without degrading them into flat vectors.

For inquiries regarding this research or potential collaboration, please contact the author directly.