Integrated sources model: A new space-learning model for heterogeneous multi-view data reduction, visualization, and clustering
In machine learning, multi-view data involve multiple distinct sets of attributes (“views”) for a common set of observations; when each view has the same attributes considered in different contexts, the data are said to contain multiple views of homogeneous format, which can be conceptualized as a tensor. In this article, we describe a novel approach for integrating multiple views of heterogeneous format into a common latent space using a workflow that involves non-negative matrix and tensor factorization (NMF/NTF). This approach, which we refer to as the integrated sources model (ISM), consists of two main steps: embedding and analysis. In the embedding step, the views are transformed into matrices with common non-negative components. In the analysis step, the transformed views are combined into a tensor and decomposed using NTF. We also present a variant of ISM, the integrated latent sources model (ILSM), which offers significant advantages over ISM in terms of computational efficiency and in cases where the views are highly unbalanced with regard to the number of attributes per view. Notably, ISM can be extended to process multi-omic and multi-view datasets even in the presence of missing views. We provide a proof-of-concept analysis using five examples, including the UCI Digits (the University of California Irvine Pen-Based Recognition of Handwritten Digits) dataset, a public cell-type gene signatures dataset, and a multi-omic single-cell dataset. These examples demonstrate that, in most cases, multi-view clustering is better achieved with ISM or its variant ILSM than with other latent space approaches. We also show how the non-negativity and sparsity of the ISM model components enable straightforward interpretations, in contrast to other approaches that involve latent factors of mixed signs. Finally, we present potential applications to single-cell multi-omics and spatial mapping, including spatial imaging, spatial transcriptomics, and computational biology, which are currently under evaluation. ISM relies on state-of-the-art algorithms invoked through a simple workflow implemented in Python.
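The two-step workflow (embedding, then analysis) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `nmf`, `embed_views`, and `ntf` are hypothetical, the embedding is approximated by an NMF of the concatenated views followed by a per-view refit of the observation scores, and plain multiplicative updates stand in for whatever NMF/NTF solvers ISM actually uses. The synthetic data are for illustration only.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def nmf(X, rank, n_iter=200, seed=0):
    """Plain multiplicative-update NMF: X ~ H @ W with H, W >= 0."""
    rng = np.random.default_rng(seed)
    H = rng.random((X.shape[0], rank)) + EPS
    W = rng.random((rank, X.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (X @ W.T) / (H @ (W @ W.T) + EPS)
        W *= (H.T @ X) / ((H.T @ H) @ W + EPS)
    return H, W


def embed_views(views, rank, n_iter=200, seed=0):
    """Embedding step (simplified assumption, not the published algorithm).

    Fits one NMF on the concatenated views to obtain common components,
    then refits the observation scores of each view against its own block
    of loadings, so every view becomes an (n_obs, rank) matrix.
    """
    H, W = nmf(np.hstack(views), rank, n_iter, seed)
    splits = np.cumsum([v.shape[1] for v in views])[:-1]
    transformed = []
    for Xv, Wv in zip(views, np.split(W, splits, axis=1)):
        Hv = H.copy()
        for _ in range(n_iter):
            Hv *= (Xv @ Wv.T) / (Hv @ (Wv @ Wv.T) + EPS)
        transformed.append(Hv)
    # Stack the transformed views into a tensor: (n_obs, rank, n_views)
    return np.stack(transformed, axis=2)


def ntf(T, rank, n_iter=200, seed=0):
    """Analysis step: non-negative CP decomposition by multiplicative updates."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((dim, rank)) + EPS for dim in T.shape)
    for _ in range(n_iter):
        A *= np.einsum("ijk,jr,kr->ir", T, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + EPS)
        B *= np.einsum("ijk,ir,kr->jr", T, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + EPS)
        C *= np.einsum("ijk,ir,jr->kr", T, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + EPS)
    return A, B, C


# Synthetic example: three heterogeneous views (8, 5, and 12 attributes)
# of the same 30 observations, generated from a shared non-negative latent space.
rng = np.random.default_rng(42)
latent = rng.random((30, 3))
views = [latent @ rng.random((3, p)) for p in (8, 5, 12)]

tensor = embed_views(views, rank=3)   # embedding step
A, B, C = ntf(tensor, rank=3)         # analysis step
print(tensor.shape, A.shape, B.shape, C.shape)
```

In this sketch, the rows of `A` give non-negative observation scores that could be passed to a clustering routine, while `C` indicates how strongly each view contributes to each component. The ranks and iteration counts are arbitrary choices made for the example.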
