Reducing scRNA-seq data via nonnegative matrix factorization for annotation and clustering analysis
abstract:
Single-cell RNA sequencing (scRNA-seq) provides an unprecedented opportunity to dissect tissue heterogeneity and characterize individual cells. Clustering similar cells and determining their cell types is an essential step in analyzing scRNA-seq data. Because gene expression data are high-dimensional and scRNA-seq has technical limitations, good clustering first requires dimensionality reduction of the transcriptional data. Unsupervised methods such as PCA are the common choice; however, current clustering algorithms that use the top principal components (PCs) often fail to separate closely related but distinct cells from one another.
Here, we propose a novel mixed semi-supervised and unsupervised framework based on nonnegative matrix factorization (NMF) for both dimensionality reduction and annotation. The framework consists of three phases: decomposition, projection, and annotation prediction. In the first phase, several NMF variants are tested to identify latent gene bases for reference transcriptomic datasets of pure cell types. Each latent basis is a combination of related genes, and together the bases span a low-dimensional space. In the second phase, unlabeled data are represented in terms of the latent bases by solving a linear least squares problem. In the annotation phase, the correlation between the reference and the test data is calculated with SingleR on the new representation. The proposed algorithm is evaluated on several pairs of scRNA-seq datasets for annotation. Experimental results demonstrate the effectiveness of our method compared with annotation methods that do not use dimensionality reduction. Furthermore, the latent bases can be used not only to reduce dimensions but also to characterize relationships between genes and to discover new markers. We hope this approach will help researchers extract more accurate and novel insights from their scRNA-seq data and will encourage follow-up research in this field.
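The three phases described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this work: it assumes scikit-learn's standard `NMF` in place of the NMF variants that were tested, unconstrained least squares via `numpy.linalg.lstsq` for the projection, and a plain Pearson correlation against per-type mean profiles as a stand-in for SingleR. The function names (`fit_latent_bases`, `project`, `annotate`) and the cells-by-genes matrix layout are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF


def fit_latent_bases(ref_expr, n_components, seed=0):
    """Phase 1 (decomposition): factor the nonnegative reference matrix
    (cells x genes) and return the latent gene bases H (components x genes)."""
    model = NMF(n_components=n_components, init="nndsvda",
                max_iter=500, random_state=seed)
    model.fit(ref_expr)
    return model.components_


def project(expr, H):
    """Phase 2 (projection): represent each cell in the latent space by
    solving min_W ||expr - W H||_F with ordinary least squares."""
    coeffs, *_ = np.linalg.lstsq(H.T, expr.T, rcond=None)
    return coeffs.T  # cells x components


def pearson_rows(A, B, eps=1e-12):
    """Pearson correlation between every row of A and every row of B."""
    A = A - A.mean(axis=1, keepdims=True)
    B = B - B.mean(axis=1, keepdims=True)
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + eps)
    B = B / (np.linalg.norm(B, axis=1, keepdims=True) + eps)
    return A @ B.T


def annotate(W_ref, ref_labels, W_test):
    """Phase 3 (annotation): assign each test cell the reference label whose
    mean latent profile correlates best with it (assumed stand-in for SingleR)."""
    ref_labels = np.asarray(ref_labels)
    labels = np.unique(ref_labels)
    centroids = np.vstack([W_ref[ref_labels == lab].mean(axis=0)
                           for lab in labels])
    corr = pearson_rows(W_test, centroids)  # test cells x labels
    return labels[np.argmax(corr, axis=1)]
```

Under these assumptions, a call such as `annotate(project(ref_X, H), ref_y, project(test_X, H))` with `H = fit_latent_bases(ref_X, n_components=k)` yields one predicted label per test cell. Projecting the reference and the test data with the same least squares step keeps both in the same representation before they are correlated.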