1 Introduction

The spatialHeatmap package offers the primary functionality for visualizing biological assay data in a spatial context. It colors spatial features (e.g. tissues) annotated in anatomical images according to the quantitative abundance levels of measured biomolecules (e.g. mRNAs) using a color key. The output plot is called a spatial heatmap (SHM). Additionally, it provides extended functionalities for large-scale data mining routines and co-visualizing bulk and single-cell data.

In the co-visualization plot, single-cell data are visualized in embedding plots (PCA, UMAP, t-SNE) and bulk data in SHMs. Cells are associated with their source tissues through cell group labels, and the association is indicated by shared colors between embedding plots and SHMs. Cell group labels are therefore required in the single-cell data prior to co-visualization. Automated co-clustering, developed in spatialHeatmap, is one of several cell grouping methods. It assigns source tissues to single cells as group labels through a co-clustering process (Figure 1). This vignette focuses on the optimization of the co-clustering workflow.

1.1 Co-clustering workflow

The co-clustering method (Figure 1) is useful for predicting source tissues of unlabeled cells without prior knowledge. Although attractive, the approach faces several challenges that stem from the different properties of single-cell and bulk gene expression data, such as the lower sensitivity and higher sparsity of single-cell data. The method therefore relies heavily on parameter optimization. The input bulk and single-cell data should come from overlapping tissues. The main steps of this method are introduced below, using RNA-Seq count data as an example:

  1. The raw count matrices of bulk and single-cell data are combined column-wise for joint normalization (Figure 1A1). After being separated from the bulk data, the single-cell data are reduced to genes with robust expression across at least a given proportion of cells and to cells with robust expression across at least a given proportion of genes (Figure 1A2). In the bulk data, genes are retained if their expression values exceed a cutoff in at least a given proportion of bulk samples and their coefficient of variation (CV) lies between CV1 and CV2 (Figure 1A2).

  2. The bulk data are subsetted to the same genes as the single-cell data (Figure 1A3). This and the previous filtering steps reduce the sparsity of the single-cell data, and subsetting to the same genes makes the bulk data more comparable to the single-cell data.

  3. Bulk and single-cell data are combined column-wise for joint embedding using PCA or UMAP (Figure 1B). Co-clustering is then performed. Specifically, a graph is built on the embedding data with methods (Table 1) from scran (Lun, McCarthy, and Marioni 2016), where nodes are cells (or tissues) and edges are connections between nearest neighbors. This graph is subsequently partitioned with methods (Table 1) from igraph (Csardi and Nepusz 2006) to obtain clusters. Three types of clusters are produced. First, a single tissue is co-clustered with multiple cells (Figure 1C1), and this tissue is assigned to all these cells. Second, multiple tissues are co-clustered with multiple cells (Figure 1C2). In this case, the nearest-neighbor tissue is assigned to each cell, using Spearman’s correlation coefficient as the similarity measure. Third, no tissue is co-clustered with cells (Figure 1C3). All these cells are treated as unlabeled and represent candidates for discovering novel cell types. After co-clustering, cells are labeled by tissues or remain unlabeled (Figure 1D), and these labels are used for associating cells and tissues in embedding plots and SHMs, respectively (Figure 1E).
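The three assignment rules in step 3 can be sketched in base R as follows. This is a minimal illustration on simulated values, not the package's implementation: the function assign_tissue, the toy matrices blk and cell, and the cluster labels in clus are all hypothetical, and the sketch assumes that the Spearman similarity is also recorded when a cluster contains only one tissue.

```r
# Toy expression data: 3 bulk tissues and 6 cells over 50 genes (all simulated).
set.seed(50)
n.gen <- 50
blk <- matrix(rnorm(n.gen * 3), nrow = n.gen,
              dimnames = list(NULL, c('tissueA', 'tissueB', 'tissueC')))
# Cells 1-4 are noisy copies of a tissue; cells 5-6 are unrelated to any tissue.
cell <- cbind(
  cell1 = blk[, 'tissueA'] + rnorm(n.gen, sd = 0.2),
  cell2 = blk[, 'tissueA'] + rnorm(n.gen, sd = 0.2),
  cell3 = blk[, 'tissueB'] + rnorm(n.gen, sd = 0.2),
  cell4 = blk[, 'tissueC'] + rnorm(n.gen, sd = 0.2),
  cell5 = rnorm(n.gen), cell6 = rnorm(n.gen)
)
# Hypothetical cluster labels, as they might result from graph partitioning.
clus <- c(tissueA = 1, cell1 = 1, cell2 = 1,              # one tissue with cells
          tissueB = 2, tissueC = 2, cell3 = 2, cell4 = 2, # several tissues with cells
          cell5 = 3, cell6 = 3)                           # cells only

# Assign to each cell the most similar tissue within its own cluster.
assign_tissue <- function(blk, cell, clus) {
  res <- lapply(colnames(cell), function(cl) {
    members <- names(clus)[clus == clus[[cl]]]
    tis <- intersect(members, colnames(blk))
    # No tissue in the cluster: the cell remains unlabeled.
    if (length(tis) == 0) {
      return(data.frame(cell = cl, assignedBulk = NA_character_, similarity = NA_real_))
    }
    # Spearman correlation between the cell and every tissue in the cluster.
    sim <- cor(cell[, cl], blk[, tis, drop = FALSE], method = 'spearman')[1, ]
    best <- which.max(sim)
    data.frame(cell = cl, assignedBulk = tis[best], similarity = unname(sim[best]))
  })
  do.call(rbind, res)
}
df.asg <- assign_tissue(blk, cell, clus)
df.asg
```

In this toy setting, the cells in the single-tissue cluster all receive that tissue, the cells in the mixed cluster are resolved by correlation, and the cells in the tissue-free cluster keep NA as their label.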

Figure 1: Co-clustering illustration
(A) The single-cell and bulk tissue data are jointly pre-processed. (B) Single-cell and bulk data are embedded with dimension reduction methods. (C) The embedding results are used for co-clustering single cells and bulk tissue data. Cells are assigned to tissues based on the clustering results as follows: (1) If a cluster contains a single tissue, then the cells of this cluster are assigned to the corresponding tissue. (2) If a cluster contains multiple tissues and cells, a nearest-neighbor approach resolves this ambiguous situation by assigning cells to the closest tissue sample. (3) Cells in clusters without tissue samples remain unassigned. (D) The cell-tissue assignments and the similarity scores of the predictions are stored in a table. (E) The predictions are used to color the cells by predicted source tissues in co-visualization plots.

1.2 Co-clustering optimization overview

It is challenging to correctly assign bulk tissues to every single cell in a mixture, especially for rare cell types, since the two data types have quite different properties, such as the high sparsity of single-cell data. Thus the co-clustering method (Figure 1) is expected to assign source tissues only to major cell types. The objective of the optimization is to obtain optimal default settings for the co-clustering workflow. The underlying assumption is that optimal settings for co-clustering are similar across species, so the strategy is to optimize the method on one species and validate the optimal settings on other species. Because few data sets are suitable for the optimization, the co-clustering is optimized on one organ, the root of Arabidopsis thaliana (Arabidopsis), and the resulting optimal settings are tested on two other organs, mouse brain and kidney. In this process, both bulk and single-cell data are labelled with known identities, so each tissue-cell assignment can be classified as TRUE or FALSE (Figure 1D). The similarity values and TRUE/FALSE labels are then used to compute AUC values (Robin et al. 2011) and accuracies, which serve as metrics for assessing the quality of tissue-cell assignments.
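To illustrate how similarity values and TRUE/FALSE labels turn into the two metrics, the following base R sketch computes an AUC with the rank-based (Mann-Whitney) formula and an accuracy at a fixed similarity threshold. The vignette cites pROC (Robin et al. 2011) for the AUC; this sketch is a stand-in that avoids any package dependency. The toy values, the helper auc_rank, and the threshold of 0.5 are assumptions made for illustration only.

```r
# Toy assignment table: similarity of each tissue-cell assignment and whether
# the assignment agrees with the known identity (values are illustrative).
df.asg <- data.frame(
  similarity = c(0.9, 0.4, 0.6, 0.2, 0.8, 0.3),
  true.match = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# AUC as the probability that a TRUE assignment has a higher similarity
# than a FALSE assignment (rank-based Mann-Whitney formulation).
auc_rank <- function(score, truth) {
  r <- rank(score)
  n1 <- sum(truth); n0 <- sum(!truth)
  (sum(r[truth]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc <- auc_rank(df.asg$similarity, df.asg$true.match)

# Accuracy when assignments with similarity >= threshold are accepted.
threshold <- 0.5 # Hypothetical cutoff.
accuracy <- mean((df.asg$similarity >= threshold) == df.asg$true.match)
round(c(auc = auc, accuracy = accuracy), 3)
```

An AUC close to 1 means that the similarity score separates correct from incorrect assignments well, while an AUC near 0.5 means the score carries little information about correctness.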

The optimization focuses on two main steps: joint dimension reduction and co-clustering (Figure 1). The relevant parameters and their settings are shown in Table 1. The optimization involves two phases. The first phase approximates optimal settings that are further optimized in the second phase. In each phase, the co-clustering process is run iteratively over every possible combination of settings.

Table 1: Parameter settings to optimize.
Parameter | Settings | Description
dimensionReduction (dimred) | PCA, UMAP | Dimension reduction methods. Choosing “PCA” and “UMAP” involves utilizing the “denoisePCA” function from the scran package and the “runUMAP” function from the scater package, respectively.
topDimensions (dims) | 5 to 80 | Number of top dimensions selected for co-clustering.
graphBuilding (graph.meth) | knn, snn | Methods for building a graph where nodes are cells and edges are connections between nearest neighbors. Choosing “knn” and “snn” involves utilizing the “buildKNNGraph” and “buildSNNGraph” functions from the scran package, respectively.
clusterDetection (cluster) | wt, fg, le | Methods for partitioning the graph to generate clusters. Choosing “wt”, “fg”, and “le” involves utilizing the “cluster_walktrap”, “cluster_fast_greedy”, and “cluster_leading_eigen” functions from the igraph package, respectively.
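The clusterDetection abbreviations in Table 1 can be tried out on a toy graph as sketched below. The sketch makes two assumptions: the embedding is simulated, and the k-nearest-neighbor graph is built by hand from a distance matrix as a stand-in for the scran graph builders named in Table 1, so that only igraph is required. The object names emb, detect, and memb are hypothetical.

```r
library(igraph)

# Toy embedding: two well-separated groups of 20 points in 2 dimensions (simulated).
set.seed(50)
emb <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))

# Hand-rolled k-nearest-neighbor graph (stand-in for the scran graph builders).
k <- 5
d <- as.matrix(dist(emb))
edges <- do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
  nn <- order(d[i, ])[2:(k + 1)] # Position 1 is the point itself.
  cbind(i, nn)
}))
# Undirected graph; simplify() removes duplicated edges.
g <- igraph::simplify(graph_from_edgelist(edges, directed = FALSE))

# Abbreviations from Table 1 mapped to the igraph partitioning functions.
detect <- list(wt = cluster_walktrap, fg = cluster_fast_greedy, le = cluster_leading_eigen)
memb <- lapply(detect, function(f) as.integer(membership(f(g))))
sapply(memb, function(m) length(unique(m))) # Number of clusters per method.
```

Each method returns one cluster label per node, so the three partitions can be compared directly on the same graph.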

2 Getting Started

2.1 Loading packages

The packages required for running the sample code in this vignette are loaded.

library(spatialHeatmap); library(SummarizedExperiment); library(scran); library(scater); library(igraph); library(SingleCellExperiment); library(BiocParallel); library(kableExtra); library(gridExtra); library(ggplot2); library(pROC)

To reduce runtime, intermediate results can be cached under ~/.cache/shm.

cache.pa <- '~/.cache/shm' # Set path of the cache directory

2.2 Optimization

The training bulk (Li et al. 2016) and single-cell (Shahan et al. 2020) datasets of Arabidopsis root are listed in Table 2. Details about how to format them are described here.

Due to long computation time, most of the following code chunks are not evaluated when building this vignette.

Table 2: Training datasets.
Name DataType File
blk.arab.rt bulk GSE152766
sc.arab.rt10 cell GSM4625995_sc_10_at_COPILOT.rds
sc.arab.rt11 cell GSM4625996_sc_11_COPILOT.rds
sc.arab.rt12 cell GSM4625997_sc_12_COPILOT.rds
sc.arab.rt30 cell GSM4626001_sc_30_COPILOT.rds
sc.arab.rt31 cell GSM4626002_sc_31_COPILOT.rds

To obtain reproducible results, always set a fixed seed for the random number generator at the beginning.

set.seed(50)

The ground-truth matching between bulk tissues and single cells for the training datasets is defined in a data.frame (df.match.arab).

match.pa <- system.file("extdata/cocluster/data", "true_match_arab_root_cocluster.txt", package="spatialHeatmap")
df.match.arab <- read.table(match.pa, header=TRUE, row.names=1, sep='\t')
df.match.arab[1:3, ]
##             cell            trueBulk
## 1        atricho NONHAIR,LRC_NONHAIR
## 2 colu.dist.colu                COLU
## 3  colu.dist.lrc                COLU

The bulk data, single cell data, and matching table are organized in a nested list (dat.lis).

dat.lis <- list(
  dataset1=list(bulk=blk.arab.rt, cell=sc.arab.rt10, df.match=df.match.arab),
  dataset2=list(bulk=blk.arab.rt, cell=sc.arab.rt11, df.match=df.match.arab),
  dataset3=list(bulk=blk.arab.rt, cell=sc.arab.rt12, df.match=df.match.arab),
  dataset4=list(bulk=blk.arab.rt, cell=sc.arab.rt30, df.match=df.match.arab),
  dataset5=list(bulk=blk.arab.rt, cell=sc.arab.rt31, df.match=df.match.arab)
)

2.2.1 The first phase

The optimization involves two phases. The first phase approximates optimal settings that are further refined in the second phase. Prior to the first phase, the training datasets are pre-processed. Specifically, bulk and single-cell data are jointly normalized with the method computeSumFactors (FCT) from the scran package (Lun, McCarthy, and Marioni 2016). In the subsequent filtering, genes in bulk data are filtered according to expression values \(\ge\) 1 (A) at a proportion of \(\ge\) 0.1 (p) across bulk samples and a CV between 0.1 (cv1) and 50 (cv2). The single-cell data are reduced to genes with expression values \(\ge\) 1 (cutoff) across \(\ge\) 5% of cells and to cells with expression values \(\ge\) 1 (cutoff) across \(\ge\) 10% of genes. These pre-processing details are defined in the following.

norm <- c('FCT') # Normalization.
# Filtering settings.
df.fil.set <- data.frame(p=c(0.1), A=rep(1, 1), cv1=c(0.1), cv2=rep(50, 1), cutoff=rep(1, 1), p.in.cell=c(0.15), p.in.gen=c(0.05), row.names=paste0('fil', seq_len(1)))
df.fil.set
##        p A cv1 cv2 cutoff p.in.cell p.in.gen
## fil1 0.1 1 0.1  50      1      0.15     0.05
fil <- 'fil1'
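To make the thresholds in df.fil.set concrete, the following base R sketch applies them to simulated count matrices. It is an illustration under stated assumptions rather than the package's filtering code: the matrices blk and cnt are simulated, p.in.gen is read as the minimum proportion of cells in which a gene must reach cutoff, p.in.cell as the minimum proportion of genes that must reach cutoff in a cell, and genes are filtered before cells.

```r
set.seed(50)
# Simulated counts: 200 genes in 6 bulk samples and in 30 single cells.
blk <- matrix(rpois(200 * 6, lambda = 5), nrow = 200)
cnt <- matrix(rpois(200 * 30, lambda = 0.3), nrow = 200)

# Threshold values copied from df.fil.set (row fil1).
p <- 0.1; A <- 1; cv1 <- 0.1; cv2 <- 50
cutoff <- 1; p.in.cell <- 0.15; p.in.gen <- 0.05

# Bulk: keep genes with values >= A in a proportion >= p of samples
# and a coefficient of variation (sd / mean) between cv1 and cv2.
cv <- apply(blk, 1, sd) / rowMeans(blk)
keep.blk <- rowMeans(blk >= A) >= p & is.finite(cv) & cv >= cv1 & cv <= cv2

# Single cells: keep genes expressed (>= cutoff) in a proportion >= p.in.gen
# of cells, then keep cells expressing a proportion >= p.in.cell of those genes.
expressed <- cnt >= cutoff
keep.gen <- rowMeans(expressed) >= p.in.gen
keep.cell <- colMeans(expressed[keep.gen, , drop = FALSE]) >= p.in.cell
cnt.fil <- cnt[keep.gen, keep.cell, drop = FALSE]

c(bulk.genes = sum(keep.blk), cell.genes = nrow(cnt.fil), cells = ncol(cnt.fil))
```

Raising p.in.gen or p.in.cell removes more sparse genes or cells, which reduces sparsity at the cost of discarding data.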

The settings of topDimensions (dims) (Table 1) are spaced at a large interval of 15 to reduce the number of iterative runs. All setting combinations (Table 1) are organized in a data.frame (df.para1).

dimred <- c('PCA', 'UMAP'); dims <- seq(5, 80, 15)  
graph <- c('knn', 'snn'); cluster <- c('wt', 'fg', 'le')  
df.para1 <- expand.grid(dataset=c('dataset1', 'dataset2', 'dataset3', 'dataset4', 'dataset5'), norm=norm, fil=fil, dimred=dimred, dims=dims, graph=graph, cluster=cluster, stringsAsFactors = FALSE) 
df.para1[1:3, ]
##    dataset norm  fil dimred dims graph cluster
## 1 dataset1  FCT fil1    PCA    5   knn      wt
## 2 dataset2  FCT fil1    PCA    5   knn      wt
## 3 dataset3  FCT fil1    PCA    5   knn      wt
Table 3: Setting combinations in the first phase.
dataset norm fil dimred dims graph cluster
dataset1 FCT fil1 PCA 5 knn wt
dataset2 FCT fil1 PCA 5 knn wt
dataset3 FCT fil1 PCA 5 knn wt
dataset4 FCT fil1 PCA 5 knn wt
dataset5 FCT fil1 PCA 5 knn wt
dataset1 FCT fil1 UMAP 5 knn wt
dataset2 FCT fil1 UMAP 5 knn wt
dataset3 FCT fil1 UMAP 5 knn wt
dataset4 FCT fil1 UMAP 5 knn wt
dataset5 FCT fil1 UMAP 5 knn wt
dataset1 FCT fil1 PCA 20 knn wt
dataset2 FCT fil1 PCA 20 knn wt
dataset3 FCT fil1 PCA 20 knn wt
dataset4 FCT fil1 PCA 20 knn wt
dataset5 FCT fil1 PCA 20 knn wt
dataset1 FCT fil1 UMAP 20 knn wt
dataset2 FCT fil1 UMAP 20 knn wt
dataset3 FCT fil1 UMAP 20 knn wt
dataset4 FCT fil1 UMAP 20 knn wt
dataset5 FCT fil1 UMAP 20 knn wt
dataset1 FCT fil1 PCA 35 knn wt
dataset2 FCT fil1 PCA 35 knn wt
dataset3 FCT fil1 PCA 35 knn wt
dataset4 FCT fil1 PCA 35 knn wt
dataset5 FCT fil1 PCA 35 knn wt
dataset1 FCT fil1 UMAP 35 knn wt
dataset2 FCT fil1 UMAP 35 knn wt
dataset3 FCT fil1 UMAP 35 knn wt
dataset4 FCT fil1 UMAP 35 knn wt
dataset5 FCT fil1 UMAP 35 knn wt
dataset1 FCT fil1 PCA 50 knn wt
dataset2 FCT fil1 PCA 50 knn wt
dataset3 FCT fil1 PCA 50 knn wt
dataset4 FCT fil1 PCA 50 knn wt
dataset5 FCT fil1 PCA 50 knn wt
dataset1 FCT fil1 UMAP 50 knn wt
dataset2 FCT fil1 UMAP 50 knn wt
dataset3 FCT fil1 UMAP 50 knn wt
dataset4 FCT fil1 UMAP 50 knn wt
dataset5 FCT fil1 UMAP 50 knn wt
dataset1 FCT fil1 PCA 65 knn wt
dataset2 FCT fil1 PCA 65 knn wt
dataset3 FCT fil1 PCA 65 knn wt
dataset4 FCT fil1 PCA 65 knn wt
dataset5 FCT fil1 PCA 65 knn wt
dataset1 FCT fil1 UMAP 65 knn wt
dataset2 FCT fil1 UMAP 65 knn wt
dataset3 FCT fil1 UMAP 65 knn wt
dataset4 FCT fil1 UMAP 65 knn wt
dataset5 FCT fil1 UMAP 65 knn wt
dataset1 FCT fil1 PCA 80 knn wt
dataset2 FCT fil1 PCA 80 knn wt
dataset3 FCT fil1 PCA 80 knn wt
dataset4 FCT fil1 PCA 80 knn wt
dataset5 FCT fil1 PCA 80 knn wt
dataset1 FCT fil1 UMAP 80 knn wt
dataset2 FCT fil1 UMAP 80 knn wt
dataset3 FCT fil1 UMAP 80 knn wt
dataset4 FCT fil1 UMAP 80 knn wt
dataset5 FCT fil1 UMAP 80 knn wt
dataset1 FCT fil1 PCA 5 snn wt
dataset2 FCT fil1 PCA 5 snn wt
dataset3 FCT fil1 PCA 5 snn wt
dataset4 FCT fil1 PCA 5 snn wt
dataset5 FCT fil1 PCA 5 snn wt
dataset1 FCT fil1 UMAP 5 snn wt
dataset2 FCT fil1 UMAP 5 snn wt
dataset3 FCT fil1 UMAP 5 snn wt
dataset4 FCT fil1 UMAP 5 snn wt
dataset5 FCT fil1 UMAP 5 snn wt
dataset1 FCT fil1 PCA 20 snn wt
dataset2 FCT fil1 PCA 20 snn wt
dataset3 FCT fil1 PCA 20 snn wt
dataset4 FCT fil1 PCA 20 snn wt
dataset5 FCT fil1 PCA 20 snn wt
dataset1 FCT fil1 UMAP 20 snn wt
dataset2 FCT fil1 UMAP 20 snn wt
dataset3 FCT fil1 UMAP 20 snn wt
dataset4 FCT fil1 UMAP 20 snn wt
dataset5 FCT fil1 UMAP 20 snn wt
dataset1 FCT fil1 PCA 35 snn wt
dataset2 FCT fil1 PCA 35 snn wt
dataset3 FCT fil1 PCA 35 snn wt
dataset4 FCT fil1 PCA 35 snn wt
dataset5 FCT fil1 PCA 35 snn wt
dataset1 FCT fil1 UMAP 35 snn wt
dataset2 FCT fil1 UMAP 35 snn wt
dataset3 FCT fil1 UMAP 35 snn wt
dataset4 FCT fil1 UMAP 35 snn wt
dataset5 FCT fil1 UMAP 35 snn wt
dataset1 FCT fil1 PCA 50 snn wt
dataset2 FCT fil1 PCA 50 snn wt
dataset3 FCT fil1 PCA 50 snn wt
dataset4 FCT fil1 PCA 50 snn wt
dataset5 FCT fil1 PCA 50 snn wt
dataset1 FCT fil1 UMAP 50 snn wt
dataset2 FCT fil1 UMAP 50 snn wt
dataset3 FCT fil1 UMAP 50 snn wt
dataset4 FCT fil1 UMAP 50 snn wt
dataset5 FCT fil1 UMAP 50 snn wt
dataset1 FCT fil1 PCA 65 snn wt
dataset2 FCT fil1 PCA 65 snn wt
dataset3 FCT fil1 PCA 65 snn wt
dataset4 FCT fil1 PCA 65 snn wt
dataset5 FCT fil1 PCA 65 snn wt
dataset1 FCT fil1 UMAP 65 snn wt
dataset2 FCT fil1 UMAP 65 snn wt
dataset3 FCT fil1 UMAP 65 snn wt
dataset4 FCT fil1 UMAP 65 snn wt
dataset5 FCT fil1 UMAP 65 snn wt
dataset1 FCT fil1 PCA 80 snn wt
dataset2 FCT fil1 PCA 80 snn wt
dataset3 FCT fil1 PCA 80 snn wt
dataset4 FCT fil1 PCA 80 snn wt
dataset5 FCT fil1 PCA 80 snn wt
dataset1 FCT fil1 UMAP 80 snn wt
dataset2 FCT fil1 UMAP 80 snn wt
dataset3 FCT fil1 UMAP 80 snn wt
dataset4 FCT fil1 UMAP 80 snn wt
dataset5 FCT fil1 UMAP 80 snn wt
dataset1 FCT fil1 PCA 5 knn fg
dataset2 FCT fil1 PCA 5 knn fg
dataset3 FCT fil1 PCA 5 knn fg
dataset4 FCT fil1 PCA 5 knn fg
dataset5 FCT fil1 PCA 5 knn fg
dataset1 FCT fil1 UMAP 5 knn fg
dataset2 FCT fil1 UMAP 5 knn fg
dataset3 FCT fil1 UMAP 5 knn fg
dataset4 FCT fil1 UMAP 5 knn fg
dataset5 FCT fil1 UMAP 5 knn fg
dataset1 FCT fil1 PCA 20 knn fg
dataset2 FCT fil1 PCA 20 knn fg
dataset3 FCT fil1 PCA 20 knn fg
dataset4 FCT fil1 PCA 20 knn fg
dataset5 FCT fil1 PCA 20 knn fg
dataset1 FCT fil1 UMAP 20 knn fg
dataset2 FCT fil1 UMAP 20 knn fg
dataset3 FCT fil1 UMAP 20 knn fg
dataset4 FCT fil1 UMAP 20 knn fg
dataset5 FCT fil1 UMAP 20 knn fg
dataset1 FCT fil1 PCA 35 knn fg
dataset2 FCT fil1 PCA 35 knn fg
dataset3 FCT fil1 PCA 35 knn fg
dataset4 FCT fil1 PCA 35 knn fg
dataset5 FCT fil1 PCA 35 knn fg
dataset1 FCT fil1 UMAP 35 knn fg
dataset2 FCT fil1 UMAP 35 knn fg
dataset3 FCT fil1 UMAP 35 knn fg
dataset4 FCT fil1 UMAP 35 knn fg
dataset5 FCT fil1 UMAP 35 knn fg
dataset1 FCT fil1 PCA 50 knn fg
dataset2 FCT fil1 PCA 50 knn fg
dataset3 FCT fil1 PCA 50 knn fg
dataset4 FCT fil1 PCA 50 knn fg
dataset5 FCT fil1 PCA 50 knn fg
dataset1 FCT fil1 UMAP 50 knn fg
dataset2 FCT fil1 UMAP 50 knn fg
dataset3 FCT fil1 UMAP 50 knn fg
dataset4 FCT fil1 UMAP 50 knn fg
dataset5 FCT fil1 UMAP 50 knn fg
dataset1 FCT fil1 PCA 65 knn fg
dataset2 FCT fil1 PCA 65 knn fg
dataset3 FCT fil1 PCA 65 knn fg
dataset4 FCT fil1 PCA 65 knn fg
dataset5 FCT fil1 PCA 65 knn fg
dataset1 FCT fil1 UMAP 65 knn fg
dataset2 FCT fil1 UMAP 65 knn fg
dataset3 FCT fil1 UMAP 65 knn fg
dataset4 FCT fil1 UMAP 65 knn fg
dataset5 FCT fil1 UMAP 65 knn fg
dataset1 FCT fil1 PCA 80 knn fg
dataset2 FCT fil1 PCA 80 knn fg
dataset3 FCT fil1 PCA 80 knn fg
dataset4 FCT fil1 PCA 80 knn fg
dataset5 FCT fil1 PCA 80 knn fg
dataset1 FCT fil1 UMAP 80 knn fg
dataset2 FCT fil1 UMAP 80 knn fg
dataset3 FCT fil1 UMAP 80 knn fg
dataset4 FCT fil1 UMAP 80 knn fg
dataset5 FCT fil1 UMAP 80 knn fg
dataset1 FCT fil1 PCA 5 snn fg
dataset2 FCT fil1 PCA 5 snn fg
dataset3 FCT fil1 PCA 5 snn fg
dataset4 FCT fil1 PCA 5 snn fg
dataset5 FCT fil1 PCA 5 snn fg
dataset1 FCT fil1 UMAP 5 snn fg
dataset2 FCT fil1 UMAP 5 snn fg
dataset3 FCT fil1 UMAP 5 snn fg
dataset4 FCT fil1 UMAP 5 snn fg
dataset5 FCT fil1 UMAP 5 snn fg
dataset1 FCT fil1 PCA 20 snn fg
dataset2 FCT fil1 PCA 20 snn fg
dataset3 FCT fil1 PCA 20 snn fg
dataset4 FCT fil1 PCA 20 snn fg
dataset5 FCT fil1 PCA 20 snn fg
dataset1 FCT fil1 UMAP 20 snn fg
dataset2 FCT fil1 UMAP 20 snn fg
dataset3 FCT fil1 UMAP 20 snn fg
dataset4 FCT fil1 UMAP 20 snn fg
dataset5 FCT fil1 UMAP 20 snn fg
dataset1 FCT fil1 PCA 35 snn fg
dataset2 FCT fil1 PCA 35 snn fg
dataset3 FCT fil1 PCA 35 snn fg
dataset4 FCT fil1 PCA 35 snn fg
dataset5 FCT fil1 PCA 35 snn fg
dataset1 FCT fil1 UMAP 35 snn fg
dataset2 FCT fil1 UMAP 35 snn fg
dataset3 FCT fil1 UMAP 35 snn fg
dataset4 FCT fil1 UMAP 35 snn fg
dataset5 FCT fil1 UMAP 35 snn fg
dataset1 FCT fil1 PCA 50 snn fg
dataset2 FCT fil1 PCA 50 snn fg
dataset3 FCT fil1 PCA 50 snn fg
dataset4 FCT fil1 PCA 50 snn fg
dataset5 FCT fil1 PCA 50 snn fg
dataset1 FCT fil1 UMAP 50 snn fg
dataset2 FCT fil1 UMAP 50 snn fg
dataset3 FCT fil1 UMAP 50 snn fg
dataset4 FCT fil1 UMAP 50 snn fg
dataset5 FCT fil1 UMAP 50 snn fg
dataset1 FCT fil1 PCA 65 snn fg
dataset2 FCT fil1 PCA 65 snn fg
dataset3 FCT fil1 PCA 65 snn fg
dataset4 FCT fil1 PCA 65 snn fg
dataset5 FCT fil1 PCA 65 snn fg
dataset1 FCT fil1 UMAP 65 snn fg
dataset2 FCT fil1 UMAP 65 snn fg
dataset3 FCT fil1 UMAP 65 snn fg
dataset4 FCT fil1 UMAP 65 snn fg
dataset5 FCT fil1 UMAP 65 snn fg
dataset1 FCT fil1 PCA 80 snn fg
dataset2 FCT fil1 PCA 80 snn fg
dataset3 FCT fil1 PCA 80 snn fg
dataset4 FCT fil1 PCA 80 snn fg
dataset5 FCT fil1 PCA 80 snn fg
dataset1 FCT fil1 UMAP 80 snn fg
dataset2 FCT fil1 UMAP 80 snn fg
dataset3 FCT fil1 UMAP 80 snn fg
dataset4 FCT fil1 UMAP 80 snn fg
dataset5 FCT fil1 UMAP 80 snn fg
dataset1 FCT fil1 PCA 5 knn le
dataset2 FCT fil1 PCA 5 knn le
dataset3 FCT fil1 PCA 5 knn le
dataset4 FCT fil1 PCA 5 knn le
dataset5 FCT fil1 PCA 5 knn le
dataset1 FCT fil1 UMAP 5 knn le
dataset2 FCT fil1 UMAP 5 knn le
dataset3 FCT fil1 UMAP 5 knn le
dataset4 FCT fil1 UMAP 5 knn le
dataset5 FCT fil1 UMAP 5 knn le
dataset1 FCT fil1 PCA 20 knn le
dataset2 FCT fil1 PCA 20 knn le
dataset3 FCT fil1 PCA 20 knn le
dataset4 FCT fil1 PCA 20 knn le
dataset5 FCT fil1 PCA 20 knn le
dataset1 FCT fil1 UMAP 20 knn le
dataset2 FCT fil1 UMAP 20 knn le
dataset3 FCT fil1 UMAP 20 knn le
dataset4 FCT fil1 UMAP 20 knn le
dataset5 FCT fil1 UMAP 20 knn le
dataset1 FCT fil1 PCA 35 knn le
dataset2 FCT fil1 PCA 35 knn le
dataset3 FCT fil1 PCA 35 knn le
dataset4 FCT fil1 PCA 35 knn le
dataset5 FCT fil1 PCA 35 knn le
dataset1 FCT fil1 UMAP 35 knn le
dataset2 FCT fil1 UMAP 35 knn le
dataset3 FCT fil1 UMAP 35 knn le
dataset4 FCT fil1 UMAP 35 knn le
dataset5 FCT fil1 UMAP 35 knn le
dataset1 FCT fil1 PCA 50 knn le
dataset2 FCT fil1 PCA 50 knn le
dataset3 FCT fil1 PCA 50 knn le
dataset4 FCT fil1 PCA 50 knn le
dataset5 FCT fil1 PCA 50 knn le
dataset1 FCT fil1 UMAP 50 knn le
dataset2 FCT fil1 UMAP 50 knn le
dataset3 FCT fil1 UMAP 50 knn le
dataset4 FCT fil1 UMAP 50 knn le
dataset5 FCT fil1 UMAP 50 knn le
dataset1 FCT fil1 PCA 65 knn le
dataset2 FCT fil1 PCA 65 knn le
dataset3 FCT fil1 PCA 65 knn le
dataset4 FCT fil1 PCA 65 knn le
dataset5 FCT fil1 PCA 65 knn le
dataset1 FCT fil1 UMAP 65 knn le
dataset2 FCT fil1 UMAP 65 knn le
dataset3 FCT fil1 UMAP 65 knn le
dataset4 FCT fil1 UMAP 65 knn le
dataset5 FCT fil1 UMAP 65 knn le
dataset1 FCT fil1 PCA 80 knn le
dataset2 FCT fil1 PCA 80 knn le
dataset3 FCT fil1 PCA 80 knn le
dataset4 FCT fil1 PCA 80 knn le
dataset5 FCT fil1 PCA 80 knn le
dataset1 FCT fil1 UMAP 80 knn le
dataset2 FCT fil1 UMAP 80 knn le
dataset3 FCT fil1 UMAP 80 knn le
dataset4 FCT fil1 UMAP 80 knn le
dataset5 FCT fil1 UMAP 80 knn le
dataset1 FCT fil1 PCA 5 snn le
dataset2 FCT fil1 PCA 5 snn le
dataset3 FCT fil1 PCA 5 snn le
dataset4 FCT fil1 PCA 5 snn le
dataset5 FCT fil1 PCA 5 snn le
dataset1 FCT fil1 UMAP 5 snn le
dataset2 FCT fil1 UMAP 5 snn le
dataset3 FCT fil1 UMAP 5 snn le
dataset4 FCT fil1 UMAP 5 snn le
dataset5 FCT fil1 UMAP 5 snn le
dataset1 FCT fil1 PCA 20 snn le
dataset2 FCT fil1 PCA 20 snn le
dataset3 FCT fil1 PCA 20 snn le
dataset4 FCT fil1 PCA 20 snn le
dataset5 FCT fil1 PCA 20 snn le
dataset1 FCT fil1 UMAP 20 snn le
dataset2 FCT fil1 UMAP 20 snn le
dataset3 FCT fil1 UMAP 20 snn le
dataset4 FCT fil1 UMAP 20 snn le
dataset5 FCT fil1 UMAP 20 snn le
dataset1 FCT fil1 PCA 35 snn le
dataset2 FCT fil1 PCA 35 snn le
dataset3 FCT fil1 PCA 35 snn le
dataset4 FCT fil1 PCA 35 snn le
dataset5 FCT fil1 PCA 35 snn le
dataset1 FCT fil1 UMAP 35 snn le
dataset2 FCT fil1 UMAP 35 snn le
dataset3 FCT fil1 UMAP 35 snn le
dataset4 FCT fil1 UMAP 35 snn le
dataset5 FCT fil1 UMAP 35 snn le
dataset1 FCT fil1 PCA 50 snn le
dataset2 FCT fil1 PCA 50 snn le
dataset3 FCT fil1 PCA 50 snn le
dataset4 FCT fil1 PCA 50 snn le
dataset5 FCT fil1 PCA 50 snn le
dataset1 FCT fil1 UMAP 50 snn le
dataset2 FCT fil1 UMAP 50 snn le
dataset3 FCT fil1 UMAP 50 snn le
dataset4 FCT fil1 UMAP 50 snn le
dataset5 FCT fil1 UMAP 50 snn le
dataset1 FCT fil1 PCA 65 snn le
dataset2 FCT fil1 PCA 65 snn le
dataset3 FCT fil1 PCA 65 snn le
dataset4 FCT fil1 PCA 65 snn le
dataset5 FCT fil1 PCA 65 snn le
dataset1 FCT fil1 UMAP 65 snn le
dataset2 FCT fil1 UMAP 65 snn le
dataset3 FCT fil1 UMAP 65 snn le
dataset4 FCT fil1 UMAP 65 snn le
dataset5 FCT fil1 UMAP 65 snn le
dataset1 FCT fil1 PCA 80 snn le
dataset2 FCT fil1 PCA 80 snn le
dataset3 FCT fil1 PCA 80 snn le
dataset4 FCT fil1 PCA 80 snn le
dataset5 FCT fil1 PCA 80 snn le
dataset1 FCT fil1 UMAP 80 snn le
dataset2 FCT fil1 UMAP 80 snn le
dataset3 FCT fil1 UMAP 80 snn le
dataset4 FCT fil1 UMAP 80 snn le
dataset5 FCT fil1 UMAP 80 snn le
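The size and row order of Table 3 follow directly from how expand.grid enumerates the settings. The self-contained sketch below rebuilds the grid under the object name grid (a hypothetical name chosen to avoid overwriting df.para1) and checks the row count against the product of the numbers of settings.

```r
# Settings of the first phase, as defined in the vignette.
dataset <- paste0('dataset', 1:5)
dimred <- c('PCA', 'UMAP'); dims <- seq(5, 80, 15)
graph <- c('knn', 'snn'); cluster <- c('wt', 'fg', 'le')

grid <- expand.grid(dataset = dataset, norm = 'FCT', fil = 'fil1', dimred = dimred,
                    dims = dims, graph = graph, cluster = cluster,
                    stringsAsFactors = FALSE)

# Expected number of co-clustering runs: 5 * 2 * 6 * 2 * 3.
n.expected <- length(dataset) * length(dimred) * length(dims) *
  length(graph) * length(cluster)
nrow(grid) == n.expected

# The first argument varies fastest, so rows 1-5 differ only in the dataset.
grid[1:6, c('dataset', 'dimred', 'dims')]
```

Because every additional setting multiplies the number of runs, the coarse interval for dims keeps the first phase manageable.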

The co-clustering workflow (Figure 1) is run iteratively with each combination in Table 3 by calling coclus_opt. To reduce runtime, these runs are parallelized with MulticoreParam from the BiocParallel package (Morgan et al. 2021). The results are shown in Table 4 and saved in the directory result/opt/init.

df.res1 <- coclus_opt(dat.lis, df.para1, df.fil.set, multi.core.par=MulticoreParam(workers=6, RNGseed=50), wk.dir='result/opt/init')
df.res1[1:5, ]
Table 4: Results of the first phase.
dataset norm fil dimred dims graph cluster accuracy specificity sensitivity threshold true.assignment total.assignment auc
dataset1 FCT fil1 PCA 5 knn wt 0.696 0.417 0.893 0.250 822 1402 0.639
dataset2 FCT fil1 PCA 5 knn wt 0.791 0.825 0.782 0.750 688 877 0.783
dataset3 FCT fil1 PCA 5 knn wt 0.599 0.435 0.979 -0.350 1075 3560 0.752
dataset4 FCT fil1 PCA 5 knn wt 0.750 0.703 0.793 0.550 1364 2643 0.743
dataset5 FCT fil1 PCA 5 knn wt 0.702 0.809 0.654 0.850 1209 1747 0.788
dataset1 FCT fil1 UMAP 5 knn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset2 FCT fil1 UMAP 5 knn wt 0.607 0.403 0.697 0.950 175 252 0.527
dataset3 FCT fil1 UMAP 5 knn wt 0.625 0.027 0.995 0.850 601 973 0.505
dataset4 FCT fil1 UMAP 5 knn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 5 knn wt 0.536 0.164 0.916 0.950 285 577 0.540
dataset1 FCT fil1 PCA 20 knn wt 0.715 0.646 0.806 0.267 697 1636 0.783
dataset2 FCT fil1 PCA 20 knn wt 0.533 0.439 0.818 0.134 362 1460 0.677
dataset3 FCT fil1 PCA 20 knn wt 0.682 0.576 0.763 0.216 1327 2324 0.702
dataset4 FCT fil1 PCA 20 knn wt 0.483 0.792 0.334 0.418 1081 1605 0.484
dataset5 FCT fil1 PCA 20 knn wt 0.727 0.430 0.823 0.238 1354 1787 0.588
dataset1 FCT fil1 UMAP 20 knn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset2 FCT fil1 UMAP 20 knn wt 0.655 0.538 0.664 0.981 333 359 0.539
dataset3 FCT fil1 UMAP 20 knn wt 0.598 0.564 0.609 0.984 432 572 0.573
dataset4 FCT fil1 UMAP 20 knn wt 0.713 0.358 0.770 0.982 421 488 0.505
dataset5 FCT fil1 UMAP 20 knn wt 0.442 0.915 0.247 0.998 458 647 0.528
dataset1 FCT fil1 PCA 35 knn wt 0.613 0.627 0.599 0.150 693 1379 0.647
dataset2 FCT fil1 PCA 35 knn wt 0.715 0.842 0.630 0.118 860 1431 0.796
dataset3 FCT fil1 PCA 35 knn wt 0.463 0.306 0.841 0.040 466 1588 0.597
dataset4 FCT fil1 PCA 35 knn wt 0.631 0.690 0.591 0.218 946 1601 0.683
dataset5 FCT fil1 PCA 35 knn wt 0.812 0.856 0.779 0.218 1018 1768 0.861
dataset1 FCT fil1 UMAP 35 knn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset2 FCT fil1 UMAP 35 knn wt 0.100 1.000 0.006 1.000 344 380 0.312
dataset3 FCT fil1 UMAP 35 knn wt 0.895 0.500 0.941 0.996 34 38 0.559
dataset4 FCT fil1 UMAP 35 knn wt 0.149 0.977 0.037 1.000 325 369 0.225
dataset5 FCT fil1 UMAP 35 knn wt 0.644 0.366 0.848 0.954 554 961 0.567
dataset1 FCT fil1 PCA 50 knn wt 0.718 0.727 0.696 0.206 441 1529 0.778
dataset2 FCT fil1 PCA 50 knn wt 0.680 0.603 0.706 0.007 1277 1703 0.712
dataset3 FCT fil1 PCA 50 knn wt 0.739 0.816 0.420 0.222 283 1461 0.633
dataset4 FCT fil1 PCA 50 knn wt 0.632 0.604 0.657 0.164 583 1085 0.666
dataset5 FCT fil1 PCA 50 knn wt 0.631 0.712 0.621 0.260 945 1063 0.656
dataset1 FCT fil1 UMAP 50 knn wt 0.893 0.071 0.977 0.946 554 610 0.294
dataset2 FCT fil1 UMAP 50 knn wt 0.984 1.000 0.984 0.966 184 185 0.984
dataset3 FCT fil1 UMAP 50 knn wt 0.784 1.000 0.765 0.998 34 37 0.897
dataset4 FCT fil1 UMAP 50 knn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 50 knn wt 0.577 0.631 0.558 0.990 448 608 0.581
dataset1 FCT fil1 PCA 65 knn wt 0.744 0.794 0.621 0.196 319 1115 0.776
dataset2 FCT fil1 PCA 65 knn wt 0.650 0.573 0.809 0.062 423 1302 0.761
dataset3 FCT fil1 PCA 65 knn wt 0.541 0.479 0.796 0.068 270 1391 0.693
dataset4 FCT fil1 PCA 65 knn wt 0.527 0.794 0.395 0.216 648 968 0.619
dataset5 FCT fil1 PCA 65 knn wt 0.681 0.846 0.432 0.214 774 1941 0.649
dataset1 FCT fil1 UMAP 65 knn wt 0.155 1.000 0.044 1.000 319 361 0.208
dataset2 FCT fil1 UMAP 65 knn wt 0.814 0.500 0.835 0.992 121 129 0.581
dataset3 FCT fil1 UMAP 65 knn wt 0.973 0.929 1.000 0.998 23 37 0.958
dataset4 FCT fil1 UMAP 65 knn wt 0.960 0.143 0.981 0.990 267 274 0.422
dataset5 FCT fil1 UMAP 65 knn wt 0.112 0.984 0.029 0.998 655 717 0.218
dataset1 FCT fil1 PCA 80 knn wt 0.658 0.661 0.647 0.124 241 1193 0.704
dataset2 FCT fil1 PCA 80 knn wt 0.683 0.654 0.718 0.104 521 1134 0.750
dataset3 FCT fil1 PCA 80 knn wt 0.689 0.813 0.471 0.365 70 193 0.593
dataset4 FCT fil1 PCA 80 knn wt 0.685 0.831 0.415 0.212 287 815 0.666
dataset5 FCT fil1 PCA 80 knn wt 0.454 0.673 0.418 0.188 899 1046 0.508
dataset1 FCT fil1 UMAP 80 knn wt 0.877 0.022 1.000 0.936 314 359 0.354
dataset2 FCT fil1 UMAP 80 knn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset3 FCT fil1 UMAP 80 knn wt 0.496 0.833 0.480 0.984 125 131 0.521
dataset4 FCT fil1 UMAP 80 knn wt 0.793 0.095 0.953 0.980 275 338 0.353
dataset5 FCT fil1 UMAP 80 knn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset1 FCT fil1 PCA 5 snn wt 0.601 0.801 0.557 0.750 1254 1530 0.665
dataset2 FCT fil1 PCA 5 snn wt 0.888 0.812 0.902 0.650 707 845 0.914
dataset3 FCT fil1 PCA 5 snn wt 0.528 0.528 0.527 0.750 1111 2803 0.495
dataset4 FCT fil1 PCA 5 snn wt 0.536 0.790 0.333 0.750 2112 3792 0.495
dataset5 FCT fil1 PCA 5 snn wt 0.651 0.398 0.918 0.350 1033 2121 0.706
dataset1 FCT fil1 UMAP 5 snn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset2 FCT fil1 UMAP 5 snn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset3 FCT fil1 UMAP 5 snn wt 0.624 0.053 0.991 0.800 437 718 0.501
dataset4 FCT fil1 UMAP 5 snn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 5 snn wt 0.615 0.581 0.643 0.950 294 540 0.612
dataset1 FCT fil1 PCA 20 snn wt 0.699 0.640 0.789 0.276 722 1828 0.789
dataset2 FCT fil1 PCA 20 snn wt 0.702 0.664 0.721 0.130 1460 2180 0.757
dataset3 FCT fil1 PCA 20 snn wt 0.479 0.860 0.324 0.424 1717 2417 0.609
dataset4 FCT fil1 PCA 20 snn wt 0.611 0.807 0.547 0.360 1755 2321 0.705
dataset5 FCT fil1 PCA 20 snn wt 0.750 0.738 0.760 0.338 883 1588 0.816
dataset1 FCT fil1 UMAP 20 snn wt 0.825 0.024 0.988 0.790 418 503 0.271
dataset2 FCT fil1 UMAP 20 snn wt 0.923 0.182 0.992 0.988 119 130 0.599
dataset3 FCT fil1 UMAP 20 snn wt 0.647 0.965 0.500 0.986 308 451 0.723
dataset4 FCT fil1 UMAP 20 snn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 20 snn wt 0.344 0.884 0.155 0.996 660 892 0.373
dataset1 FCT fil1 PCA 35 snn wt 0.706 0.646 0.770 0.228 989 2049 0.776
dataset2 FCT fil1 PCA 35 snn wt 0.573 0.858 0.454 0.218 1282 1818 0.715
dataset3 FCT fil1 PCA 35 snn wt 0.627 0.519 0.731 0.126 1379 2692 0.673
dataset4 FCT fil1 PCA 35 snn wt 0.621 0.698 0.569 0.258 1087 1815 0.629
dataset5 FCT fil1 PCA 35 snn wt 0.693 0.664 0.723 0.230 782 1561 0.736
dataset1 FCT fil1 UMAP 35 snn wt 0.441 0.852 0.415 0.994 424 451 0.472
dataset2 FCT fil1 UMAP 35 snn wt 0.248 1.000 0.218 1.000 124 129 0.439
dataset3 FCT fil1 UMAP 35 snn wt 0.500 1.000 0.471 1.000 34 36 0.691
dataset4 FCT fil1 UMAP 35 snn wt 0.151 1.000 0.058 0.998 394 437 0.238
dataset5 FCT fil1 UMAP 35 snn wt 0.531 0.685 0.445 0.992 618 964 0.562
dataset1 FCT fil1 PCA 50 snn wt 0.644 0.785 0.450 0.202 678 1613 0.652
dataset2 FCT fil1 PCA 50 snn wt 0.692 0.639 0.749 0.082 1119 2326 0.756
dataset3 FCT fil1 PCA 50 snn wt 0.555 0.412 0.809 0.064 629 1741 0.649
dataset4 FCT fil1 PCA 50 snn wt 0.675 0.784 0.639 0.182 894 1195 0.771
dataset5 FCT fil1 PCA 50 snn wt 0.639 0.725 0.575 0.186 1274 2229 0.708
dataset1 FCT fil1 UMAP 50 snn wt 0.931 0.045 0.997 0.938 295 317 0.305
dataset2 FCT fil1 UMAP 50 snn wt 0.627 1.000 0.625 0.980 269 271 0.717
dataset3 FCT fil1 UMAP 50 snn wt 0.784 0.667 0.794 0.998 34 37 0.672
dataset4 FCT fil1 UMAP 50 snn wt 0.138 0.968 0.046 1.000 280 311 0.276
dataset5 FCT fil1 UMAP 50 snn wt 0.346 0.933 0.117 0.996 539 749 0.425
dataset1 FCT fil1 PCA 65 snn wt 0.686 0.741 0.606 0.186 766 1894 0.725
dataset2 FCT fil1 PCA 65 snn wt 0.673 0.663 0.687 0.094 873 2147 0.749
dataset3 FCT fil1 PCA 65 snn wt 0.303 0.993 0.019 0.625 324 458 0.311
dataset4 FCT fil1 PCA 65 snn wt 0.498 0.669 0.400 0.180 874 1375 0.515
dataset5 FCT fil1 PCA 65 snn wt 0.639 0.710 0.587 0.156 1037 1793 0.684
dataset1 FCT fil1 UMAP 65 snn wt 0.244 1.000 0.085 1.000 213 258 0.412
dataset2 FCT fil1 UMAP 65 snn wt 0.883 0.455 0.902 0.958 254 265 0.506
dataset3 FCT fil1 UMAP 65 snn wt 0.649 1.000 0.618 0.998 34 37 0.838
dataset4 FCT fil1 UMAP 65 snn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 65 snn wt 0.788 0.089 0.959 0.956 641 798 0.349
dataset1 FCT fil1 PCA 80 snn wt 0.734 0.829 0.515 0.194 402 1336 0.731
dataset2 FCT fil1 PCA 80 snn wt 0.811 0.879 0.577 0.196 366 1638 0.773
dataset3 FCT fil1 PCA 80 snn wt 0.849 0.886 0.533 0.222 182 1706 0.748
dataset4 FCT fil1 PCA 80 snn wt 0.625 0.540 0.717 0.114 498 1039 0.653
dataset5 FCT fil1 PCA 80 snn wt 0.599 0.721 0.474 0.150 703 1433 0.626
dataset1 FCT fil1 UMAP 80 snn wt 0.819 0.017 1.000 0.803 266 326 0.314
dataset2 FCT fil1 UMAP 80 snn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset3 FCT fil1 UMAP 80 snn wt 0.000 0.000 0.000 0.000 0 0 0.000
dataset4 FCT fil1 UMAP 80 snn wt 0.699 0.333 0.711 0.972 263 272 0.348
dataset5 FCT fil1 UMAP 80 snn wt 0.891 0.582 0.966 0.918 503 625 0.677
dataset1 FCT fil1 PCA 5 knn fg 0.581 0.487 0.695 0.550 2382 5269 0.610
dataset2 FCT fil1 PCA 5 knn fg 0.593 0.742 0.469 0.750 1821 3339 0.566
dataset3 FCT fil1 PCA 5 knn fg 0.633 0.188 0.891 0.250 5649 8926 0.504
dataset4 FCT fil1 PCA 5 knn fg 0.554 0.816 0.356 0.750 2612 4580 0.565
dataset5 FCT fil1 PCA 5 knn fg 0.606 0.433 0.840 0.450 1804 4236 0.646
dataset1 FCT fil1 UMAP 5 knn fg 0.912 0.079 0.984 0.850 1041 1130 0.492
dataset2 FCT fil1 UMAP 5 knn fg 0.000 0.000 0.000 0.000 0 0 0.000
dataset3 FCT fil1 UMAP 5 knn fg 0.559 0.029 0.996 0.800 1681 3068 0.468
dataset4 FCT fil1 UMAP 5 knn fg 0.573 0.510 0.604 0.950 1509 2250 0.557
dataset5 FCT fil1 UMAP 5 knn fg 0.000 0.000 0.000 0.000 0 0 0.000
dataset1 FCT fil1 PCA 20 knn fg 0.738 0.712 0.787 0.260 1767 5213 0.825
dataset2 FCT fil1 PCA 20 knn fg 0.433 0.907 0.142 0.710 1109 1789 0.460
dataset3 FCT fil1 PCA 20 knn fg 0.545 0.981 0.026 0.766 1645 3605 0.428
dataset4 FCT fil1 PCA 20 knn fg 0.762 0.465 0.834 0.074 3472 4315 0.671
dataset5 FCT fil1 PCA 20 knn fg 0.737 0.705 0.747 0.218 3819 5024 0.777
dataset1 FCT fil1 UMAP 20 knn fg 0.836 0.673 0.910 0.841 1747 2531 0.789
dataset2 FCT fil1 UMAP 20 knn fg 0.908 0.823 0.934 0.794 914 1196 0.905
dataset3 FCT fil1 UMAP 20 knn fg 0.149 0.992 0.037 0.998 950 1077 0.257
dataset4 FCT fil1 UMAP 20 knn fg 0.833 0.182 0.841 0.645 953 964 0.276
dataset5 FCT fil1 UMAP 20 knn fg 0.834 0.363 0.976 0.643 1812 2358 0.600
dataset1 FCT fil1 PCA 35 knn fg 0.751 0.527 0.810 0.141 3171 4006 0.681
dataset2 FCT fil1 PCA 35 knn fg 0.711 0.759 0.568 0.162 711 2812 0.731
dataset3 FCT fil1 PCA 35 knn fg 0.518 0.519 0.516 0.130 1441 4265 0.503
dataset4 FCT fil1 PCA 35 knn fg 0.726 0.423 0.785 0.034 3620 4322 0.616
dataset5 FCT fil1 PCA 35 knn fg 0.817 0.167 0.905 0.130 1670 1897 0.506
dataset1 FCT fil1 UMAP 35 knn fg 0.826 0.078 0.971 0.909 665 794 0.420
dataset2 FCT fil1 UMAP 35 knn fg 0.458 1.000 0.305 0.996 197 253 0.544
dataset3 FCT fil1 UMAP 35 knn fg 0.972 1.000 0.957 1.000 23 36 0.982
dataset4 FCT fil1 UMAP 35 knn fg 0.750 0.834 0.623 0.964 517 1304 0.780
dataset5 FCT fil1 UMAP 35 knn fg 0.787 0.610 0.916 0.931 1028 1776 0.764
dataset1 FCT fil1 PCA 50 knn fg 0.706 0.468 0.773 0.048 3193 4090 0.612
dataset2 FCT fil1 PCA 50 knn fg 0.647 0.537 0.755 0.130 641 1268 0.668
dataset3 FCT fil1 PCA 50 knn fg 0.623 0.286 0.914 0.021 1126 2095 0.575
dataset4 FCT fil1 PCA 50 knn fg 0.538 0.867 0.475 0.268 1515 1808 0.681
dataset5 FCT fil1 PCA 50 knn fg 0.672 0.738 0.633 0.120 2845 4511 0.733
dataset1 FCT fil1 UMAP 50 knn fg 0.530 0.567 0.525 0.966 1263 1450 0.476
dataset2 FCT fil1 UMAP 50 knn fg 0.825 1.000 0.824 0.968 239 240 0.828
dataset3 FCT fil1 UMAP 50 knn fg 0.946 0.667 0.971 0.998 34 37 0.843
dataset4 FCT fil1 UMAP 50 knn fg 0.701 0.652 0.788 0.963 250 690 0.736
dataset5 FCT fil1 UMAP 50 knn fg 0.671 0.781 0.585 0.942 1121 2002 0.736
dataset1 FCT fil1 PCA 65 knn fg 0.576 0.340 0.864 0.012 1550 3438 0.616
dataset2 FCT fil1 PCA 65 knn fg 0.657 0.553 0.770 0.110 552 1152 0.674
dataset3 FCT fil1 PCA 65 knn fg 0.522 0.857 0.227 0.264 1425 2674 0.541
dataset4 FCT fil1 PCA 65 knn fg 0.584 0.731 0.554 0.202 1532 1848 0.665
dataset5 FCT fil1 PCA 65 knn fg 0.656 0.666 0.653 0.090 3707 4725 0.699
dataset1 FCT fil1 UMAP 65 knn fg 0.697 0.285 0.890 0.823 1224 1796 0.557
dataset2 FCT fil1 UMAP 65 knn fg 0.070 1.000 0.052 1.000 267 272 0.247
dataset3 FCT fil1 UMAP 65 knn fg 0.973 0.929 1.000 1.000 23 37 0.964
dataset4 FCT fil1 UMAP 65 knn fg 0.782 0.714 0.982 0.933 284 1126 0.876
dataset5 FCT fil1 UMAP 65 knn fg 0.599 0.666 0.559 0.954 939 1505 0.627
dataset1 FCT fil1 PCA 80 knn fg 0.643 0.696 0.626 0.080 3303 4358 0.717
dataset2 FCT fil1 PCA 80 knn fg 0.322 0.869 0.246 0.240 1249 1424 0.531
dataset3 FCT fil1 PCA 80 knn fg 0.774 0.843 0.347 0.238 357 2577 0.614
dataset4 FCT fil1 PCA 80 knn fg 0.461 0.770 0.326 0.174 2960 4261 0.554
dataset5 FCT fil1 PCA 80 knn fg 0.618 0.894 0.268 0.238 968 2198 0.579
dataset1 FCT fil1 UMAP 80 knn fg 0.624 0.621 0.626 0.861 719 1059 0.580
dataset2 FCT fil1 UMAP 80 knn fg 0.925 0.811 0.979 0.958 236 347 0.846
dataset3 FCT fil1 UMAP 80 knn fg 0.973 0.929 1.000 0.998 23 37 0.964
dataset4 FCT fil1 UMAP 80 knn fg 0.779 0.678 0.988 0.915 421 1298 0.876
dataset5 FCT fil1 UMAP 80 knn fg 0.000 0.000 0.000 0.000 0 0 0.000
dataset1 FCT fil1 PCA 5 snn fg 0.778 0.568 0.992 0.050 2737 5533 0.730
dataset2 FCT fil1 PCA 5 snn fg 0.582 0.621 0.558 0.750 2962 4814 0.606
dataset3 FCT fil1 PCA 5 snn fg 0.497 0.701 0.364 0.750 4856 8008 0.530
dataset4 FCT fil1 PCA 5 snn fg 0.598 0.599 0.597 0.650 2426 4029 0.587
dataset5 FCT fil1 PCA 5 snn fg 0.646 0.768 0.494 0.750 2361 5286 0.678
dataset1 FCT fil1 UMAP 5 snn fg 0.866 0.720 0.940 0.850 1906 2858 0.822
dataset2 FCT fil1 UMAP 5 snn fg 0.640 0.453 0.662 0.950 1170 1307 0.570
dataset3 FCT fil1 UMAP 5 snn fg 0.314 0.031 0.997 0.750 1536 5240 0.417
dataset4 FCT fil1 UMAP 5 snn fg 0.412 0.149 0.958 0.850 1448 4454 0.429
dataset5 FCT fil1 UMAP 5 snn fg 0.000 0.000 0.000 0.000 0 0 0.000
dataset1 FCT fil1 PCA 20 snn fg 0.699 0.421 0.793 0.208 3721 4986 0.628
dataset2 FCT fil1 PCA 20 snn fg 0.734 0.575 0.781 0.186 4506 5834 0.717
dataset3 FCT fil1 PCA 20 snn fg 0.595 0.371 0.690 0.086 5947 8482 0.531
dataset4 FCT fil1 PCA 20 snn fg 0.530 0.705 0.470 0.364 3246 4342 0.598
dataset5 FCT fil1 PCA 20 snn fg 0.706 0.732 0.694 0.238 3499 5236 0.754
dataset1 FCT fil1 UMAP 20 snn fg 0.828 0.352 0.975 0.792 2184 2857 0.655
dataset2 FCT fil1 UMAP 20 snn fg 0.842 0.651 0.892 0.774 1407 1771 0.832
dataset3 FCT fil1 UMAP 20 snn fg 0.435 0.982 0.039 0.993 1646 2837 0.123
dataset4 FCT fil1 UMAP 20 snn fg 0.531 0.822 0.352 0.951 1633 2638 0.537
dataset5 FCT fil1 UMAP 20 snn fg 0.646 0.446 0.674 0.870 2494 2844 0.488
dataset1 FCT fil1 PCA 35 snn fg 0.662 0.441 0.744 0.120 3784 5184 0.613
dataset2 FCT fil1 PCA 35 snn fg 0.656 0.620 0.709 0.174 1572 3830 0.722
dataset3 FCT fil1 PCA 35 snn fg 0.619 0.348 0.750 0.048 3757 5560 0.552
dataset4 FCT fil1 PCA 35 snn fg 0.561 0.885 0.222 0.318 2081 4261 0.516
dataset5 FCT fil1 PCA 35 snn fg 0.673 0.592 0.710 0.106 3693 5390 0.696
dataset1 FCT fil1 UMAP 35 snn fg 0.799 0.710 0.831 0.865 1683 2289 0.748
dataset2 FCT fil1 UMAP 35 snn fg 0.876 0.580 1.000 0.943 194 275 0.697
dataset3 FCT fil1 UMAP 35 snn fg 0.928 0.929 0.909 0.917 77 1480 0.929
dataset4 FCT fil1 UMAP 35 snn fg 0.742 0.479 0.985 0.724 1075 2072 0.766
dataset5 FCT fil1 UMAP 35 snn fg 0.701 0.470 0.916 0.889 1226 2361 0.727
dataset1 FCT fil1 PCA 50 snn fg 0.631 0.777 0.506 0.214 1660 3072 0.678
dataset2 FCT fil1 PCA 50 snn fg 0.648 0.583 0.772 0.126 942 2740 0.730
dataset3 FCT fil1 PCA 50 snn fg 0.614 0.660 0.578 0.084 3474 6220 0.652
dataset4 FCT fil1 PCA 50 snn fg 0.601 0.606 0.596 0.084 1777 3392 0.617
dataset5 FCT fil1 PCA 50 snn fg 0.655 0.572 0.720 0.072 3033 5390 0.705
dataset1 FCT fil1 UMAP 50 snn fg 0.890 0.170 0.987 0.712 1000 1135 0.383
dataset2 FCT fil1 UMAP 50 snn fg 0.990 0.997 0.978 0.928 184 525 0.987
dataset3 FCT fil1 UMAP 50 snn fg 0.892 0.667 0.912 0.992 34 37 0.765
dataset4 FCT fil1 UMAP 50 snn fg 0.133 1.000 0.041 1.000 339 375 0.277
dataset5 FCT fil1 UMAP 50 snn fg 0.739 0.649 0.801 0.651 1613 2728 0.750
dataset1 FCT fil1 PCA 65 snn fg 0.611 0.571 0.624 0.102 3625 4738 0.634
dataset2 FCT fil1 PCA 65 snn fg 0.676 0.440 0.785 0.076 1384 2023 0.648
dataset3 FCT fil1 PCA 65 snn fg 0.588 0.486 0.672 0.052 3329 6065 0.599
dataset4 FCT fil1 PCA 65 snn fg 0.688 0.752 0.528 0.210 396 1392 0.685
dataset5 FCT fil1 PCA 65 snn fg 0.625 0.844 0.330 0.230 924 2173 0.577
dataset1 FCT fil1 UMAP 65 snn fg 0.075 0.992 0.016 1.000 1915 2038 0.399
dataset2 FCT fil1 UMAP 65 snn fg 0.732 0.800 0.731 0.696 558 563 0.761
dataset3 FCT fil1 UMAP 65 snn fg 0.000 0.000 0.000 0.000 0 0 0.000
dataset4 FCT fil1 UMAP 65 snn fg 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 65 snn fg 0.682 0.687 0.680 0.847 1804 2494 0.646
dataset1 FCT fil1 PCA 80 snn fg 0.616 0.730 0.519 0.138 1658 3078 0.660
dataset2 FCT fil1 PCA 80 snn fg 0.705 0.684 0.722 0.092 1889 3430 0.766
dataset3 FCT fil1 PCA 80 snn fg 0.541 0.348 0.734 0.014 1214 2425 0.521
dataset4 FCT fil1 PCA 80 snn fg 0.681 0.748 0.514 0.184 551 1918 0.655
dataset5 FCT fil1 PCA 80 snn fg 0.561 0.712 0.493 0.136 3141 4545 0.643
dataset1 FCT fil1 UMAP 80 snn fg 0.311 0.862 0.175 0.992 915 1140 0.293
dataset2 FCT fil1 UMAP 80 snn fg 0.798 0.698 0.827 0.835 709 908 0.748
dataset3 FCT fil1 UMAP 80 snn fg 0.973 1.000 0.957 1.000 23 37 0.981
dataset4 FCT fil1 UMAP 80 snn fg 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 80 snn fg 0.000 0.000 0.000 0.000 0 0 0.000
dataset1 FCT fil1 PCA 5 knn le 0.723 0.534 0.771 0.450 1336 1677 0.651
dataset2 FCT fil1 PCA 5 knn le 0.680 0.624 0.697 0.750 1150 1504 0.731
dataset3 FCT fil1 PCA 5 knn le 0.700 0.817 0.554 0.750 1021 2288 0.735
dataset4 FCT fil1 PCA 5 knn le 0.530 0.692 0.432 0.750 1037 1666 0.500
dataset5 FCT fil1 PCA 5 knn le 0.585 0.252 0.834 0.550 813 1421 0.503
dataset1 FCT fil1 UMAP 5 knn le 0.780 0.631 0.829 0.950 533 709 0.730
dataset2 FCT fil1 UMAP 5 knn le 0.679 0.381 0.774 0.950 394 520 0.577
dataset3 FCT fil1 UMAP 5 knn le 0.656 0.168 0.989 0.750 653 1099 0.566
dataset4 FCT fil1 UMAP 5 knn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 5 knn le 0.796 0.635 0.883 0.950 531 819 0.737
dataset1 FCT fil1 PCA 20 knn le 0.716 0.510 0.843 0.243 1342 2173 0.724
dataset2 FCT fil1 PCA 20 knn le 0.173 1.000 0.075 0.752 1221 1367 0.433
dataset3 FCT fil1 PCA 20 knn le 0.476 0.640 0.426 0.393 2303 3019 0.504
dataset4 FCT fil1 PCA 20 knn le 0.704 0.843 0.595 0.446 707 1256 0.751
dataset5 FCT fil1 PCA 20 knn le 0.673 0.638 0.691 0.482 965 1462 0.719
dataset1 FCT fil1 UMAP 20 knn le 0.208 0.931 0.071 0.999 1077 1281 0.383
dataset2 FCT fil1 UMAP 20 knn le 0.191 1.000 0.061 0.999 410 476 0.263
dataset3 FCT fil1 UMAP 20 knn le 0.586 0.649 0.566 0.974 603 797 0.547
dataset4 FCT fil1 UMAP 20 knn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 20 knn le 0.427 0.726 0.338 0.993 637 827 0.509
dataset1 FCT fil1 PCA 35 knn le 0.669 0.766 0.611 0.274 1025 1640 0.744
dataset2 FCT fil1 PCA 35 knn le 0.710 0.749 0.682 0.144 1453 2462 0.777
dataset3 FCT fil1 PCA 35 knn le 0.684 0.582 0.764 0.150 1239 2204 0.670
dataset4 FCT fil1 PCA 35 knn le 0.598 0.588 0.603 0.190 1502 2296 0.628
dataset5 FCT fil1 PCA 35 knn le 0.688 0.384 0.793 0.160 1140 1533 0.579
dataset1 FCT fil1 UMAP 35 knn le 0.726 0.212 0.797 0.931 814 927 0.271
dataset2 FCT fil1 UMAP 35 knn le 0.126 0.979 0.027 1.000 414 462 0.277
dataset3 FCT fil1 UMAP 35 knn le 0.904 0.887 0.957 1.000 23 94 0.940
dataset4 FCT fil1 UMAP 35 knn le 0.706 0.618 0.790 0.990 200 391 0.691
dataset5 FCT fil1 UMAP 35 knn le 0.681 0.460 0.734 0.962 831 1029 0.500
dataset1 FCT fil1 PCA 50 knn le 0.453 0.780 0.373 0.246 1414 1759 0.584
dataset2 FCT fil1 PCA 50 knn le 0.391 0.964 0.203 0.372 1264 1681 0.580
dataset3 FCT fil1 PCA 50 knn le 0.669 0.541 0.810 0.005 996 2099 0.697
dataset4 FCT fil1 PCA 50 knn le 0.875 0.011 0.996 -0.118 668 761 0.266
dataset5 FCT fil1 PCA 50 knn le 0.675 0.512 0.726 0.150 1618 2122 0.648
dataset1 FCT fil1 UMAP 50 knn le 0.385 0.728 0.317 0.996 467 559 0.462
dataset2 FCT fil1 UMAP 50 knn le 0.401 0.841 0.353 0.978 402 446 0.509
dataset3 FCT fil1 UMAP 50 knn le 0.946 0.667 0.971 0.998 34 37 0.750
dataset4 FCT fil1 UMAP 50 knn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 50 knn le 0.267 0.894 0.133 0.996 745 905 0.324
dataset1 FCT fil1 PCA 65 knn le 0.762 0.332 0.835 0.046 2218 2589 0.518
dataset2 FCT fil1 PCA 65 knn le 0.715 0.694 0.740 0.130 712 1541 0.771
dataset3 FCT fil1 PCA 65 knn le 0.679 0.884 0.628 0.086 1590 1986 0.817
dataset4 FCT fil1 PCA 65 knn le 0.740 0.620 0.769 0.036 2173 2700 0.736
dataset5 FCT fil1 PCA 65 knn le 0.848 0.065 0.970 -0.016 1180 1364 0.475
dataset1 FCT fil1 UMAP 65 knn le 0.576 0.547 0.581 0.984 403 467 0.478
dataset2 FCT fil1 UMAP 65 knn le 0.401 0.904 0.287 0.994 363 446 0.469
dataset3 FCT fil1 UMAP 65 knn le 0.838 0.750 0.848 0.998 33 37 0.799
dataset4 FCT fil1 UMAP 65 knn le 0.550 0.695 0.513 0.986 378 473 0.572
dataset5 FCT fil1 UMAP 65 knn le 0.779 0.545 0.855 0.964 820 1086 0.665
dataset1 FCT fil1 PCA 80 knn le 0.636 0.497 0.680 0.084 1922 2530 0.596
dataset2 FCT fil1 PCA 80 knn le 0.280 0.921 0.113 0.324 1463 1844 0.473
dataset3 FCT fil1 PCA 80 knn le 0.679 0.827 0.672 0.086 1155 1207 0.753
dataset4 FCT fil1 PCA 80 knn le 0.626 0.628 0.625 0.085 1236 1639 0.661
dataset5 FCT fil1 PCA 80 knn le 0.765 0.250 0.820 0.048 1432 1584 0.456
dataset1 FCT fil1 UMAP 80 knn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset2 FCT fil1 UMAP 80 knn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset3 FCT fil1 UMAP 80 knn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset4 FCT fil1 UMAP 80 knn le 0.752 0.311 0.949 0.931 369 533 0.658
dataset5 FCT fil1 UMAP 80 knn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset1 FCT fil1 PCA 5 snn le 0.512 0.500 0.523 0.750 1215 2312 0.417
dataset2 FCT fil1 PCA 5 snn le 0.659 0.861 0.499 0.850 1388 2486 0.709
dataset3 FCT fil1 PCA 5 snn le 0.665 0.473 0.907 0.650 1173 2656 0.707
dataset4 FCT fil1 PCA 5 snn le 0.568 0.416 0.864 0.650 478 1408 0.648
dataset5 FCT fil1 PCA 5 snn le 0.644 0.655 0.641 0.750 1697 2196 0.650
dataset1 FCT fil1 UMAP 5 snn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset2 FCT fil1 UMAP 5 snn le 0.691 0.438 0.772 0.950 600 792 0.592
dataset3 FCT fil1 UMAP 5 snn le 0.504 0.004 1.000 0.800 746 1485 0.405
dataset4 FCT fil1 UMAP 5 snn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 5 snn le 0.548 0.485 0.628 0.950 293 668 0.550
dataset1 FCT fil1 PCA 20 snn le 0.671 0.838 0.535 0.296 1183 2150 0.671
dataset2 FCT fil1 PCA 20 snn le 0.494 0.543 0.488 0.380 2712 3060 0.497
dataset3 FCT fil1 PCA 20 snn le 0.675 0.370 0.826 0.126 2678 4001 0.592
dataset4 FCT fil1 PCA 20 snn le 0.406 0.817 0.315 0.464 2477 3024 0.568
dataset5 FCT fil1 PCA 20 snn le 0.686 0.552 0.745 0.273 891 1288 0.683
dataset1 FCT fil1 UMAP 20 snn le 0.831 0.523 0.904 0.835 1007 1244 0.639
dataset2 FCT fil1 UMAP 20 snn le 0.461 0.944 0.422 0.993 225 243 0.577
dataset3 FCT fil1 UMAP 20 snn le 0.550 0.748 0.516 0.981 674 789 0.517
dataset4 FCT fil1 UMAP 20 snn le 0.405 0.891 0.295 0.998 726 891 0.430
dataset5 FCT fil1 UMAP 20 snn le 0.113 0.986 0.017 0.999 1272 1412 0.266
dataset1 FCT fil1 PCA 35 snn le 0.517 0.851 0.354 0.362 1291 1922 0.570
dataset2 FCT fil1 PCA 35 snn le 0.652 0.534 0.672 0.240 1569 1835 0.606
dataset3 FCT fil1 PCA 35 snn le 0.945 0.098 0.994 -0.140 2484 2627 0.506
dataset4 FCT fil1 PCA 35 snn le 0.359 0.811 0.251 0.354 2622 3252 0.538
dataset5 FCT fil1 PCA 35 snn le 0.728 0.467 0.786 0.156 2327 2845 0.624
dataset1 FCT fil1 UMAP 35 snn le 0.149 1.000 0.020 1.000 988 1137 0.283
dataset2 FCT fil1 UMAP 35 snn le 0.880 0.114 0.927 0.871 572 607 0.364
dataset3 FCT fil1 UMAP 35 snn le 0.942 0.953 0.911 0.990 45 173 0.967
dataset4 FCT fil1 UMAP 35 snn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 35 snn le 0.717 0.421 0.815 0.960 747 994 0.652
dataset1 FCT fil1 PCA 50 snn le 0.873 0.192 0.925 -0.030 2272 2444 0.494
dataset2 FCT fil1 PCA 50 snn le 0.321 0.894 0.177 0.384 1115 1397 0.433
dataset3 FCT fil1 PCA 50 snn le 0.779 0.461 0.908 0.012 1581 2227 0.650
dataset4 FCT fil1 PCA 50 snn le 0.654 0.489 0.682 0.112 2338 2739 0.598
dataset5 FCT fil1 PCA 50 snn le 0.750 0.328 0.899 0.074 1140 1543 0.618
dataset1 FCT fil1 UMAP 50 snn le 0.870 0.475 0.931 0.901 778 898 0.642
dataset2 FCT fil1 UMAP 50 snn le 0.786 1.000 0.785 0.960 251 252 0.791
dataset3 FCT fil1 UMAP 50 snn le 0.730 0.800 0.719 1.000 32 37 0.788
dataset4 FCT fil1 UMAP 50 snn le 0.254 0.892 0.182 0.998 325 362 0.446
dataset5 FCT fil1 UMAP 50 snn le 0.146 0.969 0.049 0.998 814 910 0.278
dataset1 FCT fil1 PCA 65 snn le 0.917 0.026 0.989 -0.194 2870 3100 0.320
dataset2 FCT fil1 PCA 65 snn le 0.733 0.021 0.984 -0.098 683 923 0.408
dataset3 FCT fil1 PCA 65 snn le 0.877 0.004 1.000 -0.226 1660 1895 0.203
dataset4 FCT fil1 PCA 65 snn le 0.669 0.455 0.710 0.060 2424 2892 0.609
dataset5 FCT fil1 PCA 65 snn le 0.748 0.635 0.805 0.186 840 1262 0.753
dataset1 FCT fil1 UMAP 65 snn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset2 FCT fil1 UMAP 65 snn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset3 FCT fil1 UMAP 65 snn le 0.351 1.000 0.294 0.998 34 37 0.451
dataset4 FCT fil1 UMAP 65 snn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 65 snn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset1 FCT fil1 PCA 80 snn le 0.234 0.978 0.039 0.422 1199 1513 0.389
dataset2 FCT fil1 PCA 80 snn le 0.792 0.154 0.898 -0.004 938 1094 0.471
dataset3 FCT fil1 PCA 80 snn le 0.731 0.521 0.858 -0.018 839 1351 0.720
dataset4 FCT fil1 PCA 80 snn le 0.730 0.464 0.803 0.062 1489 1901 0.673
dataset5 FCT fil1 PCA 80 snn le 0.685 0.568 0.721 0.132 1428 1875 0.673
dataset1 FCT fil1 UMAP 80 snn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset2 FCT fil1 UMAP 80 snn le 0.918 0.154 0.960 0.946 705 744 0.446
dataset3 FCT fil1 UMAP 80 snn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset4 FCT fil1 UMAP 80 snn le 0.000 0.000 0.000 0.000 0 0 0.000
dataset5 FCT fil1 UMAP 80 snn le 0.266 0.976 0.045 0.998 665 872 0.295

In Table 4, each row corresponds to one setting combination used to run the co-clustering, together with the resulting outcome (auc, accuracy, etc.). The auc (area under the curve) and accuracy are computed from the correct tissue-cell assignments (Figure 1D). The total.assignment is the number of tissue-cell assignments, excluding cells without an assignment, while true.assignment is the number of correct assignments. The outcomes are filtered according to auc \(\ge\) 0.7, accuracy \(\ge\) 0.7, total.assignment \(\ge\) 500, and true.assignment \(\ge\) 300.

df.res1.fil <- subset(df.res1, auc >= 0.7 & accuracy >= 0.7 & total.assignment >= 500 & true.assignment >= 300)
df.res1.fil[1:5, ]
##     dataset norm  fil dimred dims graph cluster accuracy
## 2  dataset2  FCT fil1    PCA    5   knn      wt    0.791
## 4  dataset4  FCT fil1    PCA    5   knn      wt    0.750
## 5  dataset5  FCT fil1    PCA    5   knn      wt    0.702
## 11 dataset1  FCT fil1    PCA   20   knn      wt    0.715
## 22 dataset2  FCT fil1    PCA   35   knn      wt    0.715
##    specificity sensitivity threshold true.assignment
## 2        0.825       0.782     0.750             688
## 4        0.703       0.793     0.550            1364
## 5        0.809       0.654     0.850            1209
## 11       0.646       0.806     0.267             697
## 22       0.842       0.630     0.118             860
##    total.assignment   auc
## 2               877 0.783
## 4              2643 0.743
## 5              1747 0.788
## 11             1636 0.783
## 22             1431 0.796
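
The effect of the four cutoffs can be illustrated on a few rows copied from Table 4. The following is a minimal sketch in base R and is not part of spatialHeatmap: the helper name filter_outcomes and the object df.toy are hypothetical, and the default cutoffs simply mirror the criteria stated above.

```r
# Four rows copied from Table 4 (all with graph = knn, cluster = fg);
# only the columns needed for filtering are kept.
df.toy <- data.frame(
  dataset = c('dataset1', 'dataset1', 'dataset3', 'dataset2'),
  dimred = c('PCA', 'UMAP', 'UMAP', 'UMAP'),
  dims = c(20, 20, 35, 5),
  accuracy = c(0.738, 0.836, 0.972, 0.000),
  true.assignment = c(1767, 1747, 23, 0),
  total.assignment = c(5213, 2531, 36, 0),
  auc = c(0.825, 0.789, 0.982, 0.000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep outcomes that meet all four cutoffs.
filter_outcomes <- function(df, auc.min = 0.7, acc.min = 0.7,
                            total.min = 500, true.min = 300) {
  keep <- df$auc >= auc.min & df$accuracy >= acc.min &
    df$total.assignment >= total.min & df$true.assignment >= true.min
  df[keep, , drop = FALSE]
}

df.toy.fil <- filter_outcomes(df.toy)
df.toy.fil[, c('dataset', 'dimred', 'dims', 'auc', 'total.assignment')]
```

In this toy subset, the dataset3 row has the highest auc (0.982) but is dropped because only 36 cells received an assignment, which illustrates the role of the two count cutoffs. The all-zero row is dropped as well, so two rows remain.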

The outcomes remaining after filtering are summarized for each setting in Figures 2, 3, 4, and 5. A larger number of remaining outcomes indicates a more robust setting. Thus, in the first phase, PCA, knn, and fg are chosen for dimension reduction (dimred), graph building (graph), and cluster detection (cluster), respectively, while 5, 20, and 35 are chosen for topDimensions (dims).

opt_bar(df.res=df.res1.fil, para.na='dimred', ylab='Remaining outcomes')

Figure 2: Remaining outcomes for dimension reduction methods

opt_bar(df.res=df.res1.fil, para.na='dims', ylab='Remaining outcomes')

Figure 3: Remaining outcomes for the number of top dimensions that are used for co-clustering

opt_bar(df.res=df.res1.fil, para.na='graph', ylab='Remaining outcomes')

Figure 4: Remaining outcomes for graph-building methods

opt_bar(df.res=df.res1.fil, para.na='cluster', ylab='Remaining outcomes')

Figure 5: Remaining outcomes for cluster detection methods

2.2.2 The second phase

The second phase uses the same pre-processing steps as the first phase and focuses on refining the settings selected there. Since a single setting was selected for each of dimred, graph, and cluster in the first phase, the second phase narrows the focus to topDimensions (dims). The fine-tuning varies dims over the interval from 5 to 35 in steps of 1 (i.e., 5, 6, …, 35). These setting combinations are organized in a data.frame (df.para2).

dimred <- c('PCA'); dims <- seq(5, 35, 1)  
graph <- c('knn'); cluster <- c('fg')  
df.para2 <- expand.grid(dataset=c('dataset1', 'dataset2', 'dataset3', 'dataset4', 'dataset5'), norm=norm, fil=fil, dimred=dimred, dims=dims, graph=graph, cluster=cluster, stringsAsFactors = FALSE) 
df.para2[1:5, ]
##    dataset norm  fil dimred dims graph cluster
## 1 dataset1  FCT fil1    PCA    5   knn      fg
## 2 dataset2  FCT fil1    PCA    5   knn      fg
## 3 dataset3  FCT fil1    PCA    5   knn      fg
## 4 dataset4  FCT fil1    PCA    5   knn      fg
## 5 dataset5  FCT fil1    PCA    5   knn      fg
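
As a quick sanity check, the grid should contain one row per dataset and dims value, i.e. 5 × 31 = 155 combinations. The sketch below rebuilds the grid in a self-contained way. The values of norm and fil ('FCT' and 'fil1') are assumptions taken from the printed rows above, and df.grid is a name used only for this illustration.

```r
# Assumed values, read off the printed rows of df.para2.
norm <- 'FCT'; fil <- 'fil1'
dims <- seq(5, 35, 1)

# Rebuild the second-phase grid under a separate, illustrative name.
df.grid <- expand.grid(
  dataset = paste0('dataset', 1:5), norm = norm, fil = fil,
  dimred = 'PCA', dims = dims, graph = 'knn', cluster = 'fg',
  stringsAsFactors = FALSE
)

# One row per (dataset, dims) pair is expected.
stopifnot(nrow(df.grid) == 5 * length(dims))
nrow(df.grid)
```

Because expand.grid varies its first argument fastest, the first five rows cover the five datasets at dims = 5, which matches the printed rows above.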

The co-clustering workflow (Figure 1) is iteratively run with each combination in df.para2.

df.res2 <- coclus_opt(dat.lis, df.para2, df.fil.set, multi.core.par=MulticoreParam(workers=6, RNGseed=50), wk.dir='result/opt/tar')
df.res2[1:5, ] # Results.
##    dataset norm  fil dimred dims graph cluster accuracy
## 1 dataset1  FCT fil1    PCA    5   knn      fg    0.573
## 2 dataset2  FCT fil1    PCA    5   knn      fg    0.832
## 3 dataset3  FCT fil1    PCA    5   knn      fg    0.525
## 4 dataset4  FCT fil1    PCA    5   knn      fg    0.554
## 5 dataset5  FCT fil1    PCA    5   knn      fg    0.650
##   specificity sensitivity threshold true.assignment
## 1       0.367       0.811      0.25            2441
## 2       0.736       0.897      0.55            1993
## 3       0.634       0.470      0.55            5923
## 4       0.816       0.356      0.75            2612
## 5       0.701       0.601      0.85            2162
##   total.assignment   auc
## 1             5269 0.589
## 2             3339 0.848
## 3             8926 0.499
## 4             4580 0.565
## 5             4236 0.695

The results are filtered with the same criteria as in the first phase: auc \(\ge\) 0.7, accuracy \(\ge\) 0.7, total.assignment \(\ge\) 500, and true.assignment \(\ge\) 300. The remaining outcomes for dims are shown in Table 5 and Figure 6. Since dims = 14 has the most remaining outcomes (three), it is chosen as the optimal setting for topDimensions. The final optimal settings are presented in Table 6.

df.opt.final <- subset(df.res2, auc >= 0.7 & accuracy >= 0.7 & total.assignment >= 500 & true.assignment >= 300)
df.opt.final[1:5, ]
Table 5: Results in the second phase.
dataset norm fil dimred dims graph cluster accuracy specificity sensitivity threshold true.assignment total.assignment auc
dataset2 FCT fil1 PCA 5 knn fg 0.832 0.736 0.897 0.550 1993 3339 0.848
dataset2 FCT fil1 PCA 8 knn fg 0.711 0.605 0.836 0.345 2754 6027 0.755
dataset5 FCT fil1 PCA 8 knn fg 0.794 0.670 0.874 0.274 3160 5206 0.854
dataset5 FCT fil1 PCA 10 knn fg 0.776 0.366 0.942 0.109 3211 4511 0.707
dataset4 FCT fil1 PCA 11 knn fg 0.739 0.589 0.800 0.277 4171 5866 0.763
dataset5 FCT fil1 PCA 11 knn fg 0.700 0.654 0.725 0.286 2901 4490 0.719
dataset5 FCT fil1 PCA 12 knn fg 0.829 0.750 0.872 0.228 3306 5141 0.873
dataset1 FCT fil1 PCA 13 knn fg 0.840 0.604 0.953 0.233 3036 4481 0.747
dataset5 FCT fil1 PCA 13 knn fg 0.784 0.801 0.777 0.283 3554 5234 0.854
dataset1 FCT fil1 PCA 14 knn fg 0.769 0.650 0.843 0.194 3392 5503 0.804
dataset4 FCT fil1 PCA 14 knn fg 0.761 0.666 0.785 0.264 3664 4592 0.742
dataset5 FCT fil1 PCA 14 knn fg 0.733 0.684 0.743 0.281 3443 4108 0.724
dataset5 FCT fil1 PCA 15 knn fg 0.816 0.783 0.829 0.127 3710 5238 0.878
dataset2 FCT fil1 PCA 16 knn fg 0.747 0.608 0.815 0.128 3722 5535 0.786
dataset5 FCT fil1 PCA 16 knn fg 0.784 0.764 0.798 0.186 3081 5156 0.831
dataset1 FCT fil1 PCA 17 knn fg 0.744 0.556 0.864 0.226 2716 4435 0.781
dataset5 FCT fil1 PCA 19 knn fg 0.744 0.655 0.760 0.250 3656 4280 0.726
dataset1 FCT fil1 PCA 20 knn fg 0.726 0.689 0.800 0.238 1771 5213 0.810
dataset5 FCT fil1 PCA 20 knn fg 0.706 0.705 0.707 0.250 3805 5024 0.755
dataset1 FCT fil1 PCA 22 knn fg 0.794 0.643 0.845 0.130 4048 5422 0.791
dataset5 FCT fil1 PCA 22 knn fg 0.733 0.756 0.723 0.204 3523 5151 0.791
dataset4 FCT fil1 PCA 23 knn fg 0.735 0.646 0.788 0.136 2804 4475 0.786
dataset5 FCT fil1 PCA 23 knn fg 0.733 0.736 0.731 0.204 2576 4372 0.790
dataset2 FCT fil1 PCA 24 knn fg 0.739 0.738 0.741 0.236 316 1982 0.810
dataset5 FCT fil1 PCA 24 knn fg 0.828 0.752 0.866 0.110 3506 5233 0.857
dataset5 FCT fil1 PCA 25 knn fg 0.759 0.793 0.743 0.216 2128 3133 0.825
dataset5 FCT fil1 PCA 26 knn fg 0.738 0.651 0.759 0.150 4205 5225 0.747
dataset5 FCT fil1 PCA 27 knn fg 0.773 0.731 0.795 0.132 3300 5066 0.838
dataset4 FCT fil1 PCA 28 knn fg 0.749 0.716 0.767 0.078 3510 5422 0.778
dataset5 FCT fil1 PCA 28 knn fg 0.705 0.668 0.727 0.244 2425 3900 0.743
dataset4 FCT fil1 PCA 29 knn fg 0.707 0.560 0.777 0.085 3785 5593 0.707
dataset5 FCT fil1 PCA 29 knn fg 0.799 0.763 0.823 0.116 2564 4234 0.848
dataset5 FCT fil1 PCA 30 knn fg 0.705 0.698 0.709 0.178 2985 4477 0.750
dataset1 FCT fil1 PCA 31 knn fg 0.717 0.586 0.797 0.126 2765 4455 0.735
dataset5 FCT fil1 PCA 31 knn fg 0.772 0.737 0.790 0.152 2944 4364 0.813
dataset1 FCT fil1 PCA 32 knn fg 0.713 0.632 0.741 0.132 4018 5425 0.718
dataset5 FCT fil1 PCA 33 knn fg 0.789 0.717 0.836 0.074 2505 4149 0.824
dataset3 FCT fil1 PCA 34 knn fg 0.721 0.749 0.572 0.316 367 2282 0.709
dataset5 FCT fil1 PCA 34 knn fg 0.723 0.716 0.726 0.160 3660 5390 0.770
dataset5 FCT fil1 PCA 35 knn fg 0.733 0.613 0.765 0.236 1607 2049 0.711
opt_bar(df.res=df.opt.final, para.na='dims', ylab='Remaining outcomes', x.text.size = 23, y.text.size=23, axis.title.size=23)

Figure 6: Remaining outcomes for the number of top dimensions in the second phase
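
The choice of topDimensions can also be checked numerically by tallying the dims column of Table 5. The sketch below uses base R only. The vector dims.tab5 is transcribed by hand from Table 5, and all object names are illustrative rather than part of the package.

```r
# dims values of the 40 rows in Table 5, transcribed by hand.
dims.tab5 <- c(5, 8, 8, 10, 11, 11, 12, 13, 13, 14, 14, 14, 15, 16, 16,
               17, 19, 20, 20, 22, 22, 23, 23, 24, 24, 25, 26, 27, 28, 28,
               29, 29, 30, 31, 31, 32, 33, 34, 34, 35)

# Number of remaining outcomes per dims value.
n.remain <- table(dims.tab5)

# dims value with the largest number of remaining outcomes.
best.dims <- as.numeric(names(which.max(n.remain)))
best.dims
max(n.remain)
```

With these values, best.dims is 14 with three remaining outcomes, while every other dims value has at most two.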

Table 6: Optimal settings
dimensionReduction topDimensions graphBuilding clusterDetection
PCA 14 knn fg

2.3 Testing optimal settings

Next, the optimal settings (Table 6) are tested on the five training datasets of Arabidopsis root and on two additional datasets from mouse kidney (Clark et al. 2019; Chen et al. 2017; Karaiskos et al. 2018; Park et al. 2018) and brain (Vacher et al. 2021; Ortiz et al. 2020). Details about how to format the latter two (Table 7) are described here. Because the procedure for running the co-clustering workflow is very similar to the optimization process above, explanations are kept to a minimum.

2.3.1 Testing on training data

To obtain reproducible results, a fixed seed is set for generating random numbers.

set.seed(50)

The following code demonstrates the testing of the optimal settings on the Arabidopsis single-cell data ‘sc.arab.rt10’ (Table 2). The process begins with joint normalization of bulk and single-cell data. Afterward, the bulk data are separated from the single-cell data, averaged across tissue replicates, and filtered. Finally, the single-cell data are filtered as well.

# Joint normalization.
arab10.nor <- norm_cell(sce=sc.arab.rt10, bulk=blk.arab.rt, com=FALSE)
# Aggregate bulk replicates
arab10.aggr <- aggr_rep(data=arab10.nor$bulk, assay.na='logcounts', sam.factor='sample', aggr='mean')
# Filter bulk data.
arab10.blk.fil <- filter_data(data=arab10.aggr, pOA=c(0.1, 1), CV=c(0.1, 50), verbose=FALSE) 
# Filter cell data and subset bulk data to genes in cell data.
blk.sc.arab10.fil <- filter_cell(sce=arab10.nor$cell, bulk=arab10.blk.fil, cutoff=1, p.in.cell=0.15, p.in.gen=0.05, verbose=FALSE)

Next, the pre-processed bulk and single-cell data are co-clustered with the optimal settings. The ground-truth matching (df.match.arab) between bulk tissues and single cells is provided to compute the AUC value for the resulting tissue-cell assignments.

vld.arab10 <- cocluster(bulk=blk.sc.arab10.fil$bulk, cell=blk.sc.arab10.fil$cell, min.dim=14, dimred='PCA', graph.meth='knn', cluster='fg', df.match=df.match.arab)

The tissue-cell assignments obtained from the optimal settings are stored in the colData slot of the vld.arab10 object. The following columns provide essential details about the tissue-cell assignments:

  1. ‘cluster’: Represents cluster labels.
  2. ‘bulkCell’: Indicates whether each entry represents bulk tissues or single cells.
  3. ‘sample’: Represents the original labels of the bulk and cells.
  4. ‘assignedBulk’: Refers to the bulk tissues assigned to cells, with ‘none’ indicating unassigned cells.
  5. ‘similarity’: Represents Spearman’s correlation coefficients used for the tissue-cell assignments, serving as a measure of assignment stringency.
colData(vld.arab10$sce.all)[c(11:12, 16:17), c('cluster', 'bulkCell', 'sample', 'assignedBulk', 'similarity')]
## DataFrame with 4 rows and 5 columns
##          cluster    bulkCell      sample assignedBulk
##      <character> <character> <character>  <character>
## QC         clus7        bulk          QC         none
## HAIR       clus7        bulk        HAIR         none
## per        clus7        cell         per    PHLM_COMP
## phlo       clus7        cell        phlo    PHLM_COMP
##       similarity
##      <character>
## QC          none
## HAIR        none
## per        0.442
## phlo       0.736
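
Because the ‘similarity’ column is stored as character (with ‘none’ for entries without an assignment), it has to be converted before a numeric cutoff can be applied. The sketch below mimics the four rows printed above with a plain data.frame instead of the colData slot. The object names and the cutoff of 0.5 are arbitrary choices for illustration, not recommendations from the package.

```r
# Plain data.frame mimicking the four colData rows printed above.
cdat <- data.frame(
  cluster = 'clus7',
  bulkCell = c('bulk', 'bulk', 'cell', 'cell'),
  sample = c('QC', 'HAIR', 'per', 'phlo'),
  assignedBulk = c('none', 'none', 'PHLM_COMP', 'PHLM_COMP'),
  similarity = c('none', 'none', '0.442', '0.736'),
  row.names = c('QC', 'HAIR', 'per', 'phlo'),
  stringsAsFactors = FALSE
)

# Keep single cells that received a tissue assignment.
cdat.cell <- subset(cdat, bulkCell == 'cell' & assignedBulk != 'none')

# Convert similarity to numeric only after 'none' entries are removed.
cdat.cell$similarity <- as.numeric(cdat.cell$similarity)

# Apply a stricter, purely illustrative similarity cutoff.
cdat.strict <- subset(cdat.cell, similarity >= 0.5)
cdat.strict[, c('sample', 'assignedBulk', 'similarity')]
```

In this toy table only the ‘phlo’ cell passes the cutoff. With the real object, converting the colData slot via as.data.frame() should presumably give a comparable table to work on.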

The AUC value for the testing is stored in vld.arab10$roc.obj. Figure 7 shows the single-cell data with original cell labels (left) and cell labels (right) correctly assigned through the co-clustering with the optimal settings, where gray dots represent cells without tissue assignments or cells with incorrect assignments.

The optimal settings are tested on other training datasets (Table 2) with the same procedure, which is not shown. The corresponding AUC values are 0.6, 0.6, 0.8, and 0.8 respectively.

auc(vld.arab10$roc.obj)
## Area under the curve: 0.8083
dim_opt(input=vld.arab10, alp=1, tit1='Before co-clustering (arab10)', tit2='After co-clustering (arab10)', lgd.nrow=4, lgd.key.size=5, lgd.text.size=18, lgd.spa.y=0, lgd.spa.x=0, axis.font.size=0, pt.size=1, ann.size=8)

Figure 7: Results of testing the optimal settings on the Arabidopsis dataset
The left and right TSNE embedding plots present original cell labels and cell labels correctly assigned with the optimal settings, respectively.
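
For intuition about the AUC values reported in this section, the AUC can be read as the probability that a correct tissue-cell assignment has a higher similarity than an incorrect one. The sketch below computes this with the rank-based (Mann-Whitney) formula in base R. It is not the implementation behind the stored ROC object; the function name auc_rank is hypothetical, and the similarity values and labels are made up for illustration.

```r
# Hypothetical helper: rank-based AUC (Mann-Whitney formulation).
auc_rank <- function(score, truth) {
  r <- rank(score)                       # average ranks handle ties
  n.pos <- sum(truth)                    # number of correct assignments
  n.neg <- sum(!truth)                   # number of incorrect assignments
  (sum(r[truth]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

# Made-up similarities and correctness labels for five assignments.
sim <- c(0.736, 0.442, 0.610, 0.150, 0.300)
correct <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

auc.toy <- auc_rank(sim, correct)
auc.toy
```

Here five of the six correct/incorrect pairs are ordered as expected, so the toy AUC is 5/6. A value of 0.5 would correspond to similarities that do not separate correct from incorrect assignments at all.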

2.3.2 Testing on mouse data

The optimal settings are further tested on two additional datasets of mouse kidney and brain (Table 7) with the same procedure as the training datasets.

Table 7: Testing data of mouse brain and kidney.
Name           DataType             File
blk.mus.kdn    bulk (mouse kidney)  NCBI BioProject: PRJNA438336, PRJNA389326, PRJNA435940
sc.mus.kdn     cell (mouse kidney)  GEO: GSE107585
blk.mus.brain  bulk (mouse brain)   NCBI BioProject: PRJNA725533
sc.mus.brain   cell (mouse brain)   GEO: GSE147747

The following code demonstrates the testing on the mouse kidney data. To obtain reproducible results, a fixed seed is set for generating random numbers.

set.seed(50)

The ground-truth matching between bulk tissues and single cells for mouse kidney is imported.

# Ground-truth matching for kidney.
match.mus.kdn.pa <- system.file("extdata/cocluster", "true_match_mouse_kidney_cocluster.txt", package="spatialHeatmap")
df.match.mus.kdn <- read.table(match.mus.kdn.pa, header=TRUE, row.names=1, sep='\t')
df.match.mus.kdn[1:3, ]
##                 cell       trueBulk
## 1          proxi.tub PTS1,PTS2,PTS3
## 2     distal.con.tub            DCT
## 3 col.duct.prin.cell  CCD,OMCD,IMCD
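
As the output shows, one cell type can match several bulk tissues, listed as a comma-separated string in `trueBulk`. The sketch below illustrates one way such a table could be used to decide whether an assignment is correct. It is not the package's internal implementation: the toy table copies the three rows printed above, and the helper `is_true_match` is hypothetical.

```r
# Toy copy of the three rows printed above.
df.match.toy <- data.frame(
  cell = c('proxi.tub', 'distal.con.tub', 'col.duct.prin.cell'),
  trueBulk = c('PTS1,PTS2,PTS3', 'DCT', 'CCD,OMCD,IMCD'),
  stringsAsFactors = FALSE
)

# Split the comma-separated bulk tissues into a list named by cell type.
true.lis <- setNames(strsplit(df.match.toy$trueBulk, ','), df.match.toy$cell)

# Hypothetical helper: is each assigned bulk tissue a true match of the cell?
is_true_match <- function(cell, bulk, lis) {
  mapply(function(x, y) x %in% names(lis) && y %in% lis[[x]],
         cell, bulk, USE.NAMES = FALSE)
}

res.match <- is_true_match(
  cell = c('proxi.tub', 'proxi.tub', 'distal.con.tub'),
  bulk = c('PTS2', 'none', 'GLOM'),
  lis = true.lis
)
res.match
```

Only the first assignment counts as a true match here, since `PTS2` is among the bulk tissues listed for `proxi.tub`.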

The bulk and single-cell data of mouse kidney are pre-processed.

# Joint normalization.
kdn.nor <- norm_cell(sce=sc.mus.kdn, bulk=blk.mus.kdn, com=FALSE)
# Aggregate bulk replicates
kdn.aggr <- aggr_rep(data=kdn.nor$bulk, assay.na='logcounts', sam.factor='sample', aggr='mean')
# Filter bulk data.
kdn.blk.fil <- filter_data(data=kdn.aggr, pOA=c(0.1, 1), CV=c(0.1, 50), verbose=FALSE) 
# Filter cell data and subset bulk data to genes in cell data.
blk.sc.kdn.fil <- filter_cell(sce=kdn.nor$cell, bulk=kdn.blk.fil, cutoff=1, p.in.cell=0.15, p.in.gen=0.05, verbose=FALSE)
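
To illustrate the idea behind the bulk filtering step, the sketch below applies the two criteria described in the workflow overview (expression reaching a cutoff in a proportion of samples, and a coefficient of variation within a range) to a toy matrix in base R. It is only an approximation under stated assumptions: `filter_sketch` and the toy matrix are hypothetical, and details such as whether the cutoff comparison is strict are not taken from `filter_data` itself.

```r
# Hypothetical toy matrix: 4 genes (rows) x 5 aggregated bulk samples (columns).
mat <- rbind(
  g1 = c(0, 0, 0, 0, 0.5),  # never reaches the expression cutoff
  g2 = c(2, 2, 2, 2, 2),    # expressed, but constant (CV = 0)
  g3 = c(0, 1, 3, 6, 2),    # expressed and variable
  g4 = c(5, 0, 0, 0, 0)     # expressed in one of five samples, variable
)

# Keep genes whose expression is >= pOA[2] in at least a proportion pOA[1]
# of samples and whose CV (sd / mean) lies between CV[1] and CV[2].
filter_sketch <- function(mat, pOA = c(0.1, 1), CV = c(0.1, 50)) {
  prop <- rowMeans(mat >= pOA[2])
  cv <- apply(mat, 1, sd) / rowMeans(mat)
  keep <- prop >= pOA[1] & !is.na(cv) & cv >= CV[1] & cv <= CV[2]
  mat[keep, , drop = FALSE]
}

mat.fil <- filter_sketch(mat)
rownames(mat.fil)
```

With these toy values, `g1` fails the expression criterion and `g2` fails the CV criterion, so only `g3` and `g4` remain.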

Next, the pre-processed bulk and single-cell data are co-clustered with the optimal settings. The ground-truth matching (df.match.mus.kdn) between bulk tissues and single cells is provided to compute the AUC value for the resulting tissue-cell assignments.

vld.kdn <- cocluster(bulk=blk.sc.kdn.fil$bulk, cell=blk.sc.kdn.fil$cell, min.dim=14, dimred='PCA', graph.meth='knn', cluster='fg', df.match=df.match.mus.kdn)
colData(vld.kdn$sce.all)[1:4, c('cluster', 'bulkCell', 'sample', 'assignedBulk', 'similarity')]
## DataFrame with 4 rows and 5 columns
##                    cluster    bulkCell         sample assignedBulk  similarity
##                <character> <character>    <character>  <character> <character>
## distal.con.tub       clus5        cell distal.con.tub         GLOM       0.108
## proxi.tub            clus3        cell      proxi.tub         none        none
## proxi.tub            clus4        cell      proxi.tub         PTS2      -0.002
## endo                 clus5        cell           endo         GLOM       0.297
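
Note that `assignedBulk` and `similarity` are character columns in which unassigned cells are marked with `none`. The following sketch shows one way to summarize such output with base R. The toy data frame copies the four rows printed above, and the object names are hypothetical.

```r
# Toy copy of the four rows printed above.
cdat.toy <- data.frame(
  sample = c('distal.con.tub', 'proxi.tub', 'proxi.tub', 'endo'),
  assignedBulk = c('GLOM', 'none', 'PTS2', 'GLOM'),
  similarity = c('0.108', 'none', '-0.002', '0.297'),
  stringsAsFactors = FALSE
)

# Convert similarities to numeric values, with 'none' becoming NA.
sim.num <- as.numeric(replace(cdat.toy$similarity, cdat.toy$similarity == 'none', NA))

# Proportion of cells that received a tissue assignment.
assigned <- cdat.toy$assignedBulk != 'none'
prop.assigned <- mean(assigned)

# Number of cells per assigned bulk tissue.
tab.assigned <- table(cdat.toy$assignedBulk[assigned])
tab.assigned
```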

The AUC value for the testing is stored in vld.kdn$roc.obj. Figure 8 shows the single-cell data with original cell labels (left) and with cell labels correctly assigned through co-clustering with the optimal settings (right), where gray dots represent cells without tissue assignments or with incorrect assignments.

auc(vld.kdn$roc.obj)
## Area under the curve: 0.6365
dim_opt(input=vld.kdn, alp=1, tit1='Before co-clustering (mouse kidney)', tit2='After co-clustering (mouse kidney)', lgd.nrow=4, lgd.key.size=5, lgd.text.size=18, lgd.spa.y=0, lgd.spa.x=0, axis.font.size=0, pt.size=1, ann.size=8)
Figure 8: Results of testing the optimal settings on mouse kidney
The left and right TSNE embedding plots present single-cell data with the original cell labels and the cell labels correctly assigned with the optimal settings, respectively.

The following code demonstrates the testing on the mouse brain data. To obtain reproducible results, a fixed seed is set for generating random numbers.

set.seed(50)

The ground-truth matching between bulk tissues and single cells for mouse brain is imported.

# Ground-truth matching for brain.
match.mus.brain.pa <- system.file("extdata/cocluster", "true_match_mouse_brain_cocluster.txt", package="spatialHeatmap")
df.match.mus.brain <- read.table(match.mus.brain.pa, header=TRUE, row.names=1, sep='\t')
df.match.mus.brain
##        cell    trueBulk
## 1      cere        CERE
## 2      hipp        HIPP
## 3   isocort CERE.CORTEX
## 4 retrohipp        HIPP
## 5   hypotha     HYPOTHA

The bulk and single-cell data of mouse brain are pre-processed.

# Joint normalization.
brain.nor <- norm_cell(sce=sc.mus.brain, bulk=blk.mus.brain, com=FALSE)
# Aggregate bulk replicates
brain.aggr <- aggr_rep(data=brain.nor$bulk, assay.na='logcounts', sam.factor='sample', aggr='mean')
# Filter bulk data.
brain.blk.fil <- filter_data(data=brain.aggr, pOA=c(0.1, 1), CV=c(0.1, 50), verbose=FALSE) 
# Filter cell data and subset bulk data to genes in cell data.
blk.sc.brain.fil <- filter_cell(sce=brain.nor$cell, bulk=brain.blk.fil, cutoff=1, p.in.cell=0.15, p.in.gen=0.05, verbose=FALSE)

Next, the pre-processed bulk and single-cell data are co-clustered with the optimal settings. The ground-truth matching (df.match.mus.brain) between bulk tissues and single cells is provided to compute the AUC value for the resulting tissue-cell assignments.

vld.brain <- cocluster(bulk=blk.sc.brain.fil$bulk, cell=blk.sc.brain.fil$cell, min.dim=14, dimred='PCA', graph.meth='knn', cluster='fg', df.match=df.match.mus.brain)
colData(vld.brain$sce.all)[1:4, c('cluster', 'bulkCell', 'sample', 'assignedBulk', 'similarity')]
## DataFrame with 4 rows and 5 columns
##             cluster    bulkCell      sample assignedBulk
##         <character> <character> <character>  <character>
## isocort       clus6        cell     isocort  CERE.CORTEX
## isocort       clus6        cell     isocort  CERE.CORTEX
## isocort       clus6        cell     isocort  CERE.CORTEX
## isocort       clus6        cell     isocort  CERE.CORTEX
##          similarity
##         <character>
## isocort       0.433
## isocort       0.591
## isocort       0.578
## isocort       0.684

The AUC value for the testing is stored in vld.brain$roc.obj. Figure 9 shows the single-cell data with original cell labels (left) and with cell labels correctly assigned through co-clustering with the optimal settings (right), where gray dots represent cells without tissue assignments or with incorrect assignments.

auc(vld.brain$roc.obj)
## Area under the curve: 0.8808
dim_opt(input=vld.brain, alp=1, tit1='Before co-clustering (mouse brain)', tit2='After co-clustering (mouse brain)', lgd.nrow=4, lgd.key.size=5, lgd.text.size=18, lgd.spa.y=0, lgd.spa.x=0, axis.font.size=0, pt.size=1, ann.size=8)
Figure 9: Results of testing the optimal settings on mouse brain
The left and right TSNE embedding plots present single-cell data with the original cell labels and the cell labels correctly assigned with the optimal settings, respectively.

2.4 Discussion and conclusion

All AUC values obtained from the above testing are approximately 0.6 or higher, indicating that the optimal settings (Table 6) perform better than random classification (AUC=0.5) in distinguishing positive and negative assignments. Note that the purpose of co-clustering is to assign source tissues to the most representative cells, and that the optimal settings are assumed to be similar across species. This assumption is consistent with the similar AUC ranges observed in the Arabidopsis (0.6-0.8) and mouse (0.6-0.9) datasets. However, beyond Arabidopsis the optimal settings are tested only on mouse datasets from two organs, so their performance on other species and tissues cannot be guaranteed.
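
The AUC ranges quoted above can be reproduced from the values reported in this section. The short sketch below simply collects those values. The vector names are hypothetical, and the four Arabidopsis values from the other training datasets are entered as reported (already rounded to one decimal).

```r
# AUC values reported above: four other Arabidopsis training datasets plus arab10.
auc.arab <- c(0.6, 0.6, 0.8, 0.8, 0.8083)
# AUC values reported above for the two mouse testing datasets.
auc.mus <- c(kidney = 0.6365, brain = 0.8808)

# Ranges rounded to one decimal place.
rng.arab <- round(range(auc.arab), 1)
rng.mus <- round(range(auc.mus), 1)
rng.arab
rng.mus

# All values exceed the random-classification baseline of 0.5.
above.random <- all(c(auc.arab, auc.mus) > 0.5)
above.random
```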


3 Version Information

sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/Los_Angeles
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] pROC_1.18.4                 BiocSingular_1.16.0         htmltools_0.5.5             sortable_0.5.0              HDF5Array_1.28.1            rhdf5_2.44.0                DelayedArray_0.26.6        
##  [8] S4Arrays_1.0.4              Matrix_1.5-1                av_0.8.3                    animation_2.7               reshape2_1.4.4              flashClust_1.01-2           genefilter_1.82.1          
## [15] data.table_1.14.8           ggdendro_0.1.23             gridExtra_2.3               rsvg_2.4.0                  grImport_0.9-7              XML_3.99-0.14               magick_2.7.4               
## [22] visNetwork_2.1.2            plotly_4.10.2               yaml_2.3.7                  shinyjs_2.1.0               shinyBS_0.61.1              shinyWidgets_0.7.6          DT_0.28                    
## [29] shinydashboardPlus_2.0.3    shinydashboard_0.7.2        shiny_1.7.4.1               UpSetR_1.4.0                gplots_3.1.3                edgeR_3.42.4                dplyr_1.1.2                
## [36] biomaRt_2.56.1              GEOquery_2.68.0             ExpressionAtlas_1.28.0      jsonlite_1.8.7              RCurl_1.98-1.12             xml2_1.3.5                  limma_3.56.2               
## [43] kableExtra_1.3.4            BiocParallel_1.34.2         igraph_1.5.0                scater_1.28.0               ggplot2_3.4.2               scran_1.28.1                scuttle_1.10.1             
## [50] SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.2 Biobase_2.60.0              GenomicRanges_1.52.0        GenomeInfoDb_1.36.1         IRanges_2.34.1              S4Vectors_0.38.1           
## [57] BiocGenerics_0.46.0         MatrixGenerics_1.12.2       matrixStats_1.0.0           spatialHeatmap_2.7.4        knitr_1.43                  BiocStyle_2.28.0            nvimcom_0.9-128            
## [64] colorout_1.2-2             
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.3.0             later_1.3.1               filelock_1.0.2            bitops_1.0-7              ggplotify_0.1.1           tibble_3.2.1              lifecycle_1.0.3           rprojroot_2.0.3          
##   [9] lattice_0.21-8            MASS_7.3-59               magrittr_2.0.3            sass_0.4.6                rmarkdown_2.23            jquerylib_0.1.4           metapod_1.8.0             httpuv_1.6.11            
##  [17] DBI_1.1.3                 zlibbioc_1.46.0           rvest_1.0.3               purrr_1.0.1               yulab.utils_0.0.6         rappdirs_0.3.3            GenomeInfoDbData_1.2.10   ggrepel_0.9.3            
##  [25] irlba_2.3.5.1             annotate_1.78.0           dqrng_0.3.0               svglite_2.1.1             DelayedMatrixStats_1.22.1 codetools_0.2-19          tidyselect_1.2.0          farver_2.1.1             
##  [33] ScaledMatrix_1.8.1        viridis_0.6.3             BiocFileCache_2.8.0       webshot_0.5.5             BiocNeighbors_1.18.0      ellipsis_0.3.2            survival_3.5-5            systemfonts_1.0.4        
##  [41] tools_4.3.0               progress_1.2.2            Rcpp_1.0.11               glue_1.6.2                xfun_0.39                 withr_2.5.0               BiocManager_1.30.21       fastmap_1.1.1            
##  [49] rhdf5filters_1.12.1       bluster_1.10.0            fansi_1.0.4               caTools_1.18.2            digest_0.6.33             rsvd_1.0.5                shinytoastr_2.1.1         R6_2.5.1                 
##  [57] mime_0.12                 gridGraphics_0.5-1        colorspace_2.1-0          gtools_3.9.4              RSQLite_2.3.1             utf8_1.2.3                tidyr_1.3.0               generics_0.1.3           
##  [65] htmlwidgets_1.6.2         prettyunits_1.1.1         httr_1.4.6                pkgconfig_2.0.3           gtable_0.3.3              blob_1.2.4                XVector_0.40.0            bookdown_0.34            
##  [73] scales_1.2.1              png_0.1-8                 rstudioapi_0.15.0         tzdb_0.4.0                curl_5.0.1                shinyAce_0.4.2            cachem_1.0.8              stringr_1.5.0            
##  [81] spsComps_0.3.3.0          KernSmooth_2.23-20        parallel_4.3.0            vipor_0.4.5               AnnotationDbi_1.62.2      pillar_1.9.0              vctrs_0.6.3               promises_1.2.0.1         
##  [89] dbplyr_2.3.3              beachmat_2.16.0           xtable_1.8-4              cluster_2.1.4             beeswarm_0.4.0            evaluate_0.21             readr_2.1.4               cli_3.6.1                
##  [97] locfit_1.5-9.8            compiler_4.3.0            rlang_1.1.1               crayon_1.5.2              labeling_0.4.2            plyr_1.8.8                ggbeeswarm_0.7.2          stringi_1.7.12           
## [105] viridisLite_0.4.2         assertthat_0.2.1          munsell_0.5.0             Biostrings_2.68.1         lazyeval_0.2.2            hms_1.1.3                 sparseMatrixStats_1.12.2  bit64_4.0.5              
## [113] learnr_0.11.4             Rhdf5lib_1.22.0           KEGGREST_1.40.0           statmod_1.5.0             highr_0.10                memoise_2.0.1             bslib_0.5.0               bit_4.0.5

4 Funding

This project has been funded by NSF awards: PGRP-1546879, PGRP-1810468, PGRP-1936492.

References

Chen, Lihe, Jae Wook Lee, Chung-Lin Chou, Anil V Nair, Maria A Battistone, Teodor G Păunescu, Maria Merkulova, et al. 2017. “Transcriptomes of Major Renal Collecting Duct Cell Types in Mouse Identified by Single-Cell RNA-seq.” Proc. Natl. Acad. Sci. U. S. A. 114 (46): E9989–98.
Clark, Jevin Z, Lihe Chen, Chung-Lin Chou, Hyun Jun Jung, Jae Wook Lee, and Mark A Knepper. 2019. “Representation and Relative Abundance of Cell-Type Selective Markers in Whole-Kidney RNA-Seq Data.” Kidney Int. 95 (4): 787–96.
Csardi, Gabor, and Tamas Nepusz. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. http://igraph.org.
Karaiskos, Nikos, Mahdieh Rahmatollahi, Anastasiya Boltengagen, Haiyue Liu, Martin Hoehne, Markus Rinschen, Bernhard Schermer, et al. 2018. “A Single-Cell Transcriptome Atlas of the Mouse Glomerulus.” J. Am. Soc. Nephrol. 29 (8): 2060–68.
Li, Song, Masashi Yamada, Xinwei Han, Uwe Ohler, and Philip N Benfey. 2016. “High-Resolution Expression Map of the Arabidopsis Root Reveals Alternative Splicing and lincRNA Regulation.” Dev. Cell 39 (4): 508–22.
Lun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor.” F1000Res. 5: 2122. https://doi.org/10.12688/f1000research.9501.2.
Morgan, Martin, Jiefei Wang, Valerie Obenchain, Michel Lang, Ryan Thompson, and Nitesh Turaga. 2021. BiocParallel: Bioconductor Facilities for Parallel Evaluation. https://github.com/Bioconductor/BiocParallel.
Ortiz, Cantin, Jose Fernandez Navarro, Aleksandra Jurek, Antje Märtin, Joakim Lundeberg, and Konstantinos Meletis. 2020. “Molecular Atlas of the Adult Mouse Brain.” Science Advances 6 (26): eabb3446.
Park, Jihwan, Rojesh Shrestha, Chengxiang Qiu, Ayano Kondo, Shizheng Huang, Max Werth, Mingyao Li, Jonathan Barasch, and Katalin Suszták. 2018. “Single-Cell Transcriptomics of the Mouse Kidney Reveals Potential Cellular Targets of Kidney Disease.” Science 360 (6390): 758–63.
Robin, Xavier, Natacha Turck, Alexandre Hainard, Natalia Tiberti, Frédérique Lisacek, Jean-Charles Sanchez, and Markus Müller. 2011. “pROC: An Open-Source Package for R and S+ to Analyze and Compare ROC Curves.” BMC Bioinformatics 12: 77.
Shahan, Rachel, Che-Wei Hsu, Trevor M Nolan, Benjamin J Cole, Isaiah W Taylor, Anna Hendrika Cornelia Vlot, Philip N Benfey, and Uwe Ohler. 2020. “A Single Cell Arabidopsis Root Atlas Reveals Developmental Trajectories in Wild Type and Cell Identity Mutants.” bioRxiv.
Vacher, Claire-Marie, Helene Lacaille, Jiaqi J O’Reilly, Jacquelyn Salzbank, Dana Bakalar, Sonia Sebaoui, Philippe Liere, et al. 2021. “Placental Endocrine Function Shapes Cerebellar Development and Social Behavior.” Nat. Neurosci. 24 (10): 1392–1401.