This vignette serves to replicate the examples presented in the spatialHeatmap manuscript. As the pertinent functionalities have been introduced in the spatialHeatmap package's vignettes, this document offers concise explanations of each relevant functionality. Some code chunks are not evaluated when buiding this vignette due to long computation time.
To build this vignette, download the folder containing the source code, then run rmarkdown::render('manuscript_examples.Rmd')
in the downloaded folder.
The required packages and helper functions are loaded. To reduce runtime, intermediate results can be cached under ~/.cache/shm
.
# R: 4.2.0; Bioconductor: 3.15
library(spatialHeatmap); library(SummarizedExperiment); library(ExpressionAtlas); library(ggplot2)
library(biomaRt) # v2.54.0
library(enrichplot); library(clusterProfiler) # v4.6.0
source('function/fun.R') # Helper functions.
cache.pa <- '~/.cache/shm'
The Spatial Enrichment (SpEn) functionality is designed to identify groups of biomolecules (e.g. RNAs, proteins, metabolites) that are particularly abundant or enriched in certain spatial regions, such as tissue-specific transcripts. The chosen mouse RNA-Seq data compares tissue level gene expression across mammalian species (Merkin et al. 2012). The following searches the Expression Atlas for expression data from 'kidney' and 'Mus musculus'.
all.mus <- read_cache(cache.pa, 'all.mus') # Retrieve data from cache.
if (is.null(all.mus)) { # Save downloaded data to cache if it is not cached.
all.mus <- searchAtlasExperiments(properties="kidney", species="Mus musculus")
save_cache(dir=cache.pa, overwrite=TRUE, all.mus)
}
Among the many matching entries, accession 'E-MTAB-2801' is downloaded.
all.mus[7, ]
## DataFrame with 1 row and 4 columns
## Accession Species Type
## <character> <character> <character>
## 1 E-MTAB-2801 Mus musculus RNA-seq of coding RNA
## Title
## <character>
## 1 Strand-specific RNA-..
rse.mus <- read_cache(cache.pa, 'rse.mus') # Read data from cache.
if (is.null(rse.mus)) { # Save downloaded data to cache if it is not cached.
rse.mus <- getAtlasData('E-MTAB-2801')[[1]][[1]]
save_cache(dir=cache.pa, overwrite=TRUE, rse.mus)
}
colData(rse.mus)[1:3, ]
## DataFrame with 3 rows and 4 columns
## AtlasAssayGroup organism organism_part strain
## <character> <character> <character> <character>
## SRR594393 g7 Mus musculus brain DBA/2J
## SRR594394 g21 Mus musculus colon DBA/2J
## SRR594395 g13 Mus musculus heart DBA/2J
A targets file is loaded to the colData
slot of rse.mus
.
mus.tar <- system.file('extdata/shinyApp/data/target_mouse.txt', package='spatialHeatmap')
target.mus <- read.table(mus.tar, header=TRUE, row.names=1, sep='\t') # Importing
colData(rse.mus) <- DataFrame(target.mus) # Loading
target.mus[1:3, ]
## AtlasAssayGroup organism organism_part strain
## SRR594393 g7 Mus musculus brain DBA.2J
## SRR594394 g21 Mus musculus colon DBA.2J
## SRR594395 g13 Mus musculus heart DBA.2J
Five spatial features ('brain', 'liver', 'lung', 'colon', and 'kidney') and three experiment variables (mouse strains 'DBA.2J', 'C57BL.6', and 'CD1') are subsetted. The com.by
argument specifies whether the enrichment will be performed on spatial features (ft
) or experiment variables (var
). In the following, com.by
is set ft
, so SpEn is performed for spatial features and the variables under each spatial feature are treated as replicates.
data.sub.mus <- sf_var(data=rse.mus, feature='organism_part', ft.sel=c('brain', 'lung', 'colon', 'kidney', 'liver'), variable='strain', var.sel=c('DBA.2J', 'C57BL.6', 'CD1'), com.by='ft')
The subsetted data is filtered to remove unreliable expression values.
data.sub.mus.fil <- filter_data(data=data.sub.mus, pOA=c(0.5, 15), CV=c(0.8, 100), verbose=FALSE)
The Ensembl gene ids are converted to UniProt symbols.
data.sub.mus.fil <- cvt_mart(input=data.sub.mus.fil, biomart='ensembl', dataset="mmusculus_gene_ensembl", mirror='www', filters='ensembl_gene_id', attributes=c("ensembl_gene_id", "uniprot_gn_symbol", "entrezgene_id", "go_id", "description"), target='uniprot_gn_symbol', use.cache=TRUE, cache.na='ann.mus', cache.dir='~/.cache/shm')
ann.mus <- rowData(data.sub.mus.fil)
SpEn is performed for each spatial feature with 'edgeR'. The count data are internally normalized by the 'TMM' method from 'edgeR'. The enrichment results are selected by log2 fold change (log2.fc
) and FDR (fdr
). The outliers
argument specifies a number of outliers allowed in references.
enr.res <- spatial_enrich(data.sub.mus.fil, method=c('edgeR'), norm='TMM', log2.fc=1, fdr=0.05, outliers=1)
The overlap of enriched biomolecules (type='up'
) across spatial features are presented in an UpSet plot (plot='upset'
).
ovl_enrich(enr.res, type='up', plot='upset', font.size=5, venn.arg=list(), order.by='freq')
The enriched and depleted genes in liver are queried from the enrichment results. In the 'type' column, 'up' and 'down' refer to 'enriched' and 'depleted' respectively, while the 'total' column shows the total reference features excluding outliers.
en.liver <- query_enrich(enr.res, 'liver')
## Done!
up.liver <- subset(rowData(en.liver), type=='up' & total==4)
up.liver[1:3, 1:3]
## DataFrame with 3 rows and 3 columns
## type total method
## <character> <numeric> <character>
## F10 up 4 edgeR
## Itih1 up 4 edgeR
## Alb up 4 edgeR
dn.liver <- subset(rowData(en.liver), type=='down' & total==4)
dn.liver[1:3, 1:3]
## DataFrame with 3 rows and 3 columns
## type total method
## <character> <numeric> <character>
## Mgat3 down 4 edgeR
## Osbpl6 down 4 edgeR
## Slc1a1 down 4 edgeR
The mouse brain aSVG is imported.
svg.mus.pa <- system.file("extdata/shinyApp/data", "mus_musculus.male.svg", package="spatialHeatmap")
svg.mus <- read_svg(svg.mus.pa)
cord <- svg.mus[, 2][[1]]; cord[1, 4] <- 0
attribute(svg.mus)[[1]] <- cord
One enriched (Itih1) and one depleted (Mgat3) gene in mouse liver are chosen to plot enrichment spatial heatmaps (SHMs).
# Aggregate replicates across spatial features by mean.
en.liver <- aggr_rep(en.liver, sam.factor='com.by', con.factor=NULL, aggr='mean')
dat.enrich <- SPHM(svg=svg.mus, bulk=en.liver)
shm(data=dat.enrich, ID=c('Itih1', 'Mgat3'), legend.r=1, legend.nrow=3, sub.title.size=12, ncol=2, bar.width=0.09, lay.shm='con')
## ggplots/grobs: mus_musculus.male.svg ...
## ggplot: Itih1, con
## ggplot: Mgat3, con
## Legend plot ...
## CPU cores: 1
## Converting "ggplot" to "grob" ...
## Itih1_con_1 Mgat3_con_1
## Converting "ggplot" to "grob" ...
##
## Plot layout ...
## Plot layout ...
## Done!
SHMs are suitable for comparing assay profiles among small number of biomolecules (e.g. genes) across spatial features. To also support analysis routines of larger number of biomolecules, spatialHeatmap integrates functionalities for identifying groups of biomolecules with similar and/or dissimilar assay profiles, and subsequently analyzing the results with data mining methods that scale to larger numbers of biomolecules than SHMs, such as hierarchical clustering and network analysis methods.
The replicates of organs in the assay data are aggregated by mean. The query gene used in this example is 'Ugp2' that is spatially enriched in mouse liver. After subsetting the aggregated assay matrix to the 15% most similar members from a correlation-based search (including the query), hierarchical clustering is performed. The clustering results are presented in the form of a matrix heatmap (Figure 3). In Figure 3, the rows and columns are sorted by the dendrograms obtained from the hierarchical clustering step, and the query gene is indicated by a black line.
# Aggregate replicates.
mus.nor.aggr <- aggr_rep(enr.res$data, sam.factor='organism_part', con.factor=NULL, aggr='mean')
# Subset assay data.
sub.mat <- submatrix(data=mus.nor.aggr, ID='Ugp2', p=0.15)
dim(sub.mat)
## [1] 1376 5
mhm.res <- matrix_hm(ID='Ugp2', data=sub.mat, scale='row', angleCol=80, angleRow=35, cexRow=1, cexCol=1.2, margin=c(10, 6), static=TRUE, arg.lis1=list(offsetRow=0.01, offsetCol=0.01))
The row dendrograms are cut at the height h=3.685629
to obtain the cluster containing the query gene.
# Cut row dendrograms to obtain the cluster.
cl <- cut_dendro(dendro=mhm.res$rowDendrogram, h=3.685629, target='Ugp2'); cl
## [1] "Abcd3" "Acat1" "Aco1" "Aldh3a2" "Cers2" "Chchd10"
## [7] "Cluh" "Dbi" "Dcaf11" "Eif4ebp2" "Entpd5" "Gstm1"
## [13] "Herpud1" "Hsd17b4" "Idh1" "Lonp2" "Mia2" "Prdx5"
## [19] "Rxra" "Sdc4" "Slc25a39" "Sod1" "Ugp2"
# Entrez gene ids of the cluster.
df.entr <- subset(ann.mus, uniprot_gn_symbol %in% cl)
entriz <- unique(df.entr$entrezgene_id); entriz
## [1] 216558 11671 76893 20181 20971 68066 13688 74148 338320
## [10] 12499 28199 20655 15488 54683 15926 13167 19299 11428
## [19] 64209 110446 66887 103172 14862
# Export the cluster containing the query gene.
# rownames(df.entr) <- seq_len(nrow(df.entr))
# write.table(df.entr[, c('ensembl_gene_id', 'uniprot_gn_symbol', 'entrezgene_id')], 'cluster_hc.xlsx', col.names=T, row.names=T, sep='\t')
KEGG enrichment is performed on the cluster containing the query. The Peroxisome pathway (mmu04146) is the most highly enriched category (Figure 4). Peroxisomes are involved in many important metabolic processes including lipid metabolism in hepatocytes (Knebel et al. 2015). This is in alignment with the fact that the query gene 'Ugp2' is spatially enriched in liver.
kk <- enrichKEGG(gene = entriz, keyType='kegg', organism = 'mmu', pvalueCutoff = 0.05, pAdjustMethod='BH', qvalueCutoff=0.05)
## Reading KEGG annotation online: "https://rest.kegg.jp/link/mmu/pathway"...
## Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/mmu"...
df.kk <- as.data.frame(kk)
barplot(kk, showCategory=5)
# browseKEGG(kk, 'mmu04146')
The objective of network analysis is to identify network clusters that can be visualized in the form of network graphs. The network clusters represent groups of genes sharing highly similar expression profiles. Briefly, to run this analysis, a global network is constructed for the genes stored in the assay matrix that is subsequently partitioned into network clusters.
The same input assay matrix used in the hierarchical clustering example are used in the network analysis. The spatially-enriched gene 'Serpina1b' in the liver is chosen as the query for this analysis. After subsetting the assay matrix to the 15% most similar members from a correlation-based search (including the query), network analysis is performed. This analysis produces results in the form of an adjacency matrix and cluster assignments. The adjacency matrix contains adjacency values (expression similarities) for each pair of genes, while the cluster assignments are provided at four different levels of stringency (0, 1, 2, and 3). Higher stringency levels result in more clusters with smaller sizes. Within each version of the cluster assignment, a value of '0' indicates that a gene is not assigned to any cluster.
# Subset the assay data.
sub.mat <- submatrix(data=mus.nor.aggr, ID='Serpina1b', p=0.15)
dim(sub.mat)
## [1] 1376 5
# Identify network clusters.
adj.mod <- adj_mod(data=sub.mat)
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
## ..done.
## ..done.
## ..done.
## ..done.
# Adjacency matrix.
adj.mod[['adj']][1:3, 1:3]
## Slc22a18 ENSMUSG00000108207 ENSMUSG00000115867
## Slc22a18 1.0000000 0.5358801 0.1034367
## ENSMUSG00000108207 0.5358801 1.0000000 0.1974021
## ENSMUSG00000115867 0.1034367 0.1974021 1.0000000
# Four versions of network clusters.
adj.mod[['mod']][1:3, ]
## 0 1 2 3
## Slc22a18 12 9 36 24
## ENSMUSG00000108207 27 26 20 7
## ENSMUSG00000115867 0 0 0 0
The cluster containing the query gene is presented in a network graph (Figure 5). Nodes and edges in the graph represent genes and adjacency values between genes, respectively. The node size is proportional to the connectivity which is the sum of a gene's adjacency values with all other genes in the same cluster.
net <- network(ID='Serpina1b', data=sub.mat, adj.mod=adj.mod, ds=3, adj.min=0, vertex.label.cex=1, vertex.cex=5, static=TRUE)
Entrez gene ids are obtained for the cluster.
# Cluster containing the query.
cl <- rownames(net); cl
## [1] "Arhgef37" "Slc25a47" "Pfkfb1"
## [4] "ENSMUSG00000092021" "Atf5" "Gck"
## [7] "Klkb1" "Serpina1b" "Vtn"
df.entr.net <- subset(ann.mus, uniprot_gn_symbol %in% cl)
# Export the cluster.
# rownames(df.entr.net) <- seq_len(nrow(df.entr.net))
# write.table(df.entr.net[, c('ensembl_gene_id', 'uniprot_gn_symbol', 'entrezgene_id')], 'module.xlsx', col.names=T, row.names=T, sep='\t')
# Entrez gene ids of the cluster.
entriz <- unique(df.entr.net$entrezgene_id); entriz
## [1] 22370 18639 107503 103988 328967 104910 20701 16621
KEGG enrichment is performed on the cluster containing the query gene (Figure 6). The Complement and Coagulation Cascades pathway (mmu04610) is most significantly enriched in this cluster. This finding aligns with the fact that the query gene 'Serpina1b' is enriched in liver, as the liver is known to express a significant portion of complement proteins (Thorgersen et al. 2019).
kk <- enrichKEGG(gene = entriz, keyType='kegg', organism = 'mmu', pvalueCutoff = 0.05, pAdjustMethod='BH', qvalueCutoff=0.05)
df.kk <- as.data.frame(kk)
barplot(kk, showCategory=5)
# browseKEGG(kk, 'mmu04146')
spatialHeatmap offers functionalities for analyzing spatial assay data with SHMs, and optionally extending them with large-scale data mining routines. The following use case example demonstrates the advantages of these utilities within a discovery workflow, and the input bioassay data are the same as the SpEn example (Merkin et al. 2012). A typical analysis starts with a biomolecule or feature of interest. In this example, the brain is used as the query spatial feature, and the same SpEn process is performed as the SpEn example. To prevent redundancy, the enrichment results (enr.res
) from the SpEn example are used directly.
The enrichment results are queried by brain. Among the enriched genes in brain, 'Grik3' is chosen as the query gene. This gene is interesting since it codes for a well-known excitatory neurotransmitter receptor that is involved in the formation of various neurodegenerative diseases and cancers (Stepulak et al. 2009; Xiao et al. 2019).
en.brain <- query_enrich(enr.res, 'brain')
## Done!
up.brain <- subset(rowData(en.brain), type=='up' & total==4)
up.brain['Grik3', 1:3]
## DataFrame with 1 row and 3 columns
## type total method
## <character> <numeric> <character>
## Grik3 up 4 edgeR
After subsetting the assay matrix to the 15% most similar members from a correlation-based search (including 'Grik3'), hierarchical clustering is performed. The clustering results are presented in the form of a matrix heatmap (Figure 3). In Figure 3, the rows and columns are sorted by the dendrograms obtained from the hierarchical clustering step, and the query gene is indicated by a black line.
sub.mat <- submatrix(data=mus.nor.aggr, ID='Grik3', p=0.15)
dim(sub.mat)
## [1] 1376 5
mhm.res <- matrix_hm(ID='Grik3', data=sub.mat, scale='row', angleCol=80, angleRow=35, cexRow=0.8, cexCol=1.2, margin=c(10, 6), static=TRUE, arg.lis1=list(offsetRow=0.01, offsetCol=0.01))
The row dendrograms are cut at the height h=7.149385
to obtain the cluster containing the query gene ('Grik3' cluster).
# Cut row dendrograms to obtain the cluster.
cl <- cut_dendro(dendro=mhm.res$rowDendrogram, h=7.149385, target='Grik3'); cl
## [1] "Acsl6" "Ca11" "Cacng7"
## [4] "Cadm3" "Caskin1" "Cdk5r1"
## [7] "Chrm1" "Clstn3" "Cnih2"
## [10] "Dlgap1" "ENSMUSG00000048388" "ENSMUSG00000072769"
## [13] "ENSMUSG00000097451" "Fgf12" "Gnal"
## [16] "Grik3" "Lrrc4c" "Lrrtm2"
## [19] "Mag" "Mast1" "Nfasc"
## [22] "Nrxn1" "Ntm" "Ppfia2"
## [25] "Rasl10b" "Rbfox3" "Reep2"
## [28] "Rims3" "Rnf165" "Rnf227"
## [31] "Scn8a" "Sez6l2" "Stx1b"
## [34] "Tro" "Unc5a" "Wasf1"
# Entrez gene ids of the cluster.
df.entr <- subset(ann.mus, uniprot_gn_symbol %in% cl)
entriz <- unique(df.entr$entrezgene_id); entriz
## [1] 14807 12348 224997 94332 232370 83767 216739 276952 14167
## [10] 20273 18189 14680 12794 56191 225743 52897 107448 269116
## [19] 233878 56216 12669 242662 268932 17136 225362 80515 12569
## [28] 241568 56527 327814 235106 81904 107065
# Export the cluster containing the query gene.
# rownames(df.entr) <- seq_len(nrow(df.entr))
# write.table(df.entr[, c('ensembl_gene_id', 'uniprot_gn_symbol', 'entrezgene_id')], 'cluster_hc.xlsx', col.names=T, row.names=T, sep='\t')
To functionally annotate the obtained gene cluster, KEGG and GO enrichments are performed. The result shows that the 'Grik3' cluster is enriched in genes involved in cell adhesion (Figure 8) and synapse-related (Figure 9) activities, respectively. This agrees well with the fact that 'Grik3' is a well-characterized neurotransmitter receptor in the plasma membrane.
kk <- enrichKEGG(gene = entriz, keyType='kegg', organism = 'mmu', pvalueCutoff = 0.05, pAdjustMethod='BH', qvalueCutoff=0.05)
df.kk <- as.data.frame(kk)
barplot(kk, showCategory=5)
go <- enrichGO(gene = entriz, keyType = "ENTREZID", OrgDb = 'org.Mm.eg.db', ont = 'ALL', pAdjustMethod = 'BH', pvalueCutoff = 0.05, qvalueCutoff = 0.05, readable = TRUE)
##
df.go <- as.data.frame(go)
barplot(go, showCategory=5)
SHM plotting is used to inspect the spatial expression patterns of the genes in the 'Grik3' cluster (Figure 10). The results indicate that the genes of the 'Grik3' cluster are higher expressed in brain than other tissues. In addition, six of these genes are also relatively high expressed in liver tissue. In contrast to this, all other genes of the same cluster are exclusively enriched in brain. For clearity, two representative genes 'Stx1b' and 'Grik3' are shown in Figure 10.
# Aggregate replicates across spatial features by mean.
en.brain <- aggr_rep(en.brain, sam.factor='com.by', con.factor=NULL, aggr='mean')
dat.enrich <- SPHM(svg=svg.mus, bulk=en.brain)
shm(data=dat.enrich, ID=c('Stx1b', 'Grik3'), legend.r=1, legend.nrow=3, sub.title.size=12, ncol=2, bar.width=0.09, lay.shm='con')
## ggplots/grobs: mus_musculus.male.svg ...
## ggplot: Stx1b, con
## ggplot: Grik3, con
## Legend plot ...
## CPU cores: 1
## Converting "ggplot" to "grob" ...
## Stx1b_con_1 Grik3_con_1
## Converting "ggplot" to "grob" ...
##
## Plot layout ...
## Plot layout ...
## Done!
The co-visualization feature integrates tissue with single-cell data by co-visualizing them in composite plots that combine SHMs with embedding plots of high-dimensional data. The resulting spatial context information is important for gaining insights into the tissue-level organization of single cell data or vice versa. The following examples showcase several important functionalities of the co-visualization tool. The data used in these examples includes single-cell (Ortiz et al. 2020) and bulk (Vacher et al. 2021) RNA-seq data from multiple tissues of mouse brain. The tissues represented in both bulk and single cell data are cerebral cortex (isocortex), hippocampus, hypothalamus, and cerebellum.
The bulk and single-cell data of mouse brain are downloaded and imported. Due to long computation time on the real single-cell data, most of the following code in this section are not evaluated when building this vignette.
set.seed(50) # Obtain reproducible results.
# Load help functions.
source('https://raw.githubusercontent.com/jianhaizhang/cocluster_data/master/function/bulk_dat.R')
source('https://raw.githubusercontent.com/jianhaizhang/cocluster_data/master/function/scell_dat.R')
# Download the bulk data file (bulk_mouse_brain.xls) at https://github.com/jianhaizhang/cocluster_data/tree/master/validate/mouse_brain, and then proceed to import the downloaded file.
blk.mus.brain <- blk_dat_mus('bulk_mouse_brain.xls')
blk.mus.brain[1:3, ]
# Download single-cell data files (GSE147747_expr_raw_counts_table.tsv, GSE147747_meta_table.tsv) at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE147747, and then proceed to import the dowloaded files.
sc.mus.brain <- sc_dat_mus_brain(sc.pa= 'GSE147747_expr_raw_counts_table.tsv', meta.pa='GSE147747_meta_table.tsv')
sc.mus.brain[1:3, 1:5]
Import the aSVG of mouse brain.
svg.mus.brain <- read_svg(system.file("extdata/shinyApp/data", "mus_musculus.brain.svg", package="spatialHeatmap"))
tail(svg.mus.brain[, 2][[1]], 3)[, c('feature', 'id')]
## # A tibble: 3 × 2
## feature id
## <chr> <chr>
## 1 hypothalamus UBERON_0001898
## 2 nose UBERON_0000004
## 3 corpora.quadrigemina UBERON_0002259
Convert bulk tissue identifiers to spatial features in the aSVG.
cvt_vecter <- spatialHeatmap:::cvt_vector
colnames(blk.mus.brain) <- unname(cvt_vecter(from=c('CERE', 'HIPP', 'CERE.CORTEX', 'HYPOTHA'), to=c('cerebellum', 'hippocampus', 'cerebral.cortex', 'hypothalamus'), target=colnames(blk.mus.brain)))
blk.mus.brain[1:2, ]
blk <- SummarizedExperiment(assays=list(counts=blk.mus.brain), colData=DataFrame(bulk=colnames(blk.mus.brain)))
Genes in bulk data are filtered according to expression counts \(\ge\) 5 at a proportion of \(\ge\) 0.2 across bulk samples and coefficient of variance between 0.2 and 100.
blk.fil <- filter_data(data=blk, pOA=c(0.2, 5), CV=c(0.2, 100)); dim(blk.fil)
The single cell data are reduced to genes with expression values \(\ge\) 1 (cutoff) across \(\ge\) 7% cells and to cells with expression values \(\ge\) 1 (cutoff) across \(\ge\) 5% genes. After that, bulk and single-data are reduced to common genes between them.
blk.sc <- filter_cell(sce=sc.mus.brain, bulk=blk.fil, cutoff=1, p.in.cell=0.7, p.in.gen=0.5); blk.sc
SpEn is performed across all tissues in the bulk data.
# Separate bulk data from single-cell data.
bulk <- blk.sc$bulk
# All tissues are selected for SpEn.
blk.sub <- sf_var(data=bulk, feature='bulk', ft.sel=colnames(bulk), variable=NULL, var.sel=NULL, com.by='ft')
# SpEn.
enr.res <- spatial_enrich(blk.sub, method=c('edgeR'), norm='TMM', log2.fc=1, fdr=0.05, outliers=0)
# Enrichment results across tissues.
ovl_enrich(enr.res, type='up', plot='upset', font.size=5, venn.arg=list(), order.by='freq')
As an example, the enriched and depleted genes of hypothalamus are queried from the enrichment results.
en.hypo <- query_enrich(enr.res, 'hypothalamus')
# Enriched.
up.hypo <- subset(rowData(en.hypo), type=='up' & total==3)
dim(up.hypo); up.hypo[1:2, 1:3]
# Depleted.
dn.hypo <- subset(rowData(en.hypo), type=='down' & total==3)
dim(dn.hypo); dn.hypo[1:2, 1:3]
Prior to co-visualizing bulk and single-cell data, these two data types are jointly normalized.
blk.sc.nor <- read_cache(cache.pa, 'blk.sc.nor')
if (is.null(blk.sc.nor)) {
blk.sc.nor <- norm_cell(sce=blk.sc$cell, bulk=blk.sc$bulk, com=FALSE)
save_cache(dir=cache.pa, overwrite=TRUE, blk.sc.nor)
}
Bulk and single-cell data are separated, then tissue replicates are aggregated by mean, and dimensionalities are reduced for single-cell data using PCA, UMAP, and TSNE.
# Aggregated tissue replicates.
blk.aggr <- aggr_rep(data=blk.sc.nor$bulk, assay.na='logcounts', sam.factor='sample', aggr='mean')
# Reduce dimensionalities for single-cell data.
cell <- reduce_dim(blk.sc.nor$cell)
Bulk tissues 'cerebral.cortex', 'hypothalamus', 'hippocampus', and 'cerebellum' are matched with corresponding cells labeled 'isocort', 'hypotha', 'hipp', and 'cere', respectively.
match2cell <- list(cerebral.cortex=c('isocort'), hypothalamus=c('hypotha'), hippocampus='hipp', cerebellum='cere')
The spatially enriched (marker) gene 'Slc6a11' in hypothalamus is chosen for co-visualizing the bulk and single-cell data using the 'cell-by-value' coloring, where each cell and tissue is colored independently accoring expression values of 'Slc6a11'. In this example, cells with high expression levels (red in the left embedding plot) can be associated with hypothalamus based on the expression similarity of 'Slc6a11' in the SHM (Figure 11). Additionally, cells from different tissues (known from available cell annotations) with similar expression of 'Slc6a11' may represent the same or similar cell types present in different source tissues (e.g. blood vessels).
dat.sep <- SPHM(svg=svg.mus.brain, bulk=blk.aggr, cell=cell, match=match2cell)
gg <- covis(data=dat.sep, ID='Slc6a11', sam.factor='sample', con.factor=NULL, dimred='TSNE', cell.group='sample', tar.bulk=names(match2cell), bar.width=0.07, dim.lgd.nrow=2, dim.lgd.text.size=12, height=0.7, legend.r=0.2, legend.key.size=0.02, legend.text.size=12, legend.nrow=5, col.idp=TRUE, vari.cell=NULL, h=0.35, dim.axis.font.size=8, sub.title.size=12)
The SpEn analysis of bulk data might fail to detect biomolecules that are only differentially expressed in sub-features of a chosen tissue. The co-visualization functionality can be used to overcome this limitation. Figure 12 shows a co-visualization for gene 'AI413582' that is not detected as spatially enriched in any of the assayed tissues. The embedding plot with 'cell-by-value' coloring reveals that this gene is only selectively expressed in a sub-population of cerebral cortex (isocortex) cells.
Identify non-spatially enriched genes.
en.cort <- query_enrich(enr.res, 'cerebral.cortex')
up.cort <- subset(rowData(en.cort), type=='up' & total==3)
dim(up.cort); up.cort[1:2, 1:3]
# Non-spatially enriched genes.
ids <- setdiff(rownames(blk.aggr), rownames(en.cort)); ids[1:3]
gg <- covis(data=dat.sep, ID='AI413583', sam.factor='sample', con.factor=NULL, dimred='TSNE', cell.group='sample', tar.bulk=names(match2cell), bar.width=0.07, dim.lgd.nrow=2, dim.lgd.text.size=12, height=0.7, legend.r=0.15, legend.key.size=0.02, legend.text.size=12, legend.nrow=5, col.idp=TRUE, vari.cell=NULL, h=0.35, dim.axis.font.size=8, sub.title.size=12)
Spatially resolved single-cell (SRSC) assay data (e.g. RNA-seq) preserve the location of cells within the corresponding source features (e.g. tissues). To plot this data, spatialHeatmap projects (overlays) the spatially resolved single-cell data onto the corresponding features in anatomical images. The resulting plots are referred to as single-cell SHMs (scSHM, Figure 13). The following example demonstrates the co-visualization of SRSC (10X Visium) and bulk (Vacher et al. 2021) RNA-seq data of the sagittal mouse brain, where the latter is from the same study at Figure 11.
The SRSC data are loaded through the Seurat and SeuratData packages (Satija et al. 2015; Butler et al. 2018; Stuart et al. 2019; Hao et al. 2021).
# devtools::install_github('satijalab/seurat-data')
library(Seurat); library(SeuratData);
# InstallData("stxBrain")
set.seed(50)
brain <- LoadData("stxBrain", type = "anterior1")
The bulk and SRSC data are jointly normalized.
nor.lis <- norm_srsc(cell=brain, assay='Spatial', bulk=bulk)
srt.sc <- nor.lis$cell; blk.sp <- nor.lis$bulk
After separated from single-cell data, the tissue replicates in the bulk data are averaged.
blk.aggr.sp <- aggr_rep(data=blk.sp, assay.na='logcounts', sam.factor='sample', aggr='mean')
After separated from bulk data, dimensionalities in SRSC data are reduced, then clustering is performed on the reduced dimensions.
# Reduce dimensionalities.
srt.sc <- RunPCA(srt.sc, assay = "SCT", verbose = FALSE)
srt.sc <- RunUMAP(srt.sc, assay = "SCT", dims = 1:5)
srt.sc <- RunTSNE(srt.sc, assay = "SCT", reduction = "pca", dims = 1:5)
# Cluster cells.
srt.sc <- FindNeighbors(srt.sc, reduction = "pca", dims = 1:30)
srt.sc <- FindClusters(srt.sc, verbose = FALSE)
srt.sc$seurat_clusters <- paste0('clus', srt.sc$seurat_clusters)
The mouse brain aSVG specific to SRSC data is imported. The coordinates of spatial spots in the SRSC data are rotated by 90 degrees to align with the anatomical strucutures in the aSVG.
svg.mus <- read_svg('img/mus_musculus.brain_sp1.svg', srsc=TRUE)
angle(svg.mus)[[1]] <- angle(svg.mus)[[1]] + 90
Different cell shapes are assigned different cell clusters.
grp <- unique(srt.sc$seurat_clusters)
sp <- setNames(c(0, 2:14, 19:25, 32:127)[seq_along(grp)], grp)
sp[c('clus2', 'clus3', 'clus4', 'clus7')] <- 15:18
Figure 13 is the co-visualization of SRSC and bulk data with the 'cell-by-value' coloring. The gene 'Efhd2' is spatially enriched in cerebral cortex, which is detected by SpEn (SHM on the right in Figure 13). The cluster labels are utilized as the cell group labels (cell.group='seurat_clusters'
). The 'cerebral.cortex' tissue aligns anatomically with cell clusters denoted as 'clus3', 'clus4', 'clus7', and 'clus2'. This alignment is defined by a list
called lis.match
. These four clusters are shown in the middle scSHM in Figure 13, where each cell is positioned according to its spatial coordinates. These cells are also presented in the embedding plot (on the left in Figure 13). The grey dots in Figure 13 denote cells belonging to other clusters. The scSHM reveals that 'Efhd2' has high and low expression levels in sub-features of cerebral cortex, respectively. These sub-features roughly form two clusters in the embedding plot. This result may indicate that there are potentially novel cell types present in the cerebral cortex.
# Association between "cerebral.cortex" and clusters.
lis.match <- list(cerebral.cortex=c('clus3', 'clus4', 'clus7', 'clus2'))
dat.srsc <- SPHM(svg=svg.mus, bulk=blk.aggr.sp, cell=srt.sc, match=lis.match)
covis(data=dat.srsc, ID='Efhd2', assay.na='logcounts', dimred='TSNE', cell.group='seurat_clusters', tar.bulk=c('cerebral.cortex'), col.idp=TRUE, bar.width=0.07, dim.lgd.nrow=2, dim.lgd.text.size=8, legend.r=0.1, legend.key.size=0.013, legend.text.size=9, legend.nrow=3, h=0.6, profile=TRUE, ncol=3, vjust=5, dim.lgd.key.size=3, size.r=0.97, dim.axis.font.size=8, size.pt=1.5, shape=sp, lgd.plots.size=c(0.35, 0.25, 0.35), verbose=FALSE)
Many deconvolution algorithms have been developed to infer cellular compositions in bulk data (Newman et al. 2015; Newman et al. 2019; Racle et al. 2017; Gong and Szustakowski 2013; Wang et al. 2019) that can be visualized using the co-visualization module. Specifically, the cell type proportions estimated by the deconvolution methods will be used to calculate an expression matrix from the bulk data where rows are genes and columns are inferred cell types. Next, the cell type expression matrix and the bulk assay data will be used for co-visualization (Figure 14. This utility is showcased using MuSiC (Wang et al. 2019) for deconvoluting bulk RNA-Seq data from human pancreatic islets (Fadista et al. 2014) with matching single cell data (Segerstolpe et al. 2016).
The bulk and single-cell data of human pancreatic islets are downloaded and imported.
library(MuSiC); library(SingleCellExperiment)
set.seed(50)
# Bulk data: https://xuranw.github.io/MuSiC/data/GSE50244bulkeset.rds
blk.eset <- readRDS('data/GSE50244bulkeset.rds')
blk.de <- exprs(blk.eset); blk.de[1:2, 1:3]
## Sub1 Sub2 Sub3
## DDX11L1 0 0 1
## WASH7P 31 14 21
# Single-cell data: https://xuranw.github.io/MuSiC/data/EMTABsce_healthy.rds
sc.de <- readRDS('data/EMTABsce_healthy.rds'); colData(sc.de)[1:2, ]
## DataFrame with 2 rows and 4 columns
## sampleID SubjectName cellTypeID cellType
## <numeric> <factor> <numeric> <factor>
## AZ_A10 1 Non T2D 1 5 delta
## AZ_A11 1 Non T2D 1 2 alpha
Bulk deconvolutio is performed with MuSiC (Wang et al. 2019).
# Deconvolution.
res.p <- music_prop(bulk.mtx = blk.de, sc.sce = sc.de, clusters = 'cellType', samples = 'sampleID', select.ct = c('alpha', 'beta', 'delta', 'gamma', 'acinar', 'ductal'), verbose = FALSE)
names(res.p)
## [1] "Est.prop.weighted" "Est.prop.allgene" "Weight.gene"
## [4] "r.squared.full" "Var.prop"
# Deconvolution results.
prop <- res.p$Est.prop.weighted; prop[1:3, ]
## alpha beta delta gamma acinar
## Sub1 0.2079168 0.1746030 0.002203054 0.0015887041 0.09597685
## Sub2 0.6964173 0.1197928 0.003683287 0.0007471486 0.03078049
## Sub3 0.1913485 0.3521117 0.032037770 0.0003117540 0.19143687
## ductal
## Sub1 0.5177117
## Sub2 0.1485789
## Sub3 0.2327533
By using the inferred cell type proportions, the expression matrix of inferred cell types are calculated from the bulk data of one subject (Sub1). The chosen bulk data are filtered according to expression counts \(\ge\) 5. The expression matrix of inferred cell types is stored in a SingleCellExperiment object.
# Filter the chosen bulk data.
blk.sel <- subset(data.frame(blk.de), Sub1 >= 5)[, 'Sub1', drop=FALSE]
# Calculat expression matrix of inferred cell types.
res.de <- lapply(X=seq_len(ncol(blk.sel)), bulk=blk.sel, FUN=function(i, bulk) {
lis0 <- lapply(seq_len(nrow(bulk)), function(r) {
round(data.frame(bulk)[r, i, drop=TRUE] * prop[i, ])
}); dat <- do.call('rbind', lis0)
rownames(dat) <- rownames(bulk); dat
})
# Store the expression matrix of cell types in a SingleCellExperiment object.
res.sc <- res.de[[1]]; res.sc[1:2, ]
## alpha beta delta gamma acinar ductal
## WASH7P 6 5 0 0 3 16
## AL627309.1 6 5 0 0 3 15
res.sc <- SingleCellExperiment(list(counts=res.sc))
Filter the cell-type expression matrix, jointly normalize the cell and bulk data, and separate the two data types.
# Filter cell-type expression data.
blk.sc.de <- filter_cell(sce=res.sc, bulk=blk.sel, cutoff=1, p.in.cell=0.1, p.in.gen=0.5)
# Jointly normalize cell and bulk data.
blk.se.de.nor <- norm_data(data=cbind(blk.sc.de$bulk, blk.sc.de$cell), norm.fun='CNF', log2.trans=TRUE)
assayNames(blk.se.de.nor) <- 'logcounts'
# Separate bulk from cell data.
blk.de.nor <- blk.se.de.nor[, seq_len(ncol(blk.sc.de$bulk))]
colnames(blk.de.nor) <- 'pancreas.islet'
sc.de.nor <- blk.se.de.nor[, seq_len(ncol(blk.sc.de$cell)) + ncol(blk.de.nor)]
Include essential information in the colData
of the cell-type data, such as the corresponding bulk tissue, cell types, and cell type proportions. After that reduce dimensionalities for the cell-type data.
# Include essential information
sc.de.nor$bulk <- 'pancreas.islet'
sc.de.nor$cell.type <- colnames(sc.de.nor)
sc.de.nor$prop <- round(prop['Sub1', sc.de.nor$cell.type], 3)
# Assign unique shapes to each cell type.
cell.all <- unique(sc.de.dim$cell.type)
sp <- setNames(c(15, 17:19, 12, 8), cell.all)
# Reduce dimensionalities.
sc.de.dim <- runTSNE(sc.de.nor)
## Warning: useNames = NA is deprecated. Instead, specify either
## useNames = TRUE or useNames = TRUE.
Import the aSVG of human pancreatic islets.
islet <- read_svg('img/pancreas.svg')
Figure 14 shows co-visualization of bulk and inferred cell-type data of human pancreatic islets for gene 'AP006222.2', where 'cell-by-value' coloring is used. The estimated cellular proportions are shown as decimal numbers next to each cell type. The result indicates that pancreatic islets consist of two rare (yellow) and four abundant cell types, and that gene 'AP006222.2' is selectively expressed in the six cell types.
dat.decon <- SPHM(svg=islet, bulk=blk.de.nor, cell=sc.de.dim)
covis(data=dat.decon, ID='AP006222.2', dimred='TSNE', tar.cell=unique(sc.de.dim$cell.type), cell.group='cell.type', dim.lgd.text.size=10, dim.lgd.key.size=5, dim.lgd.nrow=2, bar.width=0.08, legend.nrow=1, h=0.5, col.idp=TRUE, legend.r=0.2, profile=TRUE, decon=TRUE, size.pt=5, size.lab.pt=3, vjust.lab.pt=0, hjust.lab.pt=0.5, shape=sp, verbose=FALSE)
## Converting "ggplot" to "grob" ...
## dim_AP006222.2_con_1
## Converting "ggplot" to "grob" ...
##
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets
## [7] methods base
##
## other attached packages:
## [1] knitr_1.43 stxBrain.SeuratData_0.1.1
## [3] SeuratData_0.2.2 SeuratObject_4.1.3
## [5] Seurat_4.3.0.1 cowplot_1.1.1
## [7] reshape2_1.4.4 scater_1.26.1
## [9] scuttle_1.8.4 SingleCellExperiment_1.20.1
## [11] MuSiC_1.0.0 TOAST_1.12.0
## [13] quadprog_1.5-8 EpiDISH_2.14.1
## [15] nnls_1.4 clusterProfiler_4.6.2
## [17] enrichplot_1.18.4 biomaRt_2.54.1
## [19] ggplot2_3.4.2 ExpressionAtlas_1.26.0
## [21] BiocStyle_2.26.0 jsonlite_1.8.7
## [23] RCurl_1.98-1.12 xml2_1.3.5
## [25] limma_3.54.2 SummarizedExperiment_1.28.0
## [27] Biobase_2.58.0 GenomicRanges_1.50.2
## [29] GenomeInfoDb_1.34.9 IRanges_2.32.0
## [31] S4Vectors_0.36.2 BiocGenerics_0.44.0
## [33] MatrixGenerics_1.10.0 matrixStats_1.0.0
## [35] spatialHeatmap_2.7.6 nvimcom_0.9-92
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 SparseM_1.81
## [3] scattermore_1.2 GGally_2.1.2
## [5] coda_0.19-4 tidyr_1.3.0
## [7] bit64_4.0.5 irlba_2.3.5.1
## [9] DelayedArray_0.24.0 data.table_1.14.8
## [11] KEGGREST_1.38.0 doParallel_1.0.17
## [13] generics_0.1.3 ScaledMatrix_1.6.0
## [15] RSQLite_2.3.1 shadowtext_0.1.2
## [17] RANN_2.6.1 future_1.33.0
## [19] proxy_0.4-27 shinytoastr_2.1.1
## [21] bit_4.0.5 spatstat.data_3.0-1
## [23] httpuv_1.6.11 assertthat_0.2.1
## [25] viridis_0.6.4 xfun_0.39
## [27] hms_1.1.3 jquerylib_0.1.4
## [29] evaluate_0.21 promises_1.2.0.1
## [31] fansi_1.0.4 progress_1.2.2
## [33] caTools_1.18.2 dbplyr_2.3.3
## [35] htmlwidgets_1.6.2 igraph_1.5.0.1
## [37] DBI_1.1.3 mcmc_0.9-7
## [39] reshape_0.8.9 spatstat.geom_3.2-4
## [41] purrr_1.0.1 ellipsis_0.3.2
## [43] dplyr_1.1.2 bookdown_0.34
## [45] annotate_1.76.0 MCMCpack_1.6-3
## [47] deldir_1.0-9 locfdr_1.1-8
## [49] sparseMatrixStats_1.10.0 vctrs_0.6.3
## [51] quantreg_5.96 ROCR_1.0-11
## [53] abind_1.4-5 cachem_1.0.8
## [55] withr_2.5.0 ggforce_0.4.1
## [57] HDO.db_0.99.1 progressr_0.13.0
## [59] sctransform_0.3.5 treeio_1.22.0
## [61] prettyunits_1.1.1 scran_1.26.2
## [63] goftest_1.2-3 cluster_2.1.4
## [65] DOSE_3.24.2 ape_5.7-1
## [67] grImport_0.9-7 lazyeval_0.2.2
## [69] crayon_1.5.2 spatstat.explore_3.2-1
## [71] genefilter_1.80.3 edgeR_3.40.2
## [73] pkgconfig_2.0.3 labeling_0.4.2
## [75] tweenr_2.0.2 nlme_3.1-162
## [77] vipor_0.4.5 globals_0.16.2
## [79] rlang_1.1.1 miniUI_0.1.1.1
## [81] lifecycle_1.0.3 MatrixModels_0.5-2
## [83] downloader_0.4 filelock_1.0.2
## [85] BiocFileCache_2.6.1 rsvd_1.0.5
## [87] rsvg_2.4.0 polyclip_1.10-4
## [89] lmtest_0.9-40 Matrix_1.6-0
## [91] aplot_0.1.10 zoo_1.8-12
## [93] beeswarm_0.4.0 ggridges_0.5.4
## [95] png_0.1-8 viridisLite_0.4.2
## [97] bitops_1.0-7 shinydashboard_0.7.2
## [99] gson_0.1.0 KernSmooth_2.23-22
## [101] Biostrings_2.66.0 blob_1.2.4
## [103] DelayedMatrixStats_1.20.0 stringr_1.5.0
## [105] qvalue_2.30.0 spatstat.random_3.1-5
## [107] parallelly_1.36.0 gridGraphics_0.5-1
## [109] shinyAce_0.4.2 beachmat_2.14.2
## [111] scales_1.2.1 ica_1.0-3
## [113] memoise_2.0.1 magrittr_2.0.3
## [115] plyr_1.8.8 gplots_3.1.3
## [117] zlibbioc_1.44.0 compiler_4.2.0
## [119] scatterpie_0.2.1 dqrng_0.3.0
## [121] RColorBrewer_1.1-3 fitdistrplus_1.1-11
## [123] cli_3.6.1 XVector_0.38.0
## [125] listenv_0.9.0 pbapply_1.7-2
## [127] patchwork_1.1.2 spsComps_0.3.3.0
## [129] MASS_7.3-60 tidyselect_1.2.0
## [131] stringi_1.7.12 highr_0.10
## [133] yaml_2.3.7 GOSemSim_2.24.0
## [135] BiocSingular_1.14.0 locfit_1.5-9.8
## [137] ggrepel_0.9.3 grid_4.2.0
## [139] sass_0.4.7 fastmatch_1.1-3
## [141] tools_4.2.0 future.apply_1.11.0
## [143] parallel_4.2.0 bluster_1.8.0
## [145] foreach_1.5.2 metapod_1.6.0
## [147] gridExtra_2.3 farver_2.1.1
## [149] Rtsne_0.16 ggraph_2.1.0
## [151] digest_0.6.33 BiocManager_1.30.21.1
## [153] shiny_1.7.4.1 Rcpp_1.0.11
## [155] later_1.3.1 RcppAnnoy_0.0.21
## [157] httr_1.4.6 AnnotationDbi_1.60.2
## [159] colorspace_2.1-0 tensor_1.5
## [161] reticulate_1.30 XML_3.99-0.14
## [163] splines_4.2.0 uwot_0.1.16
## [165] yulab.utils_0.0.6 statmod_1.5.0
## [167] spatstat.utils_3.0-3 tidytree_0.4.4
## [169] graphlayouts_1.0.0 sp_2.0-0
## [171] ggplotify_0.1.1 plotly_4.10.2
## [173] xtable_1.8-4 ggtree_3.6.2
## [175] tidygraph_1.2.3 corpcor_1.6.10
## [177] ggfun_0.1.1 R6_2.5.1
## [179] pillar_1.9.0 htmltools_0.5.5
## [181] mime_0.12 glue_1.6.2
## [183] fastmap_1.1.1 BiocParallel_1.32.6
## [185] BiocNeighbors_1.16.0 class_7.3-22
## [187] codetools_0.2-19 fgsea_1.24.0
## [189] utf8_1.2.3 spatstat.sparse_3.0-2
## [191] lattice_0.21-8 bslib_0.5.0
## [193] tibble_3.2.1 curl_5.0.1
## [195] ggbeeswarm_0.7.2 leiden_0.4.3
## [197] gtools_3.9.4 magick_2.7.4
## [199] GO.db_3.16.0 survival_3.5-5
## [201] rmarkdown_2.23 munsell_0.5.0
## [203] e1071_1.7-13 GenomeInfoDbData_1.2.9
## [205] iterators_1.0.14 gtable_0.3.3
Butler, Andrew, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija. 2018. “Integrating Single-Cell Transcriptomic Data Across Different Conditions, Technologies, and Species.” Nature Biotechnology 36: 411–20. doi:10.1038/nbt.4096.
Fadista, João, Petter Vikman, Emilia Ottosson Laakso, Inês Guerra Mollet, Jonathan Lou Esguerra, Jalal Taneera, Petter Storm, et al. 2014. “Global Genomic and Transcriptomic Analysis of Human Pancreatic Islets Reveals Novel Genes Influencing Glucose Metabolism.” Proc. Natl. Acad. Sci. U. S. A. 111 (38): 13924–9.
Gong, Ting, and Joseph D Szustakowski. 2013. “DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mRNA-Seq Data.” Bioinformatics 29 (8): 1083–5.
Hao, Yuhan, Stephanie Hao, Erica Andersen-Nissen, William M. Mauck III, Shiwei Zheng, Andrew Butler, Maddie J. Lee, et al. 2021. “Integrated Analysis of Multimodal Single-Cell Data.” Cell. doi:10.1016/j.cell.2021.04.048.
Knebel, Birgit, Sonja Hartwig, Jutta Haas, Stefan Lehr, Simon Goeddeke, Franciscus Susanto, Lothar Bohne, et al. 2015. “Peroxisomes Compensate Hepatic Lipid Overflow in Mice with Fatty Liver.” Biochim. Biophys. Acta 1851 (7): 965–76.
Merkin, Jason, Caitlin Russell, Ping Chen, and Christopher B Burge. 2012. “Evolutionary Dynamics of Gene and Isoform Regulation in Mammalian Tissues.” Science 338 (6114): 1593–9.
Newman, Aaron M, Chih Long Liu, Michael R Green, Andrew J Gentles, Weiguo Feng, Yue Xu, Chuong D Hoang, Maximilian Diehn, and Ash A Alizadeh. 2015. “Robust Enumeration of Cell Subsets from Tissue Expression Profiles.” Nat. Methods 12 (5): 453–57.
Newman, Aaron M, Chloé B Steen, Chih Long Liu, Andrew J Gentles, Aadel A Chaudhuri, Florian Scherer, Michael S Khodadoust, et al. 2019. “Determining Cell Type Abundance and Expression from Bulk Tissues with Digital Cytometry.” Nat. Biotechnol. 37 (7). Nature Publishing Group: 773–82.
Ortiz, Cantin, Jose Fernandez Navarro, Aleksandra Jurek, Antje Märtin, Joakim Lundeberg, and Konstantinos Meletis. 2020. “Molecular Atlas of the Adult Mouse Brain.” Science Advances 6 (26). American Association for the Advancement of Science: eabb3446.
Racle, Julien, Kaat de Jonge, Petra Baumgaertner, Daniel E Speiser, and David Gfeller. 2017. “Simultaneous Enumeration of Cancer and Immune Cell Types from Bulk Tumor Gene Expression Data.” Elife 6 (November). eLife Sciences Publications, Ltd: e26476.
Satija, Rahul, Jeffrey A Farrell, David Gennert, Alexander F Schier, and Aviv Regev. 2015. “Spatial Reconstruction of Single-Cell Gene Expression Data.” Nature Biotechnology 33: 495–502. doi:10.1038/nbt.3192.
Segerstolpe, Åsa, Athanasia Palasantza, Pernilla Eliasson, Eva-Marie Andersson, Anne-Christine Andréasson, Xiaoyan Sun, Simone Picelli, et al. 2016. “Single-Cell Transcriptome Profiling of Human Pancreatic Islets in Health and Type 2 Diabetes.” Cell Metab. 24 (4): 593–607.
Stepulak, Andrzej, Hella Luksch, Christine Gebhardt, Ortrud Uckermann, Jenny Marzahn, Marco Sifringer, Wojciech Rzeski, et al. 2009. “Expression of Glutamate Receptor Subunits in Human Cancers.” Histochem. Cell Biol. 132 (4): 435–45.
Stuart, Tim, Andrew Butler, Paul Hoffman, Christoph Hafemeister, Efthymia Papalexi, William M Mauck III, Yuhan Hao, Marlon Stoeckius, Peter Smibert, and Rahul Satija. 2019. “Comprehensive Integration of Single-Cell Data.” Cell 177: 1888–1902. doi:10.1016/j.cell.2019.05.031.
Thorgersen, Ebbe Billmann, Andreas Barratt-Due, Håkon Haugaa, Morten Harboe, Søren Erik Pischke, Per H Nilsson, and Tom Eirik Mollnes. 2019. “The Role of Complement in Liver Injury, Regeneration, and Transplantation.” Hepatology 70 (2): 725–36.
Vacher, Claire-Marie, Helene Lacaille, Jiaqi J O’Reilly, Jacquelyn Salzbank, Dana Bakalar, Sonia Sebaoui, Philippe Liere, et al. 2021. “Placental Endocrine Function Shapes Cerebellar Development and Social Behavior.” Nat. Neurosci. 24 (10). Nature Publishing Group: 1392–1401.
Wang, Xuran, Jihwan Park, Katalin Susztak, Nancy R Zhang, and Mingyao Li. 2019. “Bulk Tissue Cell Type Deconvolution with Multi-Subject Single-Cell Expression Reference.” Nat. Commun. 10 (1). Nature Publishing Group: 1–9.
Xiao, Bin, Zhenzhan Kuang, Weiyun Zhang, Jianfeng Hang, Lidan Chen, Ting Lei, Yongyin He, et al. 2019. “Glutamate Ionotropic Receptor Kainate Type Subunit 3 (Grik3) Promotes Epithelial-Mesenchymal Transition in Breast Cancer Cells by Regulating SPDEF/Cdh1 Signaling.” Mol. Carcinog. 58 (7): 1314–23.