Chapter 2 Basic usage

Here we show a detailed documentation usage about ClusterGVis.

2.1 Input data

For bulk-RNA sequencing data, a normalized matrix or data frame(tpm/fpkm/rpkm/rpm/…) containing gene expressions can be accepted. Please make sure the rownames of expression matrix are gene names. Here we load an example data which is from Protein Expression Landscape of Mouse Embryos during Pre-implantation Development.

First we load the example data:

library(ClusterGVis)

# load data
data(exps)

# check
head(exps,3)
#           zygote  t2.cell  t4.cell  t8.cell   tmorula blastocyst
# Oog4   1.3132282 1.237078 1.325978 1.262073 0.6549312  0.2067114
# Psmd9  1.0917337 1.315989 1.174417 1.064756 0.8685598  0.4845448
# Sephs2 0.9859232 1.201026 1.123076 1.084673 0.8878931  0.7174088

2.2 Clustering for expression matrix

It is a problem to define a suitable cluster numbers, here we supply the total within sum square value for different cluster numbers to help you to choose a suitable value. Besides you can re-define the cluster numbers after checking the expression distribution heatmap results:

# check optimal cluster numbers
getClusters(obj = exps)

The kmeans, mfuzz and TCseq methods can be choosen to cluster data. mfuzz and TCseq are suitable for time course sequencing data to detect genes with consensus trend which are both applied with fuzzy cmeans clustering methods. Users can choose which one you like.

# using mfuzz for clustering
cm <- clusterData(obj = exps,
                  cluster.method = "mfuzz",
                  cluster.num = 8)

# using TCseq for clustering
ct <- clusterData(obj = exps,
                  cluster.method = "TCseq",
                  cluster.num = 8)

# using kemans for clustering
ck <- clusterData(obj = exps,
                  cluster.method = "kmeans",
                  cluster.num = 8)

The clusterData function returns a list of results containing detailed clustering information for each gene. The wide.res is a more easily interpretable data frame that includes scaled gene expression values and cluster assignments for each gene. Complementing this, the long.res is a long-format data frame derived from wide.res, offering an alternative view of the same information:

str(cm)
# List of 5
# $ wide.res:'data.frame':  3767 obs. of  9 variables:
#   ..$ gene      : chr [1:3767] "0610037L13Rik" "1110020G09Rik" "1110065P20Rik" "2010106G01Rik" ...
# ..$ zygote    : num [1:3767] 0.255 -1.193 0.477 0.537 -1.395 ...
# ..$ t2.cell   : num [1:3767] 0.608 0.347 0.121 0.15 0.387 ...
# ..$ t4.cell   : num [1:3767] 0.708 0.113 -0.114 0.588 0.516 ...
# ..$ t8.cell   : num [1:3767] 0.589 1.418 1.272 0.888 1.383 ...
# ..$ tmorula   : num [1:3767] -0.2467 0.4319 0.0123 -0.3015 -0.8505 ...
# ..$ blastocyst: num [1:3767] -1.9124 -1.1167 -1.7682 -1.8611 -0.0408 ...
# ..$ cluster   : num [1:3767] 1 1 1 1 1 1 1 1 1 1 ...
# ..$ membership: num [1:3767] 0.765 0.437 0.581 0.66 0.206 ...
# $ long.res:'data.frame':  22602 obs. of  6 variables:
#   ..$ cluster     : num [1:22602] 1 1 1 1 1 1 1 1 1 1 ...
# ..$ gene        : chr [1:22602] "0610037L13Rik" "1110020G09Rik" "1110065P20Rik" "2010106G01Rik" ...
# ..$ membership  : num [1:22602] 0.765 0.437 0.581 0.66 0.206 ...
# ..$ cell_type   : Factor w/ 6 levels "zygote","t2.cell",..: 1 1 1 1 1 1 1 1 1 1 ...
# ..$ norm_value  : num [1:22602] 0.255 -1.193 0.477 0.537 -1.395 ...
# ..$ cluster_name: Factor w/ 8 levels "cluster 1 (381)",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ type    : chr "mfuzz"
# $ geneMode: chr "none"
# $ geneType: chr "none"


str(ck)
# List of 5
# $ wide.res:'data.frame':  3767 obs. of  8 variables:
#   ..$ zygote    : num [1:3767] -1.189 -0.993 -1.095 -0.772 -0.683 ...
# ..$ t2.cell   : num [1:3767] 1.177 0.936 1.027 1.075 1.11 ...
# ..$ t4.cell   : num [1:3767] 0.126 0.561 0.484 0.397 0.337 ...
# ..$ t8.cell   : num [1:3767] 0.1831 0.2203 0.0516 -0.1314 -0.0698 ...
# ..$ tmorula   : num [1:3767] -1.17 -1.48 -1.33 -1.49 -1.56 ...
# ..$ blastocyst: num [1:3767] 0.878 0.758 0.861 0.923 0.865 ...
# ..$ gene      : chr [1:3767] "Pdlim1" "Zp3" "Crk" "Ndufaf4" ...
# ..$ cluster   : num [1:3767] 1 1 1 1 1 1 1 1 1 1 ...
# $ long.res:'data.frame':  22602 obs. of  5 variables:
#   ..$ cluster     : num [1:22602] 1 1 1 1 1 1 1 1 1 1 ...
# ..$ gene        : chr [1:22602] "Pdlim1" "Zp3" "Crk" "Ndufaf4" ...
# ..$ cell_type   : Factor w/ 6 levels "zygote","t2.cell",..: 1 1 1 1 1 1 1 1 1 1 ...
# ..$ norm_value  : num [1:22602] -1.189 -0.993 -1.095 -0.772 -0.683 ...
# ..$ cluster_name: Factor w/ 8 levels "cluster 1 (134)",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ type    : chr "kmeans"
# $ geneMode: chr "none"
# $ geneType: chr "none"

2.3 Line plot

visCluster accepts results from clusterData and do visualization with line plot:

# plot line only
visCluster(object = cm,
           plot.type = "line")

Change linecolor:

# change color
visCluster(object = cm,
           plot.type = "line",
           ms.col = c("green","orange","red"))

Remove the middle line:

# remove meadian line
visCluster(object = cm,
           plot.type = "line",
           ms.col = c("green","orange","red"),
           add.mline = FALSE)

There is no membership information if you choose kmeans method:

# plot line only with kmeans method
visCluster(object = ck,
           plot.type = "line")

2.4 Heatmap plot

visCluster can create a comprehensive heatmap plot to show detailed cluster results. Here we make a heatmap plot:

# plot heatmap only
visCluster(object = ck,
           plot.type = "heatmap")

Other paramters can be passed with ComplexHeatmap:

# supply other aruguments passed by Heatmap function
visCluster(object = ck,
           plot.type = "heatmap",
           column_names_rot = 45)

Change the annotaion bar colors on the right side:

# change anno bar color
visCluster(object = ck,
           plot.type = "heatmap",
           column_names_rot = 45,
           ctAnno.col = ggsci::pal_npg()(8))

Mark your interested gene names in heatmap:

# add gene name
markGenes = rownames(exps)[sample(1:nrow(exps),30,replace = F)]

pdf('addgene.pdf',height = 10,width = 6,onefile = F)
visCluster(object = ck,
           plot.type = "heatmap",
           column_names_rot = 45,
           markGenes = markGenes)
dev.off()

2.5 Sample group and order

Add groups for samples:

# assign groups
pdf('htcolg.pdf',height = 10,width = 6,onefile = F)
visCluster(object = cm,
           plot.type = "heatmap",
           column_names_rot = 45,
           markGenes = markGenes,
           show_row_dend = F,
           sample.group = rep(c("group1","group2","group3"),each = 2))
dev.off()

Change sample orders:

# change sample order
pdf('htsr.pdf',height = 10,width = 6,onefile = F)
visCluster(object = cm,
           plot.type = "heatmap",
           column_names_rot = 45,
           markGenes = markGenes,
           show_row_dend = F,
           sample.order = rev(colnames(exps)))
dev.off()

2.6 Multiple sample annotation

Multiple group annotations for samples can be passed with ComplexHeatmap grammers. Here we show an example:

# group info
mg1 = rep(c("D1","D2"), each = 3)
names(mg1) <- c("zygote","t2.cell","t4.cell","t8.cell","tmorula","blastocyst")

mg2 = rep(c("E1", "E2", "E3"), each = 2)
names(mg2) <- c("zygote","t2.cell","t4.cell","t8.cell","tmorula","blastocyst")

# TOP annotations
HeatmapAnnotation <- ComplexHeatmap::HeatmapAnnotation(mg1 = mg1,
                                                       mg2 = mg2,
                                                       col = list(mg1 = c("D1" = "#C147E9","D2" = "#FF7000"),
                                                                  mg2 = c("E1" = "#54B435","E2" = "#31C6D4","E3" = "#D9CB50")),
                                                       gp = grid::gpar(col = "white"))


pdf('htcolmg.pdf',height = 10,width = 6,onefile = F)
visCluster(object = cm,
           plot.type = "heatmap",
           column_names_rot = 45,
           markGenes = markGenes,
           show_row_dend = F,
           ht.col = list(col_range = c(-2, 0, 2),col_color = c("#A555EC","#EEEEEE","#FF597B")),
           HeatmapAnnotation = HeatmapAnnotation)
dev.off()

Split the columns:

pdf('htcolmgs1.pdf',height = 10,width = 6,onefile = F)
visCluster(object = cm,
           plot.type = "heatmap",
           column_names_rot = 45,
           markGenes = markGenes,
           show_row_dend = F,
           ht.col = list(col_range = c(-2, 0, 2),col_color = c("#A555EC","#EEEEEE","#FF597B")),
           HeatmapAnnotation = HeatmapAnnotation,
           column.split = rep(c(1,2,3),each = 2))
dev.off()

2.7 Add row annotation

To add row annotations to the main heatmap body, use the row_annotation_obj parameter. When plot.type = “heatmap” or plot.type = “both” and line.side = “right” , you should provide a ComplexHeatmap::rowAnnotation object. Otherwise, a named list of vectors is accepted. (Note: Ensure that the gene order matches the clustered order. )

We plot a samll heatmap:

library(ClusterGVis)
library(ComplexHeatmap)

# load data
data(exps)

# using TCseq for clustering
ct <- clusterData(obj = exps[1:20,],
                  cluster.method = "TCseq",
                  cluster.num = 2)


visCluster(object = ct,
           plot.type = "heatmap",
           column_names_rot = 45)

We first arrange the genes for cluster orders and create a named vectors:

df <-  ct$wide.res %>% dplyr::arrange(cluster)

l_ano <- rep(LETTERS[1:3],c(6,4,10))
names(l_ano) <- df$gene
l_ano
# Nhlrc2       Trappc4         Ywhah       Ppp2r2d         Kpnb1           Ckb          Gpx1
# "A"           "A"           "A"           "A"           "A"           "A"           "B"
# Qdpr          Oog4         Psmd9        Sephs2        Dctpp1         Ciao1         Oas1f
# "B"           "B"           "B"           "C"           "C"           "C"           "C"
# Scrn2          Arl6       Fam188a C630004H02Rik       Tmem135          Ipo9
# "C"           "C"           "C"           "C"           "C"           "C"

Add row annotations for genes:

visCluster(object = ct,
           plot.type = "heatmap",
           column_names_rot = 45,
           show_row_names = T,
           row_annotation_obj = ComplexHeatmap::rowAnnotation(anno = l_ano,
                                                              hh = anno_text(LETTERS[1:20]),
                                                              nm = anno_barplot(runif(20)))
)

Or try this way:

row_annotation_obj <- list(anno = l_ano,
                           hh = anno_text(LETTERS[1:20],which = "row"),
                           nm = anno_barplot(runif(20),which = "row"))

left_annotation <- do.call(ComplexHeatmap::rowAnnotation,
                           modifyList(list(),
                                      row_annotation_obj))

visCluster(object = ct,
           plot.type = "heatmap",
           column_names_rot = 45,
           show_row_names = T,
           row_annotation_obj = left_annotation)

Add line annotation:

pdf('addgene.pdf',height = 6,width = 6,onefile = F)
visCluster(object = ct,
           plot.type = "both",
           column_names_rot = 45,
           row_annotation_obj = left_annotation)
dev.off()

Given a named list of vectors if line.side = “left”:

pdf('addgene2.pdf',height = 6,width = 6,onefile = F)
visCluster(object = ct,
           plot.type = "both",
           column_names_rot = 45,
           row_annotation_obj = row_annotation_obj,
           line.side = "left")
dev.off()

2.8 Add line trend anntation

Add a line annotation for each sub-cluster to show expression tendency(Please note: use pdf function to save the plot and open it to view, otherwise the annotation panels will not match to the main heatmap if you use zoom option to view the plot in Rstudio):

# add line annotation
pdf('testHT.pdf',height = 10,width = 6)
visCluster(object = ck,
           plot.type = "both",
           column_names_rot = 45)
dev.off()

Add boxplot:

# add boxplot
pdf('testbx.pdf',height = 10,width = 6)
visCluster(object = ck,
           plot.type = "both",
           column_names_rot = 45,
           add.box = T)
dev.off()

Remove line plot and change boxplot color:

# remove line and change box fill color
pdf('testbxcol.pdf',height = 10,width = 6)
visCluster(object = ck,
           plot.type = "both",
           column_names_rot = 45,
           add.box = T,
           add.line = F,
           boxcol = ggsci::pal_npg()(8))
dev.off()

Add points:

# add point
pdf('testbxcolP.pdf',height = 10,width = 6)
visCluster(object = ck,
           plot.type = "both",
           column_names_rot = 45,
           add.box = T,
           add.line = F,
           boxcol = ggsci::pal_npg()(8),
           add.point = T)
dev.off()

2.9 Multiple group line plot

Sometimes maybe there is a need to show line annotation across different conditions. This is can be achieved by mulGroup parameter:

# multiple groups
pdf('termlftsmp.pdf',height = 10,width = 11,onefile = F)
visCluster(object = ck,
           plot.type = "both",
           column_names_rot = 45,
           show_row_dend = F,
           markGenes = markGenes,
           markGenes.side = "left",
           genes.gp = c('italic',fontsize = 12,col = "black"),
           annoTerm.data = termanno2,
           line.side = "left",
           go.col = rep(ggsci::pal_d3()(8),each = 3),
           go.size = "pval",
           mulGroup = c(2,2,2),
           mline.col = c(ggsci::pal_lancet()(3)))
dev.off()