Global gene expression analysis

Intro

In addition to analyzing ribosome profiling (Ribo-seq) data from the perspective of codon usage, amino acid composition, or other sequence-based features, a key focus in translational research lies in assessing changes in gene-level expression and translational regulation. Specifically, researchers are often interested in how translational efficiency (TE) and mRNA abundance vary across different conditions.

This section introduces how to use the riboTransVis toolkit for global RNA and RPF (Ribosome-Protected Fragment) quantification, differential expression analysis, and calculation of translational efficiency. By integrating Ribo-seq and RNA-seq data, riboTransVis facilitates systematic identification of genes with altered protein synthesis, enabling insights into translational control mechanisms that cannot be captured by transcriptome data alone.

Through this approach, we aim to provide a comprehensive framework for analyzing genome-wide translational changes, which can support functional discovery and reveal phenomena such as selective translation, stress response adaptation, and oncogenic translation reprogramming.

Quantify gene expression with counts

The get_counts() function utilizes the featureCounts algorithm from the Rsubread package to efficiently quantify gene-level expression from RNA-seq and Ribo-seq data. By default, it performs read counting over exon features for RNA-seq and coding sequence (CDS) features for RPF (ribosome-protected fragments) data. Users can explicitly specify the annotation features used for quantification via the options: rna_feature = “exon” and ribo_feature = “CDS”.

Additionally, if the BAM files are aligned to the genome, Rsubread’s featureCounts is used for quantification. If instead the BAM files are aligned to the transcriptome, the function directly obtains read counts from the BAM files themselves.

The resulting count information is stored in the counts slot of the input object:

obj <- get_counts(obj, nThreads = 15)

# check
str(obj@counts)
# List of 2
# $ rpf:'data.frame':   14647 obs. of  4 variables:
#   ..$ wt1: num [1:14647] 1 15 482 616 239 935 262 101 647 274 ...
# ..$ wt2: num [1:14647] 1 15 482 616 239 935 262 101 647 274 ...
# ..$ ko1: num [1:14647] 0 43 491 783 323 ...
# ..$ ko2: num [1:14647] 0 21 280 464 90 713 232 64 543 290 ...
# $ rna:'data.frame':   57169 obs. of  4 variables:
#   ..$ wt1: num [1:57169] 6 1 2 0 0 2 5 7 0 0 ...
# ..$ wt2: num [1:57169] 6 1 2 0 0 2 5 7 0 0 ...
# ..$ ko1: num [1:57169] 6 0 1 0 0 1 6 9 0 0 ...
# ..$ ko2: num [1:57169] 3 1 0 0 0 1 7 1 0 0 ...

Correlation plot

Plot the correlation for the replicates based on counts data:

rna <- obj@counts$rna

cor_plot(data = rna,x = "ko1",y = "ko2")

Extract normalized expression

The function get_normalized_reads calculates RNA, RPF, and translation efficiency (TE). The normalized values for RNA and RPF are standardized using TPM values, while translation efficiency (TE) is defined as:

\[ \text{TE (translation efficiency)} = \frac{\text{RPF}_tpm}{\text{RNA}_tpm} \tag{1} \]

norm <- get_normalized_reads(object = obj,
                             type = c("rna", "ribo", "te"))

# check
str(norm)
# List of 3
# $ tpm.rna :'data.frame':  57169 obs. of  5 variables:
#   ..$ wt1      : num [1:57169] 0.2551 0.0344 0.0154 0 0 ...
# ..$ wt2      : num [1:57169] 0.2551 0.0344 0.0154 0 0 ...
# ..$ ko1      : num [1:57169] 0.22734 0 0.00686 0 0 ...
# ..$ ko2      : num [1:57169] 0.1312 0.0354 0 0 0 ...
# ..$ gene_name: chr [1:57169] "A1BG" "A1BG-AS1" "A1CF" "A2M" ...
# $ tpm.ribo:'data.frame':  14647 obs. of  5 variables:
#   ..$ wt1      : num [1:14647] 0.0257 0.6961 50.8271 81.9381 17.6403 ...
# ..$ wt2      : num [1:14647] 0.0257 0.6961 50.8271 81.9381 17.6403 ...
# ..$ ko1      : num [1:14647] 0 1.44 37.4 75.23 17.22 ...
# ..$ ko2      : num [1:14647] 0 1.24 37.61 78.61 8.46 ...
# ..$ gene_name: chr [1:14647] "A1CF" "A4GALT" "AAAS" "AACS" ...
# $ te      :'data.frame':  14647 obs. of  5 variables:
#   ..$ wt1      : num [1:14647] 1.672 0 0.394 8.632 19.743 ...
# ..$ wt2      : num [1:14647] 1.672 0 0.394 8.632 19.743 ...
# ..$ ko1      : num [1:14647] 0 9.397 0.427 9.333 22.229 ...
# ..$ ko2      : num [1:14647] NaN 0 0.554 9.186 27.907 ...
# ..$ gene_name: chr [1:14647] "A1CF" "A2ML1" "A4GALT" "AAAS" ...

Differential analysis

The gene_differential_analysis function uses DESeq2 to perform differential expression analysis separately on RNA or RPF data and returns the corresponding results:

diff.rna <- gene_differential_analysis(obj, 
                                       type = "rna",
                                       control_samples = c("wt1","wt2"),
                                       treat_samples = c("ko1","ko2"))

# check
head(diff.rna)
#            baseMean log2FoldChange    lfcSE       stat    pvalue padj gene_name   type
# A1BG      5.0888259     -0.2345798 1.248392 -0.1879055 0.8509508   NA      A1BG nonSig
# A1BG-AS1  0.7829185     -0.6432696 3.213131 -0.2002002 0.8413240   NA  A1BG-AS1 nonSig
# A1CF      1.1471459     -1.8219647 2.754492 -0.6614522 0.5083223   NA      A1CF nonSig
# A2ML1     1.4697527     -0.7623221 2.236789 -0.3408109 0.7332459   NA     A2ML1 nonSig
# A2ML1-AS1 5.9189416      0.6372937 1.177298  0.5413188 0.5882879   NA A2ML1-AS1 nonSig
# A2ML1-AS2 5.5834917     -0.4017920 1.313643 -0.3058609 0.7597106   NA A2ML1-AS2 nonSig

diff.ribo <- gene_differential_analysis(obj, 
                                        type = "ribo",
                                        control_samples = c("wt1","wt2"),
                                        treat_samples = c("ko1","ko2"))

# check
head(diff.ribo)
#           baseMean log2FoldChange     lfcSE       stat      pvalue      padj gene_name   type
# A1CF     0.5143687     -2.5281257 4.5894024 -0.5508616 0.581728523        NA      A1CF nonSig
# A4GALT  22.3216385      0.9254935 0.6865945  1.3479477 0.177675211 0.6322686    A4GALT nonSig
# AAAS   427.6743989     -0.4642721 0.1766959 -2.6275206 0.008600965 0.1247037      AAAS nonSig
# AACS   609.1352834     -0.1178472 0.1558609 -0.7561051 0.449586198 0.8723837      AACS nonSig
# AADAT  210.8025653     -0.4692803 0.3222716 -1.4561639 0.145347309 0.5805468     AADAT nonSig
# AAGAB  921.2909124     -0.1293514 0.1392711 -0.9287745 0.353005973 0.8165758     AAGAB nonSig

Vocalno plot for differential results:

vocalno_plot(diff_data = diff.rna)

vocalno_plot(diff_data = diff.ribo)

Translation efficiency differential analysis

The TE_differential_analysis function performs a differential analysis of translation efficiency. Users can select either RiboReX or Xtail as their preferred tool for this analysis:

# use riborex for te analysis
te.rex <- TE_differential_analysis(object = obj,
                                   pkg = "riborex",
                                   control_samples = c("wt1","wt2"),
                                   treat_samples = c("ko1","ko2"))

# check
str(te.rex)
# List of 2
# $ deg_raw :Formal class 'DESeqResults' [package "DESeq2"] with 7 slots
# .. ..@ priorInfo      : list()
# .. ..@ rownames       : chr [1:13204] "A4GALT" "AAAS" "AACS" "AADAT" ...
# .. ..@ nrows          : int 13204
# .. ..@ elementType    : chr "ANY"
# .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
# .. .. .. ..@ rownames       : NULL
# .. .. .. ..@ nrows          : int 6
# .. .. .. ..@ elementType    : chr "ANY"
# .. .. .. ..@ elementMetadata: NULL
# .. .. .. ..@ metadata       : list()
# .. .. .. ..@ listData       :List of 2
# .. .. .. .. ..$ type       : chr [1:6] "intermediate" "results" "results" "results" ...
# .. .. .. .. ..$ description: chr [1:6] "mean of normalized counts for all samples" "log2 fold change (MLE): EXTRA1 treated vs control" "standard error: EXTRA1 treated vs control" "Wald statistic: EXTRA1 treated vs control" ...
# .. ..@ metadata       :List of 6
# .. .. ..$ filterThreshold: Named num 93
# .. .. .. ..- attr(*, "names")= chr "40.71429%"
# .. .. ..$ filterTheta    : num 0.407
# .. .. ..$ filterNumRej   :'data.frame':   50 obs. of  2 variables:
#   .. .. .. ..$ theta : num [1:50] 0 0.0194 0.0388 0.0582 0.0776 ...
# .. .. .. ..$ numRej: num [1:50] 112 116 116 117 118 118 118 120 121 124 ...
# .. .. ..$ lo.fit         :List of 2
# .. .. .. ..$ x: num [1:50] 0 0.0194 0.0388 0.0582 0.0776 ...
# .. .. .. ..$ y: num [1:50] 114 115 116 117 117 ...
# .. .. ..$ alpha          : num 0.1
# .. .. ..$ lfcThreshold   : num 0
# .. ..@ listData       :List of 6
# .. .. ..$ baseMean      : num [1:13204] 56.7 246.1 341.4 143.7 568 ...
# .. .. ..$ log2FoldChange: num [1:13204] 0.0106 -0.1939 0.0381 -0.1659 0.0415 ...
# .. .. ..$ lfcSE         : num [1:13204] 0.578 0.285 0.241 0.54 0.208 ...
# .. .. ..$ stat          : num [1:13204] 0.0183 -0.681 0.158 -0.3072 0.1996 ...
# .. .. ..$ pvalue        : num [1:13204] 0.985 0.496 0.874 0.759 0.842 ...
# .. .. ..$ padj          : num [1:13204] NA 0.915 0.983 0.968 0.98 ...
# $ deg_anno:'data.frame':  13204 obs. of  8 variables:
#   ..$ baseMean      : num [1:13204] 56.7 246.1 341.4 143.7 568 ...
# ..$ log2FoldChange: num [1:13204] 0.0106 -0.1939 0.0381 -0.1659 0.0415 ...
# ..$ lfcSE         : num [1:13204] 0.578 0.285 0.241 0.54 0.208 ...
# ..$ stat          : num [1:13204] 0.0183 -0.681 0.158 -0.3072 0.1996 ...
# ..$ pvalue        : num [1:13204] 0.985 0.496 0.874 0.759 0.842 ...
# ..$ padj          : num [1:13204] NA 0.915 0.983 0.968 0.98 ...
# ..$ gene_name     : chr [1:13204] "A4GALT" "AAAS" "AACS" "AADAT" ...
# ..$ type          : chr [1:13204] "nonSig" "nonSig" "nonSig" "nonSig" ...
vocalno_plot(diff_data = te.rex$deg_anno)

Xtail for differential analysis:

te.xtail <- TE_differential_analysis(object = obj,
                                     pkg = "xtail",
                                     control_samples = c("wt1","wt2"),
                                     treat_samples = c("ko1","ko2"),
                                     min_count = 4)
# Calculating the library size factors
# 1. Estimate the log2 fold change in mrna
# converting counts to integer mode
# 2. Estimate the log2 fold change in rpf
# converting counts to integer mode
# 3. Estimate the difference between two log2 fold changes
# 4. Estimate the log2 ratio in first condition
# converting counts to integer mode
# 5. Estimate the log2 ratio in second condition
# converting counts to integer mode
# 6. Estimate the difference between two log2 ratios
# Number of the log2FC and log2R used in determining the final p-value
# log2FC: 7530
# log2R: 5484


# check
str(te.xtail)
# List of 2
# $ deg_raw :Formal class 'xtail' [package ""] with 4 slots
# .. ..@ listData       :List of 5
# .. .. ..$ resultsTable        :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
# .. .. .. .. ..@ rownames       : chr [1:13014] "A4GALT" "AAAS" "AACS" "AADAT" ...
# .. .. .. .. ..@ nrows          : int 13014
# .. .. .. .. ..@ elementType    : chr "ANY"
# .. .. .. .. ..@ elementMetadata: NULL
# .. .. .. .. ..@ metadata       : list()
# .. .. .. .. ..@ listData       :List of 11
# .. .. .. .. .. ..$ mRNA_log2FC    : num [1:13014] 0.907 -0.283 -0.172 -0.32 -0.189 ...
# .. .. .. .. .. ..$ RPF_log2FC     : num [1:13014] 0.908 -0.481 -0.134 -0.487 -0.145 ...
# .. .. .. .. .. ..$ log2FC_TE_v1   : num [1:13014] 0.00488 -0.19712 0.03861 -0.16493 0.04371 ...
# .. .. .. .. .. ..$ pvalue_v1      : num [1:13014] 0.938 0.41 0.861 0.632 0.813 ...
# .. .. .. .. .. ..$ control_log2TE : num [1:13014] -2.485 1.119 1.128 0.458 0.722 ...
# .. .. .. .. .. ..$ treated_log2TE : num [1:13014] -2.484 0.921 1.166 0.291 0.765 ...
# .. .. .. .. .. ..$ log2FC_TE_v2   : num [1:13014] 0.0106 -0.1984 0.0378 -0.167 0.0431 ...
# .. .. .. .. .. ..$ pvalue_v2      : num [1:13014] 0.985 0.309 0.823 0.618 0.773 ...
# .. .. .. .. .. ..$ log2FC_TE_final: num [1:13014] 0.0106 -0.1971 0.0386 -0.1649 0.0437 ...
# .. .. .. .. .. ..$ pvalue_final   : num [1:13014] 0.985 0.41 0.861 0.632 0.813 ...
# .. .. .. .. .. ..$ pvalue.adjust  : num [1:13014] 1 1 1 1 1 ...
# .. .. ..$ log2FC_determine_num: int 7530
# .. .. ..$ log2R_determine_num : int 5484
# .. .. ..$ condition1          : chr "control"
# .. .. ..$ condition2          : chr "treated"
# .. ..@ elementType    : chr "ANY"
# .. ..@ elementMetadata: NULL
# .. ..@ metadata       : list()
# $ deg_anno:'data.frame':  13014 obs. of  10 variables:
#   ..$ log2FC_TE_v1   : num [1:13014] 0.00488 -0.19712 0.03861 -0.16493 0.04371 ...
# ..$ pvalue_v1      : num [1:13014] 0.938 0.41 0.861 0.632 0.813 ...
# ..$ treated_log2TE : num [1:13014] -2.484 0.921 1.166 0.291 0.765 ...
# ..$ log2FC_TE_v2   : num [1:13014] 0.0106 -0.1984 0.0378 -0.167 0.0431 ...
# ..$ pvalue_v2      : num [1:13014] 0.985 0.309 0.823 0.618 0.773 ...
# ..$ log2FC_TE_final: num [1:13014] 0.0106 -0.1971 0.0386 -0.1649 0.0437 ...
# ..$ pvalue_final   : num [1:13014] 0.985 0.41 0.861 0.632 0.813 ...
# ..$ pvalue.adjust  : num [1:13014] 1 1 1 1 1 ...
# ..$ type           : chr [1:13014] "nonSig" "nonSig" "nonSig" "nonSig" ...
# ..$ gene_name      : chr [1:13014] "A4GALT" "AAAS" "AACS" "AADAT" ...
  • GitHub - smithlabcode/riborex: Riborex: Fast and flexible identification of differential translation from Ribo-seq data

  • GitHub - xryanglab/xtail: Genome-wide assessment of differential translations with ribosome profiling data