Quality assessment

Extracting Alignment Information with generate_summary

Ribosome profiling (Ribo-seq) analysis typically requires detailed information about each aligned read: the gene it maps to, its length, its alignment position, and more. These features are critical for downstream analyses such as P-site detection, triplet periodicity evaluation, and ribosome occupancy profiling.

The RiboTrans function generate_summary() extracts comprehensive information about all mapped reads from BAM files, including:

  • The gene or transcript each read aligns to

  • Read length

  • Read start and end positions (in transcriptomic coordinates)

The extracted data is stored in the "summary_info" slot of the RiboTrans object and forms the foundation for the majority of downstream analyses.

⚙️ If the parameter mapping_type = "genome" is used, the function will automatically convert genome-based coordinates into transcriptomic coordinates before storing the results.
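To make the coordinate conversion concrete, here is a minimal sketch of how a genomic position can be mapped onto a transcript given its exon boundaries. The helper genome_to_transcript() is hypothetical and written only for illustration; it is not part of the package, and the package's own conversion may differ (for example in how it handles spliced reads).

```r
# Hypothetical helper (not a package function): map a 1-based genomic
# position to a 1-based transcript position using the transcript's exons.
genome_to_transcript <- function(gpos, exon_start, exon_end, strand = "+") {
  # Order exons 5' -> 3' along the transcript
  ord <- order(exon_start, decreasing = (strand == "-"))
  exon_start <- exon_start[ord]
  exon_end   <- exon_end[ord]
  exon_len   <- exon_end - exon_start + 1

  # Transcript length already covered before each exon begins
  offset <- cumsum(c(0, exon_len[-length(exon_len)]))

  # Find the exon that contains the genomic position
  hit <- which(gpos >= exon_start & gpos <= exon_end)
  if (length(hit) == 0) return(NA_integer_)  # intronic or outside the transcript

  if (strand == "+") {
    offset[hit] + (gpos - exon_start[hit]) + 1
  } else {
    offset[hit] + (exon_end[hit] - gpos) + 1
  }
}

# Toy transcript with two exons: 100-199 and 300-349
genome_to_transcript(150, c(100, 300), c(199, 349), "+")  # 51
genome_to_transcript(300, c(100, 300), c(199, 349), "+")  # 101
genome_to_transcript(300, c(100, 300), c(199, 349), "-")  # 50
genome_to_transcript(250, c(100, 300), c(199, 349), "+")  # NA (intron)
```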

Because the output from generate_summary() serves as the primary input for most later steps in the pipeline, run this function before any downstream analysis.

Example:

The following code extracts alignment information from BAM files for each sample:

# generate summary data for QC or other analysis
obj0 <- generate_summary(object = obj0, exp_type = "ribo", nThreads = 20)

head(obj0@summary_info)
#               rname  pos qwidth count  sample sample_group mstart mstop translen
# 1 YAL067C_mRNA|SEO1 1774     27     1 wt-rep1      wt-rep1     51  1829     1882
# 2 YAL067C_mRNA|SEO1 1771     27     1 wt-rep1      wt-rep1     51  1829     1882
# 3 YAL067C_mRNA|SEO1 1766     20     1 wt-rep1      wt-rep1     51  1829     1882
# 4 YAL067C_mRNA|SEO1 1761     27     1 wt-rep1      wt-rep1     51  1829     1882
# 5 YAL067C_mRNA|SEO1 1662     28     1 wt-rep1      wt-rep1     51  1829     1882
# 6 YAL067C_mRNA|SEO1 1632     25     1 wt-rep1      wt-rep1     51  1829     1882

Column Descriptions

  • rname: A compound field combining the transcript ID and gene name, separated by a vertical bar (|). If the gene name is missing in the GTF annotation, only the transcript ID is shown.

  • pos: The position (in transcriptomic coordinates) that the read maps to within the transcript.

  • qwidth: The width (i.e., length) of the read in nucleotides.

  • count: The number of reads that map to this exact position with the same length.

  • sample: The name of the sample from which the read originated.

  • sample_group: The user-defined group the sample belongs to. If no grouping was provided during setup, the sample name is used as the group name by default.

  • mstart / mstop: The start and end positions of the coding sequence (CDS) region on the transcript.

  • translen: The total length of the transcript in nucleotides.
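As a rough illustration of how these columns can be combined, the sketch below rebuilds the six example rows as a plain data frame and derives a few quantities from them. It uses base R only and does not call any package function. It assumes that mstart and mstop are 1-based, inclusive CDS boundaries; the derived column names (transcript_id, gene_name, rel_start, region) are hypothetical.

```r
# Toy data frame copied from the head() output above
summary_info <- data.frame(
  rname        = rep("YAL067C_mRNA|SEO1", 6),
  pos          = c(1774, 1771, 1766, 1761, 1662, 1632),
  qwidth       = c(27, 27, 20, 27, 28, 25),
  count        = rep(1, 6),
  sample       = rep("wt-rep1", 6),
  sample_group = rep("wt-rep1", 6),
  mstart       = rep(51, 6),
  mstop        = rep(1829, 6),
  translen     = rep(1882, 6),
  stringsAsFactors = FALSE
)

# Split rname into transcript ID and gene name (gene name may be absent)
parts <- strsplit(summary_info$rname, "|", fixed = TRUE)
summary_info$transcript_id <- vapply(parts, function(p) p[1], character(1))
summary_info$gene_name <- vapply(
  parts,
  function(p) if (length(p) > 1) p[2] else NA_character_,
  character(1)
)

# Distance from the first CDS nucleotide (assumes mstart is that nucleotide)
summary_info$rel_start <- summary_info$pos - summary_info$mstart

# Classify each mapped position as 5' UTR, CDS, or 3' UTR
summary_info$region <- with(
  summary_info,
  ifelse(pos < mstart, "5UTR", ifelse(pos > mstop, "3UTR", "CDS"))
)

# Read-length distribution, weighted by the count column
len_dist <- aggregate(count ~ qwidth, data = summary_info, FUN = sum)
print(len_dist)
```

Weighting by count matters because each row represents all reads sharing the same position and length, not a single read.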

This read-level summary underlies downstream analyses such as ribosome occupancy, metagene profiling, and P-site calibration, which makes it an essential step in the RiboTransVis workflow.

By default, generate_summary() saves the data locally as tinfo.anno.rda. Because rerunning the extraction on large datasets can be time-consuming, you can set load_local = TRUE to load the locally saved data directly:

obj0 <- generate_summary(object = obj0,
                         exp_type = "ribo",
                         load_local = TRUE)
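The general idea behind such a switch is a load-or-compute cache. The sketch below shows that pattern in base R with a hypothetical helper, cached_compute(), and a temporary file; it is an illustration of the concept under those assumptions, not the package's implementation.

```r
# Hypothetical helper (not a package function): reuse a saved result when
# load_local = TRUE and the file exists; otherwise compute and save it.
cached_compute <- function(path, compute, load_local = FALSE) {
  if (load_local && file.exists(path)) {
    env <- new.env()
    load(path, envir = env)      # restores the object saved as `result`
    return(env$result)
  }
  result <- compute()            # the expensive step
  save(result, file = path)      # cache it for later runs
  result
}

cache_file <- file.path(tempdir(), "toy_summary.rda")

# First run: computes the toy result and writes the cache file
first <- cached_compute(cache_file, function() data.frame(pos = 1:3))

# Second run: the compute function is never called, the cache is loaded
second <- cached_compute(cache_file,
                         function() stop("should not run"),
                         load_local = TRUE)
```

In this sketch, a missing cache file simply falls back to recomputing, which is one reasonable design choice; whether generate_summary() behaves the same way is not stated here.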