# generate summary data for QC or other analysis
obj0 <- generate_summary(object = obj0, exp_type = "ribo", nThreads = 20)
head(obj0@summary_info)
# rname pos qwidth count sample sample_group mstart mstop translen
# 1 YAL067C_mRNA|SEO1 1774 27 1 wt-rep1 wt-rep1 51 1829 1882
# 2 YAL067C_mRNA|SEO1 1771 27 1 wt-rep1 wt-rep1 51 1829 1882
# 3 YAL067C_mRNA|SEO1 1766 20 1 wt-rep1 wt-rep1 51 1829 1882
# 4 YAL067C_mRNA|SEO1 1761 27 1 wt-rep1 wt-rep1 51 1829 1882
# 5 YAL067C_mRNA|SEO1 1662 28 1 wt-rep1 wt-rep1 51 1829 1882
# 6 YAL067C_mRNA|SEO1 1632 25 1 wt-rep1 wt-rep1 51 1829 1882
Quality assessment
Extracting Alignment Information with generate_summary
Ribosome profiling (Ribo-seq) typically requires detailed analysis of aligned reads — including which genes they map to, their read lengths, alignment positions, and more. These features are critical for downstream analyses such as P-site detection, triplet periodicity evaluation, and ribosome occupancy profiling.
The RiboTrans function generate_summary() extracts comprehensive information about all mapped reads from BAM files, including:
The gene or transcript each read aligns to
Read length
Read start and end positions (in transcriptomic coordinates)
…
The extracted data is stored in the summary_info slot of the RiboTrans object and forms the foundation for the majority of downstream analyses.
⚙️ If the parameter mapping_type = "genome" is used, the function automatically converts genome-based coordinates into transcriptomic coordinates before storing the results.
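To make the coordinate conversion concrete, here is a minimal base-R sketch of mapping a genomic position onto a spliced transcript. It is only an illustration of the idea, not the package's implementation: the helper genome_to_transcript() and the toy exon coordinates are hypothetical, and it assumes 1-based, inclusive, non-overlapping exons.

```r
# Hypothetical helper (not part of RiboTransVis): map a 1-based genomic
# position to a 1-based transcript position, given one transcript's exons.
genome_to_transcript <- function(gpos, exon_start, exon_end, strand = "+") {
  # Order exons in the direction of transcription (5' to 3')
  ord <- order(exon_start, decreasing = (strand == "-"))
  exon_start <- exon_start[ord]
  exon_end <- exon_end[ord]

  # Find the exon containing the genomic position
  hit <- which(gpos >= exon_start & gpos <= exon_end)
  if (length(hit) == 0) return(NA_integer_)  # intronic or outside the transcript
  hit <- hit[1]

  # Summed length of all exons upstream of the hit exon
  exon_len <- exon_end - exon_start + 1
  upstream <- sum(exon_len[seq_len(hit - 1)])

  # Offset inside the hit exon, counted from its 5' end
  offset <- if (strand == "+") {
    gpos - exon_start[hit] + 1
  } else {
    exon_end[hit] - gpos + 1
  }
  as.integer(upstream + offset)
}

# Toy transcript with two exons: 100-199 and 300-399 (assumed values)
ex_start <- c(100, 300)
ex_end <- c(199, 399)
plus_pos <- genome_to_transcript(310, ex_start, ex_end, "+")
minus_pos <- genome_to_transcript(310, ex_start, ex_end, "-")
intron_pos <- genome_to_transcript(250, ex_start, ex_end, "+")
```

On the plus strand the first exon contributes 100 upstream nucleotides, so genomic position 310 becomes transcript position 111. On the minus strand the exon order is reversed and the same position is counted from the other end.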
Because the output from generate_summary serves as the primary input for most later steps in the pipeline, running this function is an essential part of RiboTrans analysis.
Example:
The generate_summary() call at the top of this section extracts alignment information from the BAM files for each sample, and head() prints the first rows of the result.
Column Descriptions
rname: A compound field combining the transcript ID and gene name, separated by a vertical bar (|). If the gene name is missing in the GTF annotation, only the transcript ID is shown.
pos: The position (in transcriptomic coordinates) that the read maps to within the transcript.
qwidth: The width (i.e., length) of the read in nucleotides.
count: The number of reads that map to this exact position with the same length.
sample: The name of the sample from which the read originated.
sample_group: The user-defined group the sample belongs to. If no grouping was provided during setup, the sample name is used as the group name by default.
mstart / mstop: The start and end positions of the coding sequence (CDS) region on the transcript.
translen: The total length of the transcript in nucleotides.
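As a rough illustration of how these columns can be combined, the sketch below rebuilds the six example rows as a plain data frame and derives a few quantities from them. The derived column names (transcript_id, gene_name, rel_pos, frame, in_cds) are hypothetical, and the code assumes that mstart marks the first nucleotide of the start codon and that pos can be used as-is. Both are assumptions, not something the column descriptions confirm.

```r
# Toy table mirroring the columns described above (rows copied from the example output)
summary_df <- data.frame(
  rname        = "YAL067C_mRNA|SEO1",
  pos          = c(1774, 1771, 1766, 1761, 1662, 1632),
  qwidth       = c(27, 27, 20, 27, 28, 25),
  count        = 1,
  sample       = "wt-rep1",
  sample_group = "wt-rep1",
  mstart       = 51,
  mstop        = 1829,
  translen     = 1882,
  stringsAsFactors = FALSE
)

# Split the compound rname field into transcript ID and gene name
parts <- strsplit(summary_df$rname, "|", fixed = TRUE)
summary_df$transcript_id <- vapply(parts, `[`, character(1), 1)
summary_df$gene_name <- vapply(parts, `[`, character(1), 2)  # NA if there is no gene name

# Position relative to the CDS start (assumption: mstart = first base of the start codon)
summary_df$rel_pos <- summary_df$pos - summary_df$mstart

# Reading frame (0, 1 or 2); %% stays non-negative for positions in the 5' UTR
summary_df$frame <- summary_df$rel_pos %% 3

# Flag positions that fall inside the annotated CDS
summary_df$in_cds <- summary_df$pos >= summary_df$mstart &
  summary_df$pos <= summary_df$mstop

# Read-length distribution, weighting each row by its count
len_dist <- tapply(summary_df$count, summary_df$qwidth, sum)
```

Weighting by count matters because each row represents all reads sharing the same position and length, not a single read.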
This detailed read-level summary forms the foundation for various downstream analyses — including ribosome occupancy, metagene profiling, and P-site calibration — making it an essential and informative step in the RiboTransVis workflow.
By default, generate_summary saves the data locally as tinfo.anno.rda. Rerunning the analysis on large datasets can be time-consuming, so you can set load_local = TRUE to load the locally saved data directly:
obj0 <- generate_summary(object = obj0,
                         exp_type = "ribo",
                         load_local = TRUE)
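The general pattern behind a load_local-style switch can be sketched in base R as a compute-once, load-later cache built on save() and load(). This is a generic illustration under stated assumptions, not a description of how generate_summary works internally: cached_compute(), slow_step(), and the file name toy_summary.rda are all hypothetical.

```r
# Hypothetical wrapper: recompute and save, or reload a previously saved result
cached_compute <- function(compute_fn, cache_file, load_local = FALSE) {
  if (load_local && file.exists(cache_file)) {
    env <- new.env()
    load(cache_file, envir = env)  # restores the object saved under the name `result`
    return(env$result)
  }
  result <- compute_fn()           # the expensive step
  save(result, file = cache_file)  # keep a local copy for later runs
  result
}

# Stand-in for an expensive summary step (toy data, assumed values)
slow_step <- function() {
  Sys.sleep(0.1)
  data.frame(pos = 1:3, count = c(2, 5, 1))
}

cache_file <- file.path(tempdir(), "toy_summary.rda")
first <- cached_compute(slow_step, cache_file)                      # computes and saves
second <- cached_compute(slow_step, cache_file, load_local = TRUE)  # loads from disk
```

Loading into a fresh environment avoids silently overwriting objects in the calling workspace, which is a common pitfall with load().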