Chapter 4 Track data input

There are many tools or softwares to visualize NGS(Next Generation Sequencing) data including ChIP-SEQ, ATAC-SEQ, RNA-SEQ, Hic, HiChIP and so on. The R packages Gviz, plotgardener, ggcoverage and ggbio are some popular tools to visualize NGS data in R. Besides, the online or local softwares like IGV, Wubrowse and ucsc genome browser. There are still some limitations for some of them to make a High-Quality Graphics for Publication with less code and less time. Here I supply some utilities and functions totally based on ggplot2 package to visualize NGS data in R and create a nice graph with high-quality for publication.

Though I have written a similar package transPlotR which plays some same roles on generating tracks plot. Some limitations and shortcomings come to me when I use this package. So I devote myself again to develop and expand the functions.

The main plot function is trackVisProMax function and this is combined with some data load input functions for create a track graph.

4.1 Load signal data

bigwig format is binary of wig which’s data size is samller. I recommend use this format, “wig” and “bedGraph” format are also accepted, you should just define format parameter. loadBigWig function can read bigwig data into R which based on rtracklayer::import.bw. Here are some examples to load your bigwig data:

library(BioSeqUtils)

# load bigwig files
file <- list.files(path = "test-bw/",pattern = '.bw',full.names = T)
file
# [1] "test-bw/1cell-m6A-1.bw" "test-bw/1cell-m6A-2.bw" "test-bw/1cell-RNA-1.bw"
# [4] "test-bw/1cell-RNA-2.bw" "test-bw/2cell-m6A-1.bw" "test-bw/2cell-m6A-2.bw"
# [7] "test-bw/2cell-RNA-1.bw" "test-bw/2cell-RNA-2.bw"

# select some chromosomes for test
bw <- loadBigWig(bw_file = file,chrom = c("5","15"),format = "bw")

# check
head(bw,3)
#   seqnames   start     end   score    fileName
# 1       15       1 3054635 0.00000 1cell-m6A-1
# 2       15 3054636 3054640 1.34079 1cell-m6A-1
# 3       15 3054641 3054715 2.68159 1cell-m6A-1

For saving space, we selected chromosome 5 and chromosome 15 for each file, if you don’t specify chrom parameter, loadBigWig will return all chromosomes. You can also specify file_name to assign a new name for your each bigwig data.

4.2 Load peaks data

loadBed function allows you to read peaks data in R which ias baesd on rtracklayer::import.bed. Usually we will select the first three columns for downstream analysis. You can also specify file_name to assign a new name for your each bed data. Here is a example to read peaks data:

bedfile <- list.files(path = "./",pattern = ".bed")
# [1] "peaks.bed"  "peaks2.bed"

bed_df <- loadBed(bedfile)

# check
head(bed_df,3)
#   seqnames     start       end sampleName y
# 1        5 142905501 142905600      peaks 1
# 2        5 142903201 142903800      peaks 1
# 3       15  61985342  61985900      peaks 1

4.5 Extract junction data

loadJunction can be used to load junctions data from your own bed format data which records differential splice sites information from other tools identified or from your bam file. The latter we use megadepth::bam_to_junctions to extract all junctions data and return a data frame format. More details see megadepth. Here we show examples:

bam_file <- list.files(path = "F:/junc-test/",
                       pattern = ".bam$",full.names = T)
bam_file
# [1] "F:/junc-test/C1.sorted.bam" "F:/junc-test/WT.sorted.bam"

junc_df <- loadJunction(data_path = bam_file,
                        file_name = c("C1","WT"))
junc_df <- junc_df %>% dplyr::filter(score >= 5)

# check
head(junc_df,3)
#   seqnames   start     end score fileName
# 1        1 3154117 3159706     1       C1
# 2        1 3207318 3213608     1       C1
# 3        1 4492669 4493099     8       C1

The score stands for read count of junctions. A little time you will spend if you extarct all junctions site from bam files. I recommend you featch the significant junction sites information from other softwares for visualization.