Chapter 4 Track data input
There are many tools or softwares to visualize NGS(Next Generation Sequencing) data including ChIP-SEQ, ATAC-SEQ, RNA-SEQ, Hic, HiChIP and so on. The R packages Gviz, plotgardener, ggcoverage and ggbio are some popular tools to visualize NGS data in R. Besides, the online or local softwares like IGV, Wubrowse and ucsc genome browser. There are still some limitations for some of them to make a High-Quality Graphics for Publication with less code and less time. Here I supply some utilities and functions totally based on ggplot2 package to visualize NGS data in R and create a nice graph with high-quality for publication.
Though I have written a similar package transPlotR which plays some same roles on generating tracks plot. Some limitations and shortcomings come to me when I use this package. So I devote myself again to develop and expand the functions.
The main plot function is trackVisProMax function and this is combined with some data load input functions for create a track graph.
4.1 Load signal data
bigwig format is binary of wig which’s data size is samller. I recommend use this format, “wig” and “bedGraph” format are also accepted, you should just define format parameter. loadBigWig function can read bigwig data into R which based on rtracklayer::import.bw. Here are some examples to load your bigwig data:
library(BioSeqUtils)
# load bigwig files
file <- list.files(path = "test-bw/",pattern = '.bw',full.names = T)
file
# [1] "test-bw/1cell-m6A-1.bw" "test-bw/1cell-m6A-2.bw" "test-bw/1cell-RNA-1.bw"
# [4] "test-bw/1cell-RNA-2.bw" "test-bw/2cell-m6A-1.bw" "test-bw/2cell-m6A-2.bw"
# [7] "test-bw/2cell-RNA-1.bw" "test-bw/2cell-RNA-2.bw"
# select some chromosomes for test
bw <- loadBigWig(bw_file = file,chrom = c("5","15"),format = "bw")
# check
head(bw,3)
# seqnames start end score fileName
# 1 15 1 3054635 0.00000 1cell-m6A-1
# 2 15 3054636 3054640 1.34079 1cell-m6A-1
# 3 15 3054641 3054715 2.68159 1cell-m6A-1
For saving space, we selected chromosome 5 and chromosome 15 for each file, if you don’t specify chrom parameter, loadBigWig will return all chromosomes. You can also specify file_name to assign a new name for your each bigwig data.
4.2 Load peaks data
loadBed function allows you to read peaks data in R which ias baesd on rtracklayer::import.bed. Usually we will select the first three columns for downstream analysis. You can also specify file_name to assign a new name for your each bed data. Here is a example to read peaks data:
4.3 Load links data
Links data often describe two interaction sites on genomic positions. Hic and HiChIP technologies can achive this goal. The data format can be bed and bedpe format. Or you can supply with only 4 columns(chrom, start, end, value) format data. loadloops function allows you to read these format data in R. Example shows in the following code:
Note: You should supply file_name for each file.
loop_file <- list.files("test-bw2/",pattern = ".bedpe$",full.names = T)
loop_file
# [1] "test-bw2/C1-CTCF.bedpe" "test-bw2/C1-H3K27ac.bedpe" "test-bw2/M1-CTCF.bedpe"
# [4] "test-bw2/M1-H3K27ac.bedpe"
file_name = c("C1-CTCF","C1-H3K27ac","M1-CTCF","M1-H3K27ac")
# test code
loop_data <- loadloops(loop_file = loop_file,file_name = file_name,
sep = " ")
# check
head(loop_data,3)
# seqnames start end score fileName
# 1 chr10 100002774 100022436 0.021354 C1-CTCF
# 2 chr10 100002774 100069170 0.068404 C1-CTCF
# 3 chr10 100002774 100185646 0.184670 C1-CTCF
4.5 Extract junction data
loadJunction can be used to load junctions data from your own bed format data which records differential splice sites information from other tools identified or from your bam file. The latter we use megadepth::bam_to_junctions to extract all junctions data and return a data frame format. More details see megadepth. Here we show examples:
bam_file <- list.files(path = "F:/junc-test/",
pattern = ".bam$",full.names = T)
bam_file
# [1] "F:/junc-test/C1.sorted.bam" "F:/junc-test/WT.sorted.bam"
junc_df <- loadJunction(data_path = bam_file,
file_name = c("C1","WT"))
junc_df <- junc_df %>% dplyr::filter(score >= 5)
# check
head(junc_df,3)
# seqnames start end score fileName
# 1 1 3154117 3159706 1 C1
# 2 1 3207318 3213608 1 C1
# 3 1 4492669 4493099 8 C1
The score stands for read count of junctions. A little time you will spend if you extarct all junctions site from bam files. I recommend you featch the significant junction sites information from other softwares for visualization.