Extracts count matrices around candidate binding sites

Extracts read counts around candidate binding sites on both strands from the genome count data (BigWig files generated by count_genome_cuts()). It uses the extract bed command of the bwtool software to extract the read counts, then combines them into one matrix in which the first half of the columns holds the forward-strand counts and the second half holds the reverse-strand counts.

get_sites_counts(
  sites,
  genomecount_dir,
  genomecount_name,
  tmpdir = genomecount_dir,
  bedGraphToBigWig_path = "bedGraphToBigWig",
  bwtool_path = "bwtool"
)

Arguments

sites: A data frame containing the candidate sites.
genomecount_dir: Directory for genome counts, the same as outdir in count_genome_cuts().
genomecount_name: File prefix for genome counts, the same as outname in count_genome_cuts().
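tmpdir: Temporary directory in which to save intermediate files. Defaults to genomecount_dir.
bedGraphToBigWig_path: Path to the UCSC bedGraphToBigWig executable. The default assumes it is on the PATH.
bwtool_path: Path to the bwtool executable. The default assumes it is on the PATH.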
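
Because the two path arguments default to bare executable names, it can help to confirm that both tools are discoverable before calling the function. The following is a minimal sketch using base R's Sys.which(); the variable names are illustrative only, and whether get_sites_counts() performs a similar check internally is not stated here.

```r
# Default executable names taken from the usage signature above.
tools <- c(bedGraphToBigWig = "bedGraphToBigWig", bwtool = "bwtool")

# Sys.which() returns the full path of each executable found on the PATH,
# or an empty string when it cannot be found.
found <- Sys.which(tools)

# Names of the tools that could not be located (hypothetical helper variable).
missing_tools <- names(tools)[found == ""]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "),
          ". Pass full paths via bedGraphToBigWig_path / bwtool_path.")
}
```

If a tool is reported as missing, its full path can be supplied through the corresponding argument instead of relying on the PATH.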

Value

A count matrix. The first half of the columns are the read counts on the forward strand, and the second half of the columns are the read counts on the reverse strand.
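
The column layout described above can be illustrated by splitting a matrix into its two strand-specific halves. This is a minimal sketch on a toy matrix standing in for the real return value; the dimensions and the names fwd_counts, rev_counts, and total_counts are assumptions made for illustration.

```r
# Toy stand-in for the matrix returned by get_sites_counts():
# 3 sites (rows) and 4 positions per strand, so 8 columns in total.
count_matrix <- matrix(1:24, nrow = 3, ncol = 8)

# The layout requires an even number of columns.
n_cols <- ncol(count_matrix)
stopifnot(n_cols %% 2 == 0)
half <- n_cols / 2

# First half of the columns: forward strand; second half: reverse strand.
fwd_counts <- count_matrix[, seq_len(half), drop = FALSE]
rev_counts <- count_matrix[, half + seq_len(half), drop = FALSE]

# Example downstream use: strand-combined counts per position.
total_counts <- fwd_counts + rev_counts
```

Using drop = FALSE keeps both halves as matrices even when only a single site is present.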

Examples

if (FALSE) {
# Extracts ATAC-seq count matrices around candidate sites
count_matrix <- get_sites_counts(sites,
                                 genomecount_dir='processed_data',
                                 genomecount_name='K562.ATAC')
}