0

I have a wig file that I've read into a granges-like object using a function that I wrote that calls the rtracklayer package:

read_wig <- function(x, format='wig', genome='mm9') {
  suppressMessages(library(rtracklayer))  
  merged_wig <- import.wig(x, format=format, genome=genome)
  merged_wig <- keepSeqlevels(merged_wig, paste0('chr', c(seq(1,19), 'X', 'Y')), pruning.mode="coarse")
  return(merged_wig)
}

wig <- read_wig('~/path/to/wig')

The code above returns:

> wig
UCSC track 'MEFES_K27AC.downsampled.sorted'
UCSCData object with 13274466 ranges and 1 metadata column:
             seqnames               ranges strand |     score
                <Rle>            <IRanges>  <Rle> | <numeric>
         [1]     chr1          [  1,  200]      * |         1
         [2]     chr1          [201,  400]      * |         2
         [3]     chr1          [401,  600]      * |         3
         [4]     chr1          [601,  800]      * |         4
         [5]     chr1          [801, 1000]      * |         0
         ...      ...                  ...    ... .       ...
  [13274462]     chrY [15901401, 15901600]      * |         0
  [13274463]     chrY [15901601, 15901800]      * |         0
  [13274464]     chrY [15901801, 15902000]      * |         0
  [13274465]     chrY [15902001, 15902200]      * |         0
  [13274466]     chrY [15902201, 15902400]      * |         0
  -------
  seqinfo: 21 sequences from mm9 genome

Now with this object I'd like to compute the sum of scores within a window around each range for each row in the object. For example, I'd like to compute the sum of score between ranges 1-10000 (123 for this example) and add this entry as a column next to score. I'd like to do this for each row.

> expected_output
UCSC track 'MEFES_K27AC.downsampled.sorted'
UCSCData object with 13274466 ranges and 1 metadata column:
             seqnames               ranges strand |     score  score_10000
                <Rle>            <IRanges>  <Rle> | <numeric>    <numeric>
         [1]     chr1          [  1,  200]      * |         1          123
         [2]     chr1          [201,  400]      * |         2          ...
         [3]     chr1          [401,  600]      * |         3          ...
         [4]     chr1          [601,  800]      * |         4          ...
         [5]     chr1          [801, 1000]      * |         0          ...
         ...      ...                  ...    ... .       ...
  [13274462]     chrY [15901401, 15901600]      * |         0          ...
  [13274463]     chrY [15901601, 15901800]      * |         0          ...
  [13274464]     chrY [15901801, 15902000]      * |         0          ...
  [13274465]     chrY [15902001, 15902200]      * |         0          ...
  [13274466]     chrY [15902201, 15902400]      * |         0          ...
  -------
  seqinfo: 21 sequences from mm9 genome

Ideally I'd like to add columns that compute score ranges from 1-10000, 1-20000, 1-30000, etc. up to 100000.

Any help would be much appreciated!

EDIT:

Wig file can be found here.

user2117258
  • 515
  • 4
  • 18

0 Answers0