I am looking for a way to split my data into groups, where each group spans the same window size, which I define.

      Chrom     Start   End        
      chr1       1    10      
      chr1       11   20      
      chr1       21   30      
      chr1       31   40 

For example, if I want a window size of 20, then the groups would be: 1-20, 11-30, 21-40.
Rows can keep being added to the same group as long as the span of the group does not exceed 20.

I tried using the split function but couldn't get it to group the rows this way. Is there a way around this?

user12

2 Answers

A vector (or column of a dataframe) can be split into overlapping windows like this:

# Size of overlap
o <- 10
# Size of sliding window
n <- 20
# Dummy data
x <- sample(LETTERS, size = 40, replace = TRUE)

# Define start and end point (s and e)
s <- 1
e <- n

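# Loop to create fragments; stop at the last window that fits entirely within x
# (1:(length(x)/o) would run one step too far and pad the last fragment with NAs)
for (i in seq_len((length(x) - n) %/% o + 1)) {

  assign(paste0("x", i), x[s:e])
  s <- s + o
  e <- (s + n) - 1

}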

# Call fragments  
x1
x2
x3

Result:

> x
 [1] "F" "E" "G" "X" "R" "S" "L" "F" "F" "C" "I" "X" "A" "C" "B" "Z" "Q" "T" "W" "L" "G" "I" "B" "I" "O" "V" "J" "Z" "C" "R" "W" "Z" "F" "T" "N" "U" "F" "R" "A" "V"
> x1
 [1] "F" "E" "G" "X" "R" "S" "L" "F" "F" "C" "I" "X" "A" "C" "B" "Z" "Q" "T" "W" "L"
> x2
 [1] "I" "X" "A" "C" "B" "Z" "Q" "T" "W" "L" "G" "I" "B" "I" "O" "V" "J" "Z" "C" "R"
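
The same idea can be applied directly to the table in the question. This is only a minimal sketch: it assumes the rows are sorted by `Start`, belong to a single chromosome, and that a group's size is measured from its first `Start` to its last `End`. The names `df`, `window`, `groups` and `labels` are made up for illustration.

```r
# The table from the question
df <- data.frame(
  Chrom = "chr1",
  Start = c(1, 11, 21, 31),
  End   = c(10, 20, 30, 40)
)

window <- 20  # maximum span of a group, in the same units as Start/End

# Starting from each row, keep adding later rows while the span
# (first Start to last End) stays within the window size
groups <- lapply(seq_len(nrow(df)), function(i) {
  in_window <- df$Start >= df$Start[i] & (df$End - df$Start[i] + 1) <= window
  df[in_window, ]
})

# Drop trailing groups that end where an earlier group already ended
# (here, the group that contains only 31-40)
ends   <- vapply(groups, function(g) max(g$End), numeric(1))
groups <- groups[!duplicated(ends)]

# Label each group by the range it covers
labels <- vapply(groups, function(g) paste(min(g$Start), max(g$End), sep = "-"),
                 character(1))
labels
# [1] "1-20"  "11-30" "21-40"
```

Keeping the groups in a list rather than creating `x1`, `x2`, ... with `assign()` makes it easier to loop over them afterwards, for example with `lapply(groups, nrow)`.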

rg255

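library(IRanges)
library(GenomicRanges)

# gr1: the four width-10 intervals from the question
(gr1 <- GRanges("chr1",IRanges(c(1,11,21,31),width=10),strand="*"))
# gr2: the width-20 windows, starting every 10 positions
(gr2 <- GRanges("chr1",IRanges(c(1,11,21),width=20),strand="*"))

# Match each interval (query) to every window (subject) it overlaps
fo <- findOverlaps(gr1, gr2)
queryHits(fo)    # indices into gr1
subjectHits(fo)  # indices into gr2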

Check http://genomicsclass.github.io/book/pages/bioc1_igranges.html#intrarange for more details.
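
To turn the hits into the actual groups, one option is to split the query ranges by the window they hit. The following is a sketch rather than a definitive recipe: it assumes a GenomicRanges version in which `findOverlaps()` accepts `type = "within"`, and the name `groups` is made up for illustration.

```r
library(GenomicRanges)

# Same ranges as above: the intervals and the width-20 windows
gr1 <- GRanges("chr1", IRanges(c(1, 11, 21, 31), width = 10))
gr2 <- GRanges("chr1", IRanges(c(1, 11, 21), width = 20))

# type = "within" keeps only intervals that lie entirely inside a window
fo <- findOverlaps(gr1, gr2, type = "within")

# One list element per window, holding the intervals that fall inside it
groups <- split(gr1[queryHits(fo)], subjectHits(fo))
groups
```

With the data from the question this should give three groups of two intervals each, matching 1-20, 11-30 and 21-40.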

A. Suliman
  • Thank you, this is a fast method. Is there a way to control the distance within which findOverlaps will consider ranges to be overlapping? I was reading the manual but didn't find a parameter that could adjust this. – user12 May 09 '18 at 08:43
  • Check [here](https://www.rdocumentation.org/packages/GenomicRanges/versions/1.24.1/topics/findOverlaps-methods) and [here](https://www.rdocumentation.org/packages/IRanges/versions/2.0.1/topics/findOverlaps-methods). I think it can be managed with `maxgap` and `minoverlap`. – A. Suliman May 09 '18 at 10:20