1

I need to find length of overlapped region on same chromosomes between 2 group(gp1 & gp2). (similar question in stackoverflow were different from my aim, because I wanna find overlapped region not a TRUE/FALSE answer).

For example:

gp1: 
chr   start   end   id1 
chr1  580     600    1
chr1  900     970    2
chr3  400     600    3
chr2  100     700    4
gp2:
chr   start   end   id2
chr1  590     864   1
chr3  550     670   2
chr2  897     1987  3

I'm looking for a way to compare these 2 group and get results like this:

id1   id2    chr   overlapped_length
 1     1     chr1     10
 3     2     chr3     50   
Karolis Koncevičius
  • 9,417
  • 9
  • 56
  • 89
hiam
  • 13
  • 2

1 Answers1

1

Should point you in the right direction:

Load libraries

# install.packages("BiocManager")
# BiocManager::install("GenomicRanges")
library(GenomicRanges)
library(IRanges)

Generate data

gp1 <- read.table(text = 
"
chr   start   end   id1 
chr1  580     600    1
chr1  900     970    2
chr3  400     600    3
chr2  100     700    4
", header = TRUE)

gp2 <- read.table(text = 
                      "
chr   start   end   id2
chr1  590     864   1
chr3  550     670   2
chr2  897     1987  3
", header = TRUE) 

Calculate ranges

gr1 <- GenomicRanges::makeGRangesFromDataFrame(
    gp1, 
    seqnames.field = "chr", 
    start.field = "start", 
    end.field = "end"
)
gr2 <- GenomicRanges::makeGRangesFromDataFrame(
    gp2, 
    seqnames.field = "chr", 
    start.field = "start", 
    end.field = "end"
)

Calculate overlaps

hits <- findOverlaps(gr1, gr2)
p <- Pairs(gr1, gr2, hits = hits)
i <- pintersect(p)

Result

> as.data.frame(i)
  seqnames start end width strand  hit
1     chr1   590 600    11      * TRUE
2     chr3   550 600    51      * TRUE
Roman
  • 4,744
  • 2
  • 16
  • 58
  • As `findOverlaps()` is counting the _adjacent line itself_ as a hit, it will be off by 1. Shouldn't be too hard to hit the goal from here, though. – Roman Nov 13 '21 at 16:01