I would really appreciate it if you could take some of your precious time to help me with a problem I have.
In R, I would like to assign counts from the small, contiguous bins of one dataframe to the non-overlapping bins of irregular size and spacing in another dataframe, for all overlapping intervals.
My first dataframe looks like this (the actual dataframes have hundreds of thousands of rows):
chr bin from to BS_seq_Count
SL4.0ch01 1 1 500 3
SL4.0ch01 2 501 1000 10
SL4.0ch01 3 1001 1500 3
SL4.0ch02 1 1 500 3
SL4.0ch02 2 501 1000 10
SL4.0ch02 3 1001 1500 3
SL4.0ch03 1 1 500 3
SL4.0ch03 2 501 1000 10
SL4.0ch03 3 1001 1500 3
...
And this is the dataframe that I would like to overlap it with, sorting the counts into the corresponding bins:
chr bin from to
SL4.0ch01 1 200 700
SL4.0ch01 2 800 1350
SL4.0ch02 1 300 400
SL4.0ch03 1 50 600
SL4.0ch03 2 700 800
SL4.0ch03 3 1000 1200
...
In the end, the result should look something like this (decimal or rounded does not matter much, but the counts from partial overlaps should also be sorted into the bins, in proportion to the overlap):
chr bin from to count
SL4.0ch01 1 200 700 5.8
SL4.0ch01 2 800 1350 6.1
SL4.0ch02 1 300 400 0.6
SL4.0ch03 1 50 600 4.7
SL4.0ch03 2 700 800 2
SL4.0ch03 3 1000 1200 1.2
...
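To make the intended calculation concrete, here is a minimal base R sketch of what I mean. It assumes that the counts are spread evenly across each source bin and that coordinates are inclusive (so 1 to 500 covers 500 positions). The function name weighted_counts and the objects src and tgt are just names I made up for illustration. It loops over every target bin, so it is probably far too slow for the real data.

```r
# Source bins: contiguous, fixed width, each carrying a count.
src <- data.frame(
  chr          = rep(c("SL4.0ch01", "SL4.0ch02", "SL4.0ch03"), each = 3),
  from         = rep(c(1, 501, 1001), times = 3),
  to           = rep(c(500, 1000, 1500), times = 3),
  BS_seq_Count = rep(c(3, 10, 3), times = 3),
  stringsAsFactors = FALSE
)

# Target bins: irregular size and spacing, non-overlapping.
# The second SL4.0ch01 bin ends at 1350, as in the expected output.
tgt <- data.frame(
  chr  = c("SL4.0ch01", "SL4.0ch01", "SL4.0ch02",
           "SL4.0ch03", "SL4.0ch03", "SL4.0ch03"),
  bin  = c(1, 2, 1, 1, 2, 3),
  from = c(200, 800, 300, 50, 700, 1000),
  to   = c(700, 1350, 400, 600, 800, 1200),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each target bin, add up the counts of all
# source bins on the same chromosome, weighted by the fraction of the
# source bin that the target bin covers.
weighted_counts <- function(src, tgt) {
  tgt$count <- vapply(seq_len(nrow(tgt)), function(i) {
    s <- src[src$chr == tgt$chr[i], ]
    # Shared positions between the target bin and each source bin;
    # negative values mean no overlap and are clamped to zero.
    ov <- pmin(s$to, tgt$to[i]) - pmax(s$from, tgt$from[i]) + 1
    ov <- pmax(ov, 0)
    width <- s$to - s$from + 1
    sum(s$BS_seq_Count * ov / width)
  }, numeric(1))
  tgt
}

res <- weighted_counts(src, tgt)
res$count <- round(res$count, 1)
print(res)
```

On this toy data the rounded counts agree with the table above, so this is the weighting I am after; what I am missing is a way to do the same thing efficiently.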
I thought of using GenomicRanges with the findOverlaps function, but could not figure out how to get it working correctly in this case.
If anyone has an idea on how to solve this, any help would be greatly appreciated!
Thank you in advance, I wish you a nice weekend and good health!