I have a df like:
SampleID Chr Start End Strand Value
1: rep1 1 11001 12000 - 10
2: rep1 1 15000 20100 - 5
3: rep2 1 11070 12050 - 1
4: rep3 1 14950 20090 + 20
...
I want to merge the rows that share the same Chr and Strand and that have similar start and end points (say, within +/- 100 of each other). For the rows that get merged, I would also like to concatenate the SampleID names and the Value entries. With the previous example, the result would be something like:
SampleID Chr Start End Strand Value
1: rep1,rep2 1 11001 12000 - 10,1
2: rep1 1 15000 20100 - 5
4: rep3 1 14950 20090 + 20
...
Ideas? Thanks!
EDIT:
I found the fuzzyjoin package for R (https://cran.r-project.org/web/packages/fuzzyjoin/index.html). Does anyone have experience with this package?
EDIT2:
It would also be nice if just one of the variables (SampleID or Value) could be concatenated.
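To make the intended behaviour concrete, here is a minimal base R sketch of the kind of merge I mean. It is not a definitive solution, and it does not use fuzzyjoin. The function name merge_similar and its tol argument are made up for illustration. It assumes that "similar" means both Start and End are within tol of the first row already placed in a group, and that the merged row keeps that first row's coordinates. Because this kind of similarity is not transitive, the result can depend on row order.

```r
# Toy data copied from the example above.
df <- data.frame(
  SampleID = c("rep1", "rep1", "rep2", "rep3"),
  Chr      = c(1, 1, 1, 1),
  Start    = c(11001, 15000, 11070, 14950),
  End      = c(12000, 20100, 12050, 20090),
  Strand   = c("-", "-", "-", "+"),
  Value    = c(10, 5, 1, 20),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from any package): greedily assign each row to the
# first existing group whose first member has the same Chr and Strand and whose
# Start and End are both within `tol`. Otherwise the row starts a new group.
merge_similar <- function(d, tol = 100) {
  group <- integer(nrow(d))  # group id for every row
  reps  <- integer(0)        # row index of each group's first member
  for (i in seq_len(nrow(d))) {
    hit <- 0L
    for (k in seq_along(reps)) {
      r <- reps[k]
      if (d$Chr[i] == d$Chr[r] && d$Strand[i] == d$Strand[r] &&
          abs(d$Start[i] - d$Start[r]) <= tol &&
          abs(d$End[i] - d$End[r]) <= tol) {
        hit <- k
        break
      }
    }
    if (hit == 0L) {
      reps <- c(reps, i)
      hit  <- length(reps)
    }
    group[i] <- hit
  }
  # Keep the first row of each group and concatenate SampleID and Value.
  out <- d[reps, ]
  out$SampleID <- as.vector(tapply(d$SampleID, group, paste, collapse = ","))
  out$Value    <- as.vector(tapply(d$Value, group, paste, collapse = ","))
  rownames(out) <- NULL
  out
}

merged <- merge_similar(df)
print(merged)
```

On the example data this should merge only the first and third rows, since the second and fourth rows are on different strands. Dropping one of the two tapply lines would leave that column with the first row's value, which covers the case where only SampleID or only Value is concatenated. The double loop is quadratic in the number of rows, so it is only meant to pin down the logic, not to scale.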