My data frame looks like this. col1 defines the start of a range when the direction is "+", while col2 defines the start of the range when the direction is "-".
library(tidyverse)
df <- tibble(organ=c(rep("liver",5), rep("lung",5)),
col1=c(1,10,100,40,1000,1,10,100,40,1000),
col2=c(15,20,50,80,2000,15,20,50,80,2000),
direction=c("+","+","-","+","+","+","+","-","+","+"),
score=c(50,100,300,10,300,50,100,300,10,300))
df
#> # A tibble: 10 × 5
#> organ col1 col2 direction score
#> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 liver 1 15 + 50
#> 2 liver 10 20 + 100
#> 3 liver 100 50 - 300
#> 4 liver 40 80 + 10
#> 5 liver 1000 2000 + 300
#> 6 lung 1 15 + 50
#> 7 lung 10 20 + 100
#> 8 lung 100 50 - 300
#> 9 lung 40 80 + 10
#> 10 lung 1000 2000 + 300
Created on 2022-07-29 by the reprex package (v2.0.1)
For each organ (i.e. after group_by(organ)), I want to take the direction of each row into account, identify which rows have overlapping ranges, and then, within each set of overlapping rows, keep only the row with the highest score.
I want my result to look like this:
#> organ col1 col2 direction score
#> <chr> <dbl> <dbl> <chr> <dbl>
#> 2 liver 10 20 + 100
#> 3 liver 100 50 - 300
#> 5 liver 1000 2000 + 300
#> 7 lung 10 20 + 100
#> 8 lung 100 50 - 300
#> 10 lung 1000 2000 + 300
I have been thinking about this for a long time. Any guidance or help is highly appreciated.
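One possible approach, offered as a minimal sketch rather than a definitive solution: normalise each row to a (lower, upper) range regardless of direction, sort by the lower bound within each organ, start a new cluster whenever a range begins after the running maximum of all previous upper bounds, and keep the top-scoring row per cluster. The helper columns row_id, range_start, range_end and cluster are hypothetical names introduced here. The sketch assumes that ranges sharing an endpoint count as overlapping, that chains of overlaps are merged into one cluster, and that ties on score are broken by taking the first row.

```r
library(dplyr)

df <- tibble(organ = c(rep("liver", 5), rep("lung", 5)),
             col1 = c(1, 10, 100, 40, 1000, 1, 10, 100, 40, 1000),
             col2 = c(15, 20, 50, 80, 2000, 15, 20, 50, 80, 2000),
             direction = c("+", "+", "-", "+", "+", "+", "+", "-", "+", "+"),
             score = c(50, 100, 300, 10, 300, 50, 100, 300, 10, 300))

result <- df %>%
  mutate(row_id = row_number(),            # remember the original row order
         range_start = pmin(col1, col2),   # lower bound, whatever the direction
         range_end = pmax(col1, col2)) %>% # upper bound, whatever the direction
  group_by(organ) %>%
  arrange(range_start, .by_group = TRUE) %>%
  # a new cluster begins when a range starts after every earlier range has ended
  mutate(cluster = cumsum(range_start > lag(cummax(range_end), default = -Inf))) %>%
  group_by(organ, cluster) %>%
  slice_max(score, n = 1, with_ties = FALSE) %>% # assumption: first row wins ties
  ungroup() %>%
  arrange(row_id) %>%
  select(organ, col1, col2, direction, score)

result
```

Because pmin() and pmax() put the smaller value first for both "+" and "-" rows, the direction column is only carried through to the result. If overlaps should only be considered between rows with the same direction, adding direction to the first group_by() would be one way to adapt the sketch.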