2

I have a bit of code that I used in an excel spreadsheet that used min and max that I'm trying to transfer over to R. I have two columns, "mini" and "maxi" which represent a range of possible values. The third column I'm trying to populate is the proportion of that range that falls between 5 and 19. Looking at the first row in the example, if "mini" was 10 and "maxi" was 15, the value of the 5-19 column should be 1, since the range falls completely in that span. In row 9, the "mini" is 1 and the "maxi" is 3, meaning it falls completely outside of the 5-19 range, and should therefore be 0. Row 3 however, straddles this range, and only 25% falls in the range of 5-19, so the output value should be 0.25.

Edit I have updated R and although several solutions worked before, I am now getting the error:

Error in mutate_impl(.data, dots, caller_env()) : 
  attempt to bind a variable to R_UnboundValue

Here's an example of how the DF looks:

ID  mini maxi  
1   10  15
2   17  20
3   2   5
4   40  59
5   40  59
6   21  39
7   21  39
8   17  20
9   1   3
10  4   6

The code that I used previously was something like this:

=MAX((MIN(maxi,19)-MAX(mini,5)+1),0)/(maxi-mini+1)

I was initially trying to use something like

percentoutput <- mutate(DF, output = MAX((MIN(maxi,19) - MAX(mini,5) + 1),0)/(maxi-mini + 1))

This resulted in the ouput column being full of NAs. I wasn't sure if this is a situation where I'd need to run an apply function, but I'm not sure how to go about setting it up. Any guidance is appreciated!

Here is an example DF:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), min = c(10, 
17, 2, 40, 40, 21, 21, 17, 1, 4), max = c(15, 20, 5, 59, 59, 
39, 39, 20, 3, 6)), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -10L), spec = structure(list(
    cols = list(ID = structure(list(), class = c("collector_double", 
    "collector")), mini = structure(list(), class = c("collector_double", 
    "collector")), maxi = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))
Corey
  • 405
  • 2
  • 6
  • 18

3 Answers3

1

We can calculate ratio of min to max values that are in range of 5:19 using rowwise.

library(dplyr)
df %>% rowwise() %>% mutate(ratio = mean(min:max %in% 5:19))

#      ID   min   max ratio
#   <dbl> <dbl> <dbl> <dbl>
# 1     1    10    15 1    
# 2     2    17    20 0.75 
# 3     3     2     5 0.25 
# 4     4    40    59 0    
# 5     5    40    59 0    
# 6     6    21    39 0    
# 7     7    21    39 0    
# 8     8    17    20 0.75 
# 9     9     1     3 0    
#10    10     4     6 0.667

and similarly in base R using apply :

df$ratio <- apply(df[-1], 1, function(x) mean(x[1]:x[2] %in% 5:19))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Much cleaner answer than I was initially working towards, thanks! I'm running into an issue in the real data set with an "min:max : NA/NaN argument". Checked the columns for zeros and NA values, but the data looks ok. Any ideas? Could you explain what "mean(min:max %in% 5:19)" is doing in the dplyr solution? I might be missing something here.... – Corey Apr 23 '20 at 06:16
  • 1
    `min:max` and `5:19` generates sequences between those 2 variables. Using `%in%` we check how many number between min and max lie in between 5 and 19, using `mean` we just take the ratio. Based on error message I think you might have `NA` values somewhere because `NA:4` gives the same error message. – Ronak Shah Apr 23 '20 at 06:37
  • 1
    Thanks for the explanation. I just figured out the NA/NaN issue as well. I filtered out NA values but forgot to check for blanks. It's working perfectly now. – Corey Apr 23 '20 at 06:52
  • I have updated R and I'm now getting the error "Error in mutate_impl(.data, dots, caller_env()) : attempt to bind a variable to R_UnboundValue". I tried using the map2 solution as well and got the same error. Do you have any idea as to what might be going on? – Corey Jun 03 '20 at 03:01
  • That's strange. I am on latest R 4.0.0 and both the answer still work for me the same way. – Ronak Shah Jun 03 '20 at 03:03
  • It turned out it was an issue with an older version of dplyr. It's functioning properly now. – Corey Jun 04 '20 at 06:07
1

Here is a vectorized version using data.table:

DT[, portion := {
    mn <- pmax(mini, lb)
    mx <- pmin(maxi, ub)
    fifelse(mn <= mx, (mx - mn + 1L) / (maxi - mini + 1L), 0)
}]

Or equivalently in base R:

DF$mn <- pmax(DF$mini, lb)
DF$mx <- pmin(DF$maxi, ub)
DF$portion <- ifelse(DF$mn <= DF$mx, (DF$mx - DF$mn + 1L) / (DF$maxi - DF$mini + 1L), 0)

output:

    ID mini maxi   portion
 1:  1   10   15 1.0000000
 2:  2   17   20 0.7500000
 3:  3    2    5 0.2500000
 4:  4   40   59 0.0000000
 5:  5   40   59 0.0000000
 6:  6   21   39 0.0000000
 7:  7   21   39 0.0000000
 8:  8   17   20 0.7500000
 9:  9    1    3 0.0000000
10: 10    4    6 0.6666667

data:

library(data.table)
DT <- fread("ID  mini maxi  
1   10  15
2   17  20
3   2   5
4   40  59
5   40  59
6   21  39
7   21  39
8   17  20
9   1   3
10  4   6")
lb <- 5L
ub <- 19L
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
1

We can use map2

library(dplyr)
library(purrr)  
df %>% 
      mutate(ratio = map2_dbl(min, max, ~ mean(.x:.y %in% 5:19)))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I have updated R and I'm now getting the error "Error in mutate_impl(.data, dots, caller_env()) : attempt to bind a variable to R_UnboundValue". I got the error from the rowise/mutate solution as well as the map2 solution you provided and got the same error. Do you have any idea as to what might be going on? – Corey Jun 03 '20 at 03:03