I have a follow-up to another post from yesterday: R finding the first value in a data frame that falls within a given threshold.

As in the previous post, I have a data frame with optical density (OD) over time:

time    OD
446     0.0368
446.5   0.0353
447     0.0334
447.5   0.032
448     0.0305
448.5   0.0294
449     0.0281
449.5   0.0264
450     0.0255
450.5   0.0246
451     0.0238
451.5   0.0225
452     0.0211
452.5   0.0199
453     0.0189
453.5   0.0175

I have upper and lower threshold values of OD and I need to locate when these were exceeded in the data.

This code finds when either the upper or lower thresholds were exceeded; for instance here I am looking for when the lower threshold has been exceeded:

library(dplyr)

find_time = function(df, threshold){
  return_value = df %>%
    arrange(time) %>%
    filter(OD < threshold) %>%
    slice(1)
  return(return_value)
}

find_time(data, threshold)

which returns the time and OD at which the lower threshold was exceeded:

  time     OD
  <dbl>  <dbl>
   446 0.0368

However, I need to know when both the upper (0.5033239) and lower (-0.3695971) thresholds are reached, thus I have modified the code to:

find_time = function(df, threshold_1, threshold_2){
  return_value_1 = df %>%
    arrange(time) %>%
    filter(OD > threshold_1) %>%
    slice_(1)

  return_value_2 = df %>%
        arrange(time) %>%
        filter(OD < threshold_2) %>%
        slice_(1)

  return(data.frame(return_value_1, return_value_2))
}

When I run the code I get one of two errors:

Error in data.frame(return_value, return_value_2) : 
  arguments imply differing number of rows: 1, 0
Called from: data.frame(return_value, return_value_2)

or an empty result:

[1] time   OD     time.1 OD.1  
<0 rows> (or 0-length row.names)

Both outcomes seem to be caused by the fact that, for some study subjects, the OD data never reaches the defined upper or lower threshold.

I need an if statement within the function so that when one of the thresholds is not reached it returns "null" for that threshold but still gives me the value for the other (i.e. if the upper threshold is not reached, return null but still give the time and OD for the lower threshold).

I have tried, but clearly I'm doing it horribly wrong:

find_time = function(df, threshold_1, threshold_2){
  return_value_1 = df %>%
    arrange(time) %>%
    filter(OD > threshold_1) %>%
    slice_(1)

  if(OD > threshold_1){
    print(return_value_1)
  } else {
    print("NULL")
  }


  return_value_2 = df %>%
    arrange(time) %>%
    filter(OD < threshold_2) %>%
    slice_(1)

  if(OD < threshold_2){
    print(return_value_2)
  } else {
    print("NULL")
  }

  return(data.frame(return_value_1, return_value_2))
}

I also tried:

find_time = function(df, threshold_1, threshold_2, OD){
  return_value_1 = df %>%
    arrange(time) %>%
   {(if (OD > threshold_1)
    else filter(OD < threshold_2)} %>%
    slice_(1))


  return(data.frame(return_value_1))
}

But I get:

Error in UseMethod("filter_") : 
  no applicable method for 'filter_' applied to an object of class "logical"
In addition: Warning message:
In if (OD == "") filter(OD > threshold_1) %>% slice_(1) else filter_(OD <  :
  the condition has length > 1 and only the first element will be used
amir.fathi
  • You can do two logical comparisons in the first filter(OD > threshold_1). For example, if you want threshold_1 < OD < threshold_2, use filter(OD > threshold_1 & OD < threshold_2). – Wenlong Liu Apr 04 '18 at 13:37
  • This will not work because if 'OD>threshold_1' is not found in the data then the whole code fails. This is why I need to include an 'if.else' statement to then look for 'OD < threshold_2'. I can't make this code work though: 'find_time = function(df, threshold_1, threshold_2, ODf){ return_value_1 = df %>% arrange(time) %>% (if(ODf %>% threshold_1) filter(ODf > threshold_1) else filter(ODf < threshold_2) %>% slice_(1)) return(data.frame(return_value_1)) }' – amir.fathi Apr 04 '18 at 16:07
  • I think you need to specify the requirements of your code. Maybe some basic background on logical operations would help you here. I can post some code to help you understand this. – Wenlong Liu Apr 04 '18 at 17:19

1 Answer

I think some basic background on logical (Boolean) operators helps with this question. The key points are:

  1. "TRUE" and "FALSE"

  2. "And", "Or", and "Not"

  3. "if" and "else"

For example, suppose you have a vector of numbers:

toy_list = c(1,3,5,66,100)

1. True and false

First question: is number 1 in the list?

> 1 %in% toy_list
[1] TRUE

Second question: is the number 100000 in the list?

> 100000 %in% toy_list
[1] FALSE

2. And, or, not

First question: are the numbers 1 and 100000 both in the list?

> 1 %in% toy_list & 100000 %in% toy_list
[1] FALSE

Second question: is either 1 or 100000 in the list?

> 1 %in% toy_list | 100000 %in% toy_list
[1] TRUE

Third question: is number 1 not in the list?

> !(1 %in% toy_list)
[1] FALSE
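
The same operators also work element by element on a whole vector, which is what filter() relies on. A minimal sketch, reusing toy_list from above (the bounds 2 and 70 and the name in_range are arbitrary choices for illustration):

```r
toy_list <- c(1, 3, 5, 66, 100)

# Each comparison returns one TRUE/FALSE per element;
# & combines the two results position by position
in_range <- toy_list > 2 & toy_list < 70
print(in_range)            # FALSE TRUE TRUE TRUE FALSE

# Subsetting with the logical vector keeps only the TRUE positions
print(toy_list[in_range])  # 3 5 66
```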

3. if, else

First question: if 1 in the list, print true, else print false.

> if (1 %in% toy_list){ print("TRUE")} else {print("FALSE")}
[1] "TRUE"

Second question: if 100000 in the list, print true, else print false.

> if (100000 %in% toy_list){ print("TRUE")} else {print("FALSE")}
[1] "FALSE"

Third question: if 1 and 100000 in the list, print true, else print false.

> if (1 %in% toy_list & 100000 %in% toy_list  ){ print("TRUE")} else {print("FALSE")}
[1] "FALSE"

Fourth question: if 1 or 100000 is in the list, print true, else print false.

> if (1 %in% toy_list | 100000 %in% toy_list  ){ print("TRUE")} else {print("FALSE")}
[1] "TRUE"
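
Note that if() expects a single TRUE or FALSE. A comparison on a whole column such as OD > threshold_1 produces one value per row, which is likely what triggers the "condition has length > 1" warning in the question. A small sketch of collapsing such a vector with any(), again on toy_list (the cut-off 50 and the name above are arbitrary):

```r
toy_list <- c(1, 3, 5, 66, 100)

# One TRUE/FALSE per element, so not directly usable in if()
above <- toy_list > 50
print(above)  # FALSE FALSE FALSE TRUE TRUE

# any() reduces the vector to a single TRUE/FALSE
if (any(above)) {
  print("at least one value is above 50")
} else {
  print("no value is above 50")
}
```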

4. Back to your question

If you want to keep the rows where OD is above the low threshold and below the high threshold, what you need is:

filter(OD > threshold_low & OD < threshold_high)
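
If instead you need the first crossing of each threshold, with a placeholder when one of them is never reached, one possible approach is to check nrow() of each filtered result rather than testing OD inside if(). The following is only a sketch under assumptions: the output column names (time_upper, OD_upper, time_lower, OD_lower) are made up, NA is used in place of "null", and the example thresholds are illustrative values, not the ones from the question.

```r
library(dplyr)

find_time <- function(df, threshold_high, threshold_low) {
  df <- arrange(df, time)

  # First row above the upper threshold and first row below the lower one
  upper <- df %>% filter(OD > threshold_high) %>% slice(1)
  lower <- df %>% filter(OD < threshold_low) %>% slice(1)

  # An empty result means the threshold was never crossed:
  # substitute one row of NA so both pieces always have exactly one row
  if (nrow(upper) == 0) upper <- data.frame(time = NA_real_, OD = NA_real_)
  if (nrow(lower) == 0) lower <- data.frame(time = NA_real_, OD = NA_real_)

  data.frame(
    time_upper = upper$time, OD_upper = upper$OD,
    time_lower = lower$time, OD_lower = lower$OD
  )
}

# A few rows taken from the data in the question
od_data <- data.frame(
  time = c(446, 447, 448, 448.5, 449),
  OD   = c(0.0368, 0.0334, 0.0305, 0.0294, 0.0281)
)

# Illustrative thresholds: 0.5 is never exceeded, 0.03 is first undercut at 448.5
result <- find_time(od_data, threshold_high = 0.5, threshold_low = 0.03)
print(result)
```

Here the upper columns come back as NA while the lower columns hold time 448.5 and OD 0.0294, so data.frame() no longer complains about differing numbers of rows.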
Wenlong Liu