I'm not new to R, but I am new to more advanced R techniques, and I've run into an issue. I'm working with a moderately large dataset (not honking big, but about 65,000 rows in total across 18 trials), which I've been handling as a data frame. It is available here: https://www.dropbox.com/s/qn6fldj9z6w21b2/wtvstyr%20%282%29.csv?dl=0. Here is the task at hand:
I need to conditionally replace velocity values based on the direction and Y columns, trial by trial. Here are my conditions: if direction is TRUE and any of the first 5 values of Y are <20, I need to replace all velocity values for Trial x with NA. If direction is TRUE and none of the first 5 values of Y are <20, then I only need to replace velocity on a case-by-case basis, i.e. in the individual rows where Y<20. If direction is FALSE and any of the first 5 values of Y are >180, I need to replace all velocity values for Trial x with NA. If direction is FALSE and none of the first 5 values of Y are >180, then again I only need to replace velocity in the individual rows where Y>180.
I have the following code using dplyr, adapted from a few solutions I've found on here (mainly from "dplyr replacing na values in a column based on multiple conditions"):
wtvstyr <- wtvstyr %>%
  mutate(velocity = case_when(direction == TRUE & Y < 20 ~ NA_real_, TRUE ~ velocity))
wtvstyr <- wtvstyr %>%
  mutate(velocity = case_when(direction == FALSE & Y > 180 ~ NA_real_, TRUE ~ velocity))
This solves my problem on the case-by-case basis. As for discarding entire trials, I am rather stumped. I tried using ifelse inside a dplyr pipeline, indexing the first value of Y, but I must confess I have no idea what I'm doing. Here is that attempt for the TRUE/<20 condition, along the lines of "Using If/Else on a data frame":
wtvstyr %>%
group_by(Trial) %>%
ifelse(case_when(direction == TRUE & Y[1]<20), velocity, NA_real_)
When I tried that, however, I got an unused argument error for NA.
Any help would be appreciated! And if there's a better way to do this entirely (e.g., masking values or some other approach I don't know about), any guidance would be fantastic. Thanks!
EDIT
Here is a reproducible mini-example of my dataset:
library(tidyverse)
set.seed(80)
# Two trials of 40 samples each
Trial <- c(rep(1, 40), rep(2, 40))
Y <- sample(0:200, 80, replace = TRUE)
Time <- 1:80
# Direction alternates in blocks of 10 rows (stored as character strings)
Direction1 <- c(rep("TRUE", 10), rep("FALSE", 10))
Direction <- rep(Direction1, 4)
example <- data.frame(Trial, Time, Y, Direction)
# Y2 is Y shifted up by one row, so velocity is the change in Y to the next row
example$Y2 <- example$Y
shift <- function(x, n) {
  c(x[-seq(n)], rep(NA, n))
}
example$Y2 <- shift(example$Y2, 1)
example$velocity <- as.numeric(example$Y2) - as.numeric(example$Y)
# Drop the helper column Y2 (column 5)
example <- example[-5]
# Remove velocities row by row when they meet the conditions I don't want:
example <- example %>%
  mutate(velocity = case_when(Direction == TRUE & Y < 20 ~ NA_real_, TRUE ~ velocity))
example <- example %>%
  mutate(velocity = case_when(Direction == FALSE & Y > 180 ~ NA_real_, TRUE ~ velocity))
With that second bit of code I can remove my case-by-case values (I hope this example clarifies what I mean). I'm still having trouble finding a way to identify, based on the first five values of Y, which trials need to be discarded entirely.
So for example, in the first subsection of data where Trial==1 and Direction==TRUE, if any of the first five points of data within that subsection are <20, I need to discard all values in that section while Direction==TRUE. In my original dataset, Direction==TRUE and Direction==FALSE repeat a number of times. I need to treat each case separately.
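To make the rule concrete, here is a minimal base-R sketch of the per-block decision as I understand it. The function name should_discard and its n_check argument are made up for illustration; the thresholds 20 and 180 are the ones from my conditions, and the test values are the first five Y values from my seeded example.

```r
# Hypothetical helper: decide whether one Direction block should be discarded
# entirely, based on its first n_check values of Y.
# Assumption: direction is TRUE/FALSE or the strings "TRUE"/"FALSE".
should_discard <- function(y, direction, n_check = 5) {
  first_y <- head(y, n_check)  # fewer values if the block is shorter than n_check
  if (isTRUE(as.logical(direction))) {
    any(first_y < 20, na.rm = TRUE)   # Direction TRUE: any early value below 20
  } else {
    any(first_y > 180, na.rm = TRUE)  # Direction FALSE: any early value above 180
  }
}

should_discard(c(138, 40, 32, 192, 99), "TRUE")   # FALSE: keep this block
should_discard(c(34, 187, 53, 79, 8), "FALSE")    # TRUE: discard this block
```

This only expresses the decision for a single block; it still has to be applied separately to each repetition of Direction within each trial.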
With the seed I set, the first five Y values under Trial==1 and Direction==TRUE are 138, 40, 32, 192 and 99. Here, because no values are <20, I want to keep that trial and simply remove any values thereafter that meet those conditions (as done by the code above). However, when Trial==1 and Direction==FALSE, my values are 34, 187, 53, 79 and 8. Because 187>180, I need to remove all the values corresponding to that block of Trial==1 and Direction==FALSE. However, later on, there is another case where Trial==1 and Direction==FALSE. I want to treat that case separately and evaluate it based on its own first five values. If I need to attach another column numbering which repetition of direction I'm on to keep them separated, I can do that.
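For what it's worth, this is a rough base-R sketch of the kind of numbering column I have in mind, together with one way the whole-block rule might be applied to it. The toy data frame and the names block, new_block and discard are invented for illustration and are not part of my real dataset; I assume rows are already in time order.

```r
# Toy data (made up): Trial 1 has three Direction blocks, Trial 2 has two.
toy <- data.frame(
  Trial     = c(1, 1, 1, 1, 1, 1, 2, 2),
  Direction = c("TRUE", "TRUE", "FALSE", "FALSE", "TRUE", "TRUE", "TRUE", "FALSE"),
  Y         = c(50, 60, 190, 100, 10, 70, 30, 100),
  velocity  = c(1, 2, 3, 4, 5, 6, 7, 8),
  stringsAsFactors = FALSE
)

# A new block starts whenever Trial or Direction differs from the previous row.
key <- paste(toy$Trial, toy$Direction)
new_block <- c(TRUE, key[-1] != key[-length(key)])
toy$block <- cumsum(new_block)   # 1 1 2 2 3 3 4 5

# Flag whole blocks using the first five Y values of each block.
discard <- logical(nrow(toy))
for (b in unique(toy$block)) {
  rows <- which(toy$block == b)
  first_y <- head(toy$Y[rows], 5)
  if (toy$Direction[rows[1]] == "TRUE") {
    discard[rows] <- any(first_y < 20)
  } else {
    discard[rows] <- any(first_y > 180)
  }
}
toy$velocity[discard] <- NA      # blocks 2 and 3 are blanked out
```

I would guess the same thing can be written in dplyr by grouping on the block column and using mutate(), but I haven't managed to get that working.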
Let me know if you need any more clarification and again, thank you for any help you can give.