I have a data frame

  stim1 stim2 Chosen Rejected
1:     2     1      2        1
2:     3     2      2        3
3:     3     1      1        3
4:     2     3      3        2
5:     1     3      1        3

For each trial, I want to add columns that specify whether each stimulus (stim1 and stim2) was Chosen or Rejected the most recent time it appeared in a previous trial.

Desired outcome

  stim1 stim2 Chosen Rejected     Previous_stim1   Previous_stim2
1:     2     1      2        1        NaN              NaN
2:     3     2      2        3        NaN              Chosen
3:     3     1      1        3        Rejected         Rejected
4:     2     3      3        2        Chosen           Rejected
5:     1     3      1        3        Chosen           Chosen

Any help will be greatly appreciated!


UPDATE

TarJae had a really helpful suggestion that correctly categorized the piece of the data frame I shared. I didn't mention that it is part of a larger data frame, and for some reason this method fairly quickly stops classifying correctly:

   stim1 stim2 Chosen Rejected Previous_stim1 Previous_stim2
 1:     2     1      2        1           <NA>           <NA>
 2:     3     2      2        3           <NA>         Chosen
 3:     3     1      1        3       Rejected       Rejected
 4:     2     3      3        2         Chosen       Rejected
 5:     1     3      1        3         Chosen         Chosen
 6:     2     1      1        2         Chosen         Chosen
 7:     2     3      2        3         Chosen         Chosen
 8:     3     1      1        3         Chosen         Chosen
 9:     2     1      2        1         Chosen         Chosen

For example, in row 6 stim1 == 2. Most recently, 2 was rejected (row 4), but the method classified it as Chosen.

Any idea why this happens?

Thank you again for everyone's help.


Update 2

Thank you so much for your help. Now suppose I also have a column with the "outcome".

   stim1 stim2 Chosen Rejected outcome Previous_stim 1 Previous_stim 2
1:    15    13     15       13       1            <NA>            <NA>
2:    13    14     14       13       1        Rejected            <NA>
3:    14    15     14       15       1          Chosen          Chosen
4:    14    13     14       13       0          Chosen        Rejected
5:    13    15     13       15       0        Rejected        Rejected
6:    14    15     14       15       1          Chosen        Rejected
7:    15    13     15       13       1        Rejected          Chosen
8:    14    15     14       15       0          Chosen          Chosen

I want to encode whether each stimulus was
1) most recently chosen and outcome=1 (can be coded as 1)
2) most recently chosen and outcome=0 (can be coded as 2)
3) most recently rejected and outcome=1 (can be coded as 3)
4) most recently rejected and outcome=0 (can be coded as 4)

Is there an easy way to modify the code to make that happen?

Desired output

  stim1 stim2 Chosen Rejected outcome Previous_stim 1 Previous_stim 2 Left_type right_type
1     2     3      2        3       1            <NA>            <NA>       NaN        NaN
2     1     3      3        1       1            <NA>        Rejected       NaN          3
3     2     1      1        2       1          Chosen        Rejected         1          3
4     1     2      1        2       0          Chosen        Rejected         1          3
5     3     1      3        1       1          Chosen          Chosen         1          2

LAST FOLLOW UP

Finally, I would like to add a column that checks whether the chosen stimulus in that previous trial (the most recent trial in which the stimulus in question was rejected) is the same as my current alternative stimulus.

For example, if I have

  stim1 stim2 Chosen Rejected     Previous_stim1   Previous_stim2
1:     2     1      2        1        NaN              NaN
2:     3     2      2        3        NaN              Chosen
3:     3     1      1        3        Rejected         Rejected
4:     2     3      3        2        Chosen           Rejected
5:     1     3      1        3        Chosen           Chosen

And here is how I would update my table.

In trial 3, stim1 (i.e. 3) was previously rejected in favor of 2 (in trial 2), not in favor of 1 (the current alternative), so Current_alternative_left=0.

Similarly, stim2 (i.e. 1) was previously rejected, but in favor of 2 (in trial 1), so Current_alternative_right=0.

On the other hand, in trial 4, stim1=2 was previously chosen over the same stimulus it is currently being pitted against (3), so Current_alternative_left=1.

Desired Output

stim1 stim2 Chosen Rejected outcome Previous_stim 1 Previous_stim 2 Left_type right_type
1     2     3      2        3       1            <NA>            <NA>       NaN        NaN
2     1     3      3        1       1            <NA>        Rejected       NaN          3
3     2     1      1        2       1          Chosen        Rejected         1          3
4     1     2      1        2       0          Chosen        Rejected         1          3
5     3     1      3        1       1          Chosen          Chosen         1          2

Current_alternative_left    Current_alternative_right
NaN                           NaN
NaN                           0
0                             0 
1                             0
1                             0     

I am new to data.table, but I tried to adapt ThomasIsCoding's function to return this as well:

h <- function(stim, cr) {
  stim_chosen <- rep(NA, length(stim))
  for (k in seq_along(stim)[-1]) {
    ind <- which(cr[1:(k - 1), , drop = FALSE] == stim[k], arr.ind = TRUE)
    if (length(ind)) {
      stim_chosen[k] <- stim[tail(ind, 1)[, "row"]]
    }
  }
  stim_chosen
}

setDT(df)[
  ,
  paste0("Chosen_Last", 1:2) := lapply(
    .(stim1, stim2),
    h,
    cr = cbind(Chosen, Rejected)
  )
]

However, this is not quite giving me the correct answer. Does anyone know where I am going wrong?

user15791858

3 Answers


Here is a solution adapted from ThomasIsCoding's answer to "Check if value of column A is present in the same row or previous rows of column B". That question also has other answers that may suit your problem, so you could adapt whichever one fits best. I chose the first one, provided by ThomasIsCoding.

The main task is to check for the value in all previous rows of another column.

library(dplyr)

df %>% 
    # x, y: does stim1 appear in Chosen / Rejected in any earlier row?
    mutate(x = replace(rep(NA, length(Chosen)), match(stim1, lag(Chosen)) <= seq_along(stim1), "Chosen"),
           y = replace(rep(NA, length(Rejected)), match(stim1, lag(Rejected)) <= seq_along(stim1), "Rejected"),
           # a, b: the same two checks for stim2
           a = replace(rep(NA, length(Chosen)), match(stim2, lag(Chosen)) <= seq_along(stim2), "Chosen"),
           b = replace(rep(NA, length(Rejected)), match(stim2, lag(Rejected)) <= seq_along(stim2), "Rejected"),
           # coalesce() takes "Chosen" whenever both flags are set
           Previous_stim1 = coalesce(x, y),
           Previous_stim2 = coalesce(a, b)) %>% 
    select(stim1, stim2, Chosen, Rejected, Previous_stim1, Previous_stim2)

   stim1 stim2 Chosen Rejected Previous_stim1 Previous_stim2
1:     2     1      2        1           <NA>           <NA>
2:     3     2      2        3           <NA>         Chosen
3:     3     1      1        3       Rejected       Rejected
4:     2     3      3        2         Chosen       Rejected
5:     1     3      1        3         Chosen         Chosen
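
A possible explanation for the misclassification on longer data, offered as a sketch rather than a definitive diagnosis: match() returns only the first position at which a value occurs, so the comparison above asks whether a stimulus was ever Chosen or Rejected in an earlier row, not which of the two happened most recently. When both are true, coalesce() keeps "Chosen". The snippet below uses the first six trials from the question's update; lag1() is a hypothetical base-R stand-in for dplyr::lag().

```r
stim1    <- c(2, 3, 3, 2, 1, 2)
chosen   <- c(2, 2, 1, 3, 1, 1)
rejected <- c(1, 3, 3, 2, 3, 2)

# Hypothetical stand-in for dplyr::lag(): shift down by one, pad with NA
lag1 <- function(x) c(NA, head(x, -1))

# match() gives the FIRST position of each stim1 value in the lagged column
first_chosen   <- match(stim1, lag1(chosen))
first_rejected <- match(stim1, lag1(rejected))

k <- 6                                       # trial 6, where stim1 == 2
chosen_flag   <- first_chosen[k] <= k        # 2 was chosen in some earlier trial
rejected_flag <- first_rejected[k] <= k      # 2 was also rejected in an earlier trial
```

Both flags are TRUE for trial 6, so the "Chosen" label wins even though the rejection (trial 4) is more recent than the last time 2 was chosen (trial 2).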
TarJae
  • Nice one, man. Haven't tried this one yet, but I thought of "something using `row_number()`". – Martin Gal Aug 26 '21 at 19:25
  • yup, this answer uses row_number https://stackoverflow.com/a/68944025/4663008 and another answer there uses rowwise to figure it out too. – Arthur Yip Aug 26 '21 at 19:32
  • thank you tarjae this is super helpful. The piece of the dataframe I shared works perfectly but for some reason as it goes on it no longer classifies correctly. I edited the original question with the output of your suggested script. Do you know why this happens? again, thanks so much for your help – user15791858 Aug 29 '21 at 08:31

For Update 2 (f is the custom function defined further down in this answer)

setDT(df)[
  ,
  paste0("Previous_stim", 1:2) := lapply(
    .(stim1, stim2),
    f,
    cr = cbind(Chosen, Rejected)
  )
][
  ,
  paste0(c("left", "right"), "type") := lapply(.SD, function(x) 2 * (x == "Rejected") + 2 - outcome),
  .SDcols = patterns("Previous")
][]

gives

   stim1 stim2 Chosen Rejected outcome Previous_stim1 Previous_stim2 lefttype righttype
1:     2     1      2        1       1           <NA>           <NA>       NA        NA
2:     3     2      2        3       1           <NA>         Chosen       NA         1
3:     3     1      1        3       1       Rejected       Rejected        3         3
4:     2     3      3        2       0         Chosen       Rejected        2         4
5:     1     3      1        3       0         Chosen         Chosen        2         2
6:     2     1      1        2       1       Rejected         Chosen        3         1
7:     2     3      2        3       1       Rejected       Rejected        3         3
8:     3     1      1        3       0       Rejected         Chosen        4         2
9:     2     1      2        1       0         Chosen         Chosen        2         2
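
To unpack the arithmetic: 2 * (x == "Rejected") contributes 0 for Chosen and 2 for Rejected, and 2 - outcome contributes 1 when outcome is 1 and 2 when outcome is 0, which yields the codes 1 to 4. Note that the expression above uses the outcome of the current row. Rows 4 and 5 of the desired output in the question suggest the asker may instead want the outcome of the earlier trial in which the stimulus last appeared. Under that assumption, a minimal sketch could look like this (prev_type and ex are hypothetical names; the data are the five rows of the question's desired output):

```r
# Hypothetical helper: code (1-4) for the most recent earlier trial that
# contained `stim`, using the outcome recorded on that earlier trial.
prev_type <- function(stim, chosen, rejected, outcome) {
  res <- rep(NA_real_, length(stim))
  for (k in seq_along(stim)[-1]) {
    prev <- seq_len(k - 1)
    # earlier trials in which the current stimulus took part
    hits <- prev[chosen[prev] == stim[k] | rejected[prev] == stim[k]]
    if (length(hits)) {
      j <- max(hits)                          # most recent such trial
      was_rejected <- rejected[j] == stim[k]
      res[k] <- 2 * was_rejected + 2 - outcome[j]
    }
  }
  res
}

ex <- data.frame(
  stim1 = c(2, 1, 2, 1, 3), stim2 = c(3, 3, 1, 2, 1),
  Chosen = c(2, 3, 1, 1, 3), Rejected = c(3, 1, 2, 2, 1),
  outcome = c(1, 1, 1, 0, 1)
)
ex$Left_type  <- with(ex, prev_type(stim1, Chosen, Rejected, outcome))
ex$right_type <- with(ex, prev_type(stim2, Chosen, Rejected, outcome))
```

On these five rows the sketch gives Left_type = NA, NA, 1, 1, 1 and right_type = NA, 3, 3, 3, 2, which matches the Left_type and right_type columns of the desired output in the question.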

Data

> dput(df)
structure(list(stim1 = c(2L, 3L, 3L, 2L, 1L, 2L, 2L, 3L, 2L),
    stim2 = c(1L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L), Chosen = c(2L,
    2L, 1L, 3L, 1L, 1L, 2L, 1L, 2L), Rejected = c(1L, 3L, 3L,
    2L, 3L, 2L, 3L, 3L, 1L), outcome = c(1, 1, 1, 0, 0, 1, 1,
    0, 0)), class = "data.frame", row.names = c(NA, -9L))

As per your update, you can try the following code by defining a custom function f

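f <- function(stim, cr) {
  # result stays NA for stimuli that never appeared in an earlier trial
  res <- rep(NA, length(stim))
  for (k in seq_along(stim)[-1]) {
    # (row, col) positions in trials 1..k-1 where the current stimulus occurs
    ind <- which(cr[1:(k - 1), , drop = FALSE] == stim[k], arr.ind = TRUE)
    if (length(ind)) {
      # column name ("Chosen" or "Rejected") of the match in the latest row
      res[k] <- colnames(cr)[tail(ind[, "col"][order(ind[, "row"])], 1)]
    }
  }
  res
}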

setDT(df)[
  ,
  paste("Previous_stim", 1:2) := lapply(
    .(stim1, stim2),
    f,
    cr = cbind(Chosen, Rejected)
  )
][]

and you will see

> setDT(df)[, paste("Previous_stim",1:2) := lapply(.(stim1,stim2),f, cr = cbind(Chosen, Rejected))][]
   stim1 stim2 Chosen Rejected Previous_stim 1 Previous_stim 2
1:     2     1      2        1            <NA>            <NA>
2:     3     2      2        3            <NA>          Chosen
3:     3     1      1        3        Rejected        Rejected
4:     2     3      3        2          Chosen        Rejected
5:     1     3      1        3          Chosen          Chosen
6:     2     1      1        2        Rejected          Chosen
7:     2     3      2        3        Rejected        Rejected
8:     3     1      1        3        Rejected          Chosen
9:     2     1      2        1          Chosen          Chosen
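
For the last follow-up, two things in the question's h() look like likely culprits. First, which(..., arr.ind = TRUE) lists matches column by column, so tail(ind, 1) is not necessarily the most recent row. Second, stim[...] indexes the stimulus column rather than the Chosen/Rejected matrix. Below is a minimal sketch of one way to do it, assuming "current alternative" means the other stimulus in the current trial and that the comparison is against the opponent the stimulus faced in its most recent earlier trial (same_opponent and ex2 are hypothetical names):

```r
# Hypothetical helper: 1 if the opponent `stim` faced in its most recent
# earlier trial equals the current alternative `alt`, 0 if not, NA if unseen.
same_opponent <- function(stim, alt, chosen, rejected) {
  res <- rep(NA_integer_, length(stim))
  for (k in seq_along(stim)[-1]) {
    prev <- seq_len(k - 1)
    # earlier trials in which the current stimulus took part
    hits <- prev[chosen[prev] == stim[k] | rejected[prev] == stim[k]]
    if (length(hits)) {
      j <- max(hits)                          # most recent such trial
      opponent <- if (chosen[j] == stim[k]) rejected[j] else chosen[j]
      res[k] <- as.integer(opponent == alt[k])
    }
  }
  res
}

ex2 <- data.frame(
  stim1 = c(2, 3, 3, 2, 1), stim2 = c(1, 2, 1, 3, 3),
  Chosen = c(2, 2, 1, 3, 1), Rejected = c(1, 3, 3, 2, 3)
)
ex2$Current_alternative_left  <- with(ex2, same_opponent(stim1, stim2, Chosen, Rejected))
ex2$Current_alternative_right <- with(ex2, same_opponent(stim2, stim1, Chosen, Rejected))
```

On the five-row example this gives NA, NA, 0, 1, 1 for the left column and NA, 0, 0, 0, 0 for the right column, in line with the Current_alternative values listed in the question.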
ThomasIsCoding
  • I added the follow up question to your solution in update 2, since my eventual objective is to also encode what the outcome was. I am not yet familiar with data.frame- would I need to create four distinct functions for that? thanks again! – user15791858 Aug 30 '21 at 08:32
  • thank you again for all of your help! i updated the question with one last piece of it- wondering if there is an easy way to return that additional information (where the pair is repeated and where its not) – user15791858 Sep 10 '21 at 11:06
  • @user15791858 I guess this is the easiest way I can do so far ... – ThomasIsCoding Sep 10 '21 at 11:08
  • i made an attempt to use your function to return the last stimulus (see latest update) but it seems like im a bit off- any idea what im doing wrong – user15791858 Sep 12 '21 at 13:05

I think you need to correct the desired outcome in the table above. It looks like you are looking for the lag function, which can solve this when used alongside if_else:

library(dplyr)

tbl <- tibble(stim1 = c(2,3,3,2,1), stim2 = c(1,2,1,3,3), 
              chosen = c(2,2,1,3,1), rejected = c(1,3,3,2,3))

tbl %>% 
  mutate(Previous_stim1 = if_else(lag(chosen) == lag(stim1), "Chosen", "Rejected")) %>%
  mutate(Previous_stim2 = if_else(lag(chosen) == lag(stim2), "Chosen", "Rejected"))
M Daaboul