
I have a data set containing a list of 8 words (e.g. "klein", "warm") that occur in a random order across trials. I need to create a variable that shows how many trials passed between a particular occurrence of a word and its previous occurrence. For example, if the word showed up twice in a row, this new variable should be 0; if there was one other word in between, it would be 1, and so on. Can anyone give me a hint? Thank you in advance! P.S. The screenshot below shows how this is done in SPSS.

[screenshot: the desired variable computed in SPSS]

P.S. Here is the dput() output for a sample of my data:

structure(list(ExperimentName = c("Habit_Experiment", "Habit_Experiment", 
"Habit_Experiment", "Habit_Experiment", "Habit_Experiment", "Habit_Experiment", 
"Habit_Experiment", "Habit_Experiment", "Habit_Experiment", "Habit_Experiment", 
"Habit_Experiment", "Habit_Experiment", "Habit_Experiment", "Habit_Experiment", 
"Habit_Experiment", "Habit_Experiment", "Habit_Experiment", "Habit_Experiment", 
"Habit_Experiment", "Habit_Experiment", "Habit_Experiment", "Habit_Experiment", 
"Habit_Experiment", "Habit_Experiment", "Habit_Experiment", "Habit_Experiment", 
"Habit_Experiment", "Habit_Experiment", "Habit_Experiment", "Habit_Experiment"
), Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), AccP = c(90.38461, 90.38461, 90.38461, 90.38461, 90.38461, 
90.38461, 90.38461, 90.38461, 90.38461, 90.38461, 90.38461, 90.38461, 
90.38461, 90.38461, 90.38461, 90.38461, 90.38461, 90.38461, 90.38461, 
90.38461, 90.38461, 90.38461, 90.38461, 90.38461, 90.38461, 90.38461, 
90.38461, 90.38461, 90.38461, 90.38461), Age = c(28L, 28L, 28L, 
28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 
28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 
28L), Handedness = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("links", "rechts"), class = "factor"), 
    PracFail.RT = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L), Sex = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("divers", 
    "männlich", "weiblich"), class = "factor"), Block = 1:30, 
    Colour = c("YELLOW", "RED", "YELLOW", "YELLOW", "YELLOW", 
    "RED", "YELLOW", "YELLOW", "YELLOW", "YELLOW", "RED", "RED", 
    "RED", "RED", "YELLOW", "RED", "RED", "YELLOW", "RED", "RED", 
    "YELLOW", "RED", "YELLOW", "RED", "RED", "YELLOW", "DODGERBLUE", 
    "LIME", "DODGERBLUE", "LIME"), contingency.RESP = c("", "", 
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
    "", "", "", "", "", "", "", "", "", "", "", "", ""), contWord = c("", 
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
    "", "", "", "", "", "", "", "", "", "", "", "", "", ""), 
    Correct = c("l", "d", "l", "l", "l", "d", "l", "l", "l", 
    "l", "d", "d", "d", "d", "l", "d", "d", "l", "d", "d", "l", 
    "d", "l", "d", "d", "l", "l", "d", "l", "d"), Data = 1:30, 
    Data.Sample = 1:30, Rare_C = c("", "", "True", "False", "False", 
    "False", "True", "False", "False", "False", "False", "False", 
    "False", "True", "True", "True", "True", "False", "True", 
    "False", "False", "False", "False", "False", "False", "True", 
    "", "", "True", "False"), Stim.ACC = c(0L, 1L, 0L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L), Stim.CRESP = c("l", 
    "d", "l", "l", "l", "d", "l", "l", "l", "l", "d", "d", "d", 
    "d", "l", "d", "d", "l", "d", "d", "l", "d", "l", "d", "d", 
    "l", "l", "d", "l", "d"), Stim.RESP = c("d", "d", "d", "l", 
    "l", "d", "l", "l", "l", "l", "d", "d", "d", "d", "l", "d", 
    "d", "l", "d", "d", "d", "d", "l", "d", "d", "l", "d", "d", 
    "l", "d"), Stim.RT = c(NA, 808L, NA, 691L, 462L, 884L, 443L, 
    466L, 444L, 385L, 474L, 441L, 399L, 347L, 398L, 418L, 383L, 
    451L, 304L, 389L, NA, 467L, 395L, 338L, 333L, 327L, NA, 562L, 
    460L, 374L), Word = c("XXXX", "XXXX", "warm", "leicht", "leicht", 
    "warm", "klein", "ganz", "leicht", "leicht", "klein", "klein", 
    "warm", "ganz", "warm", "leicht", "leicht", "ganz", "ganz", 
    "klein", "ganz", "warm", "ganz", "warm", "klein", "klein", 
    "XXXX", "XXXX", "weich", "klar")), row.names = c(NA, 30L), class = "data.frame")

1 Answer


You probably won't need a loop for this. Here is a possible solution using the dplyr package.

Assuming your data frame is called df, select is used first only to drop the other columns for the demonstration; remove that line if you want to keep the rest of your columns.

Next, add a trial number to each row within each Subject. Then, after grouping by Subject and Word, last_occurrence is the difference between the current trial number and the trial number of the previous occurrence of that Word, minus one.

Words appearing for the first time get NA.

library(dplyr)

df %>%
  select(Subject, Word) %>%                 # demo only; drop this line to keep the other columns
  group_by(Subject) %>%
  mutate(trial = row_number()) %>%          # trial number within each subject
  group_by(Subject, Word) %>%
  mutate(last_occurrence = trial - lag(trial) - 1)  # trials since the previous occurrence of this word

Output

   Subject Word   trial last_occurrence
     <int> <chr>  <int>           <dbl>
 1       1 XXXX       1              NA
 2       1 XXXX       2               0
 3       1 warm       3              NA
 4       1 leicht     4              NA
 5       1 leicht     5               0
 6       1 warm       6               2
 7       1 klein      7              NA
 8       1 ganz       8              NA
 9       1 leicht     9               3
10       1 leicht    10               0
11       1 klein     11               3
12       1 klein     12               0
13       1 warm      13               6
14       1 ganz      14               5
15       1 warm      15               1
16       1 leicht    16               5
17       1 leicht    17               0
18       1 ganz      18               3
19       1 ganz      19               0
20       1 klein     20               7
21       1 ganz      21               1
22       1 warm      22               6
23       1 ganz      23               1
24       1 warm      24               1
25       1 klein     25               4
26       1 klein     26               0
27       1 XXXX      27              24
28       1 XXXX      28               0
29       1 weich     29              NA
30       1 klar      30              NA
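
If you prefer to avoid dplyr, the same idea works in base R. This is just a minimal sketch, assuming your data frame is called df as above: ave() numbers the trials within each Subject and then computes, per Subject and Word, the gap to the previous occurrence.

df$trial <- ave(seq_len(nrow(df)), df$Subject, FUN = seq_along)          # trial number within each subject
df$last_occurrence <- ave(df$trial, df$Subject, df$Word,
                          FUN = function(x) x - c(NA, head(x, -1)) - 1)  # gap to previous occurrence; NA for the first

Here c(NA, head(x, -1)) plays the role of lag(), so the first occurrence of each word again comes out as NA.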