I have a big file with 107635 rows, and 3 columns: subject, regions of interest (ROIs), and number of the trial. The ROIs can be A, B, C, D, E, F. What I want to do is to keep only those trials where in the column ROI I have a consecutive sequence of B, C, D, the first time that B appears. It doesn't matter how many times B, C and D occur.
In the example below, I can keep ntrial 78 and 201, because the first time that B appeared was followed by C and D. However, I need to remove the ntrial 10 and 400. In the trial 10 B, C and D are not consecutive. In the trial 400 the first time that B appears, B is not followed by C and D.
For the output, I just need a column with a value of 1 for the trials to keep, in each row, and a value of 0 for the rows corresponding to the trials to remove.
Any suggestion on how to create a code that can automatise the procedure, without visually inspect each trial?
Many thanks!
subject ROI ntrial output
sbj05 A 78 1
sbj05 A 78 1
sbj05 A 78 1
sbj05 A 78 1
sbj05 A 78 1
sbj05 A 78 1
sbj05 B 78 1
sbj05 B 78 1
sbj05 C 78 1
sbj05 D 78 1
sbj05 E 78 1
sbj05 E 78 1
sbj05 E 78 1
sbj05 A 201 1
sbj05 A 201 1
sbj05 A 201 1
sbj05 A 201 1
sbj05 A 201 1
sbj05 B 201 1
sbj05 C 201 1
sbj05 D 201 1
sbj05 E 201 1
sbj05 E 201 1
sbj05 E 201 1
sbj05 F 201 1
sbj05 F 201 1
sbj05 A 10 0
sbj05 A 10 0
sbj05 A 10 0
sbj05 A 10 0
sbj05 B 10 0
sbj05 A 10 0
sbj05 C 10 0
sbj05 D 10 0
sbj05 E 10 0
sbj05 E 10 0
sbj05 A 400 0
sbj05 A 400 0
sbj05 A 400 0
sbj05 B 400 0
sbj05 A 400 0
sbj05 B 400 0
sbj05 C 400 0
sbj05 C 400 0
sbj05 C 400 0
sbj05 D 400 0
sbj05 E 400 0
sbj05 E 400 0
sbj05 D 400 0