I am trying to subset a data frame based on a specific sequence occurring in column v3.
A sample data frame:
v1 <- c(1:20)
v2 <- c(1,1,0,0,1,0,1,1,1,0,1,1,0,0,0,1,1,0,0,0)
v3 <- c(4,4,2,3,2,3,2,4,4,2,3,2,3,3,3,4,4,2,3,3)
my_df <- data.frame(v1,v2,v3) # creating a dataframe
Printed output of my_df:
v1 v2 v3
1 1 1 4
2 2 1 4
3 3 0 2
4 4 0 3
5 5 1 2
6 6 0 3
7 7 1 2
8 8 1 4
9 9 1 4
10 10 0 2
11 11 1 3
12 12 1 2
13 13 0 3
14 14 0 3
15 15 0 3
16 16 1 4
17 17 1 4
18 18 0 2
19 19 0 3
20 20 0 3
The output I am trying to achieve should look like this (same columns as above):
1 1 1 4
2 2 1 4
3 3 0 2
8 8 1 4
9 9 1 4
10 10 0 2
16 16 1 4
17 17 1 4
18 18 0 2
So I want to keep only the rows of my_df where column v3 forms the sequence 4 4 2. What I have tried so far is:
my_df[which(c(diff(v3))==-2),]
but this only extracts the middle 4 of each 4 4 2 sequence, like this:
v1 v2 v3
2 2 1 4
9 9 1 4
17 17 1 4
Another option I tried:
m = match(v3, c(4,4,2))
> m
[1] 1 1 3 NA 3 NA 3 1 1 3 NA 3 NA NA NA 1 1 3 NA NA
> my_df[!is.na(m),]
v1 v2 v3
1 1 1 4
2 2 1 4
3 3 0 2
5 5 1 2
7 7 1 2
8 8 1 4
9 9 1 4
10 10 0 2
12 12 1 2
16 16 1 4
17 17 1 4
18 18 0 2
This gives me every row where v3 is 4 or 2, not just the rows that make up the sequence 4 4 2 that I want. Any help would be appreciated.
I already achieved this in MATLAB with a for loop and if statements, but I am wondering how to solve it in R without loops.
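To make the goal concrete, here is a minimal sketch of one loop-free way to express it, not necessarily the best one. It assumes base R only, and the names pattern, k, windows, starts, keep and result are introduced here purely for illustration. The idea is to compare every window of three consecutive v3 values against 4 4 2, then expand each matching start position to the three rows it covers.

```r
# Sample data from the question
v1 <- 1:20
v2 <- c(1,1,0,0,1,0,1,1,1,0,1,1,0,0,0,1,1,0,0,0)
v3 <- c(4,4,2,3,2,3,2,4,4,2,3,2,3,3,3,4,4,2,3,3)
my_df <- data.frame(v1, v2, v3)

pattern <- c(4, 4, 2)   # sequence to look for in v3 (illustrative name)
k <- length(pattern)

# embed() puts every window of k consecutive values in one row,
# newest value first, so the columns are reversed to restore order.
windows <- embed(my_df$v3, k)[, k:1, drop = FALSE]

# Row i of `windows` is v3[i:(i + k - 1)]. After transposing, each column
# is one window, and the length-k pattern recycles down each column.
starts <- which(colSums(t(windows) == pattern) == k)

# Expand each matching start to the k row indices it covers.
keep <- sort(unique(as.vector(outer(starts, seq_len(k) - 1, "+"))))

result <- my_df[keep, ]
print(result)
```

With the sample data this should keep rows 1 to 3, 8 to 10 and 16 to 18. The unique() call only matters when matches can overlap, which cannot happen for 4 4 2 but could for a pattern such as 4 4 4.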