I would like to keep only the rows that lie between two specific values of a column in my data frame. The data frame below is organised in blocks, so to speak: the data I am interested in always (or, to be precise, most of the time) starts at "group" and ends at "section". I would like to remove everything outside these blocks.
# Sample Data
df <- data.frame(
  Name = c("x1", "NA", "group", "Jean", "Philippe", "Celine", "Dion", "section", "NA",
           "y2", "z1", "NA", "group", "Rosa", "Albert", "Stromae", "section", "NA", "abc", "something",
           "group", "Sam", "Liz"),
  value = as.character(1:23)
)
df
Name value
1 x1 1
2 NA 2
3 group 3
4 Jean 4
5 Philippe 5
6 Celine 6
7 Dion 7
8 section 8
9 NA 9
10 y2 10
11 z1 11
12 NA 12
13 group 13
14 Rosa 14
15 Albert 15
16 Stromae 16
17 section 17
18 NA 18
19 abc 19
20 something 20
21 group 21
22 Sam 22
23 Liz 23
Since the "group":"section" block does not always contain the same information, I don't know how to tell R to keep the rows between "group" and "section" every time the block is repeated. The only thing I came up with is the line below, which keeps just the rows from the first "group" to the first "section":
df[which(df$Name=="group")[1]:which(df$Name=="section")[1],]
Name value
3 group 3
4 Jean 4
5 Philippe 5
6 Celine 6
7 Dion 7
8 section 8
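The which(...)[1] idea above can be extended to every block by pairing each "group" row with the first "section" row that follows it. Below is a minimal base-R sketch, not a definitive solution. The names `starts`, `ends`, `matching_end` and `rows` are illustrative. Like the attempt above, it keeps the marker rows themselves, and it skips a "group" that has no later "section".

```r
# Sample data from the question
df <- data.frame(
  Name = c("x1", "NA", "group", "Jean", "Philippe", "Celine", "Dion", "section", "NA",
           "y2", "z1", "NA", "group", "Rosa", "Albert", "Stromae", "section", "NA",
           "abc", "something", "group", "Sam", "Liz"),
  value = as.character(1:23),
  stringsAsFactors = FALSE
)

starts <- which(df$Name == "group")
ends   <- which(df$Name == "section")

# For every "group" row, the first "section" row after it (NA if there is none)
matching_end <- vapply(starts, function(s) ends[ends > s][1], integer(1))

# Keep only complete blocks and build the row indices group..section for each one
complete <- !is.na(matching_end)
rows <- unlist(Map(seq, starts[complete], matching_end[complete]))

result <- df[rows, ]
result
```

If `NA` entries of `matching_end` were replaced with `nrow(df)` instead of being dropped, an unterminated block would be kept up to the last row. If two "group" rows can appear before a single "section", wrapping the indices in `unique()` avoids duplicated rows.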
Update: Sometimes my data also contains a block that starts with "group" but has no closing "section". I would like to keep this information too. Based on your solutions, I added a "section" row every time one was missing and then applied what you proposed. Is there another way to handle this case without adding new rows to the data?
The desired output (without the "group" and "section" rows themselves) would be:
4 Jean 4
5 Philippe 5
6 Celine 6
7 Dion 7
14 Rosa 14
15 Albert 15
16 Stromae 16
22 Sam 22
23 Liz 23
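One possible way to handle a missing "section" without adding rows is to look up, for every row, the most recent marker row at or before it. The row belongs to a block when that marker is "group", so an unterminated block simply runs to the end of the data. This is a base-R sketch under the assumption that the markers are spelled exactly "group" and "section". The names `is_marker`, `last_marker` and `open_block` are illustrative.

```r
# Sample data from the question
df <- data.frame(
  Name = c("x1", "NA", "group", "Jean", "Philippe", "Celine", "Dion", "section", "NA",
           "y2", "z1", "NA", "group", "Rosa", "Albert", "Stromae", "section", "NA",
           "abc", "something", "group", "Sam", "Liz"),
  value = as.character(1:23),
  stringsAsFactors = FALSE
)

is_marker <- df$Name %in% c("group", "section")

# Index of the most recent marker row at or before each row (0 if none seen yet)
last_marker <- cummax(seq_along(is_marker) * is_marker)

# A row is inside a block when the most recent marker is "group";
# pmax(..., 1) only avoids indexing row 0, those rows are excluded by last_marker > 0
open_block <- last_marker > 0 & df$Name[pmax(last_marker, 1)] == "group"

# Keep ordinary rows inside a block and drop the marker rows themselves
result <- df[open_block & !is_marker, ]
result
```

With the sample data this keeps rows 4-7, 14-16 and 22-23, which matches the desired output. A stray "section" with no preceding "group" simply leaves the rows after it outside any block.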
Thank you guys in advance for your help.