I want to manipulate substrings in one column of a data frame based on the indices of these substrings, which are stored in another column:
Data:
df_test
  Turn                              c5                              Turns_split
1 we 're not gon na know the person PNP VBB XX0 VVG TO0 VVI AT0 NN1 we, 're, not, gon, na, know, the, person
2 great answer                      AJ0 NN1                         great, answer
3 it 's gon na rain                 PNP VBZ VVG TO0 VVI             it, 's, gon, na, rain
  c5_split                               Index
1 PNP, VBB, XX0, VVG, TO0, VVI, AT0, NN1 4
2 AJ0, NN1
3 PNP, VBZ, VVG, TO0, VVI                3
The indices (the values 4 and 3) are stored in column Index. The substrings I want to manipulate are stored in c5, which contains part-of-speech tags. The manipulation concerns two substrings in c5: (i) the substring whose position equals the value in Index, and (ii) the substring immediately after it, i.e., the substring at position Index + 1. I want to replace the whitespace between these two substrings with an = sign. Rows with an empty Index should stay unchanged. So the desired output in column c5 is:
df_test$c5
"PNP VBB XX0 VVG=TO0 VVI AT0 NN1" "AJ0 NN1" "PNP VBZ VVG=TO0 VVI"
I'm at a loss as to how to do this and would be grateful for any guidance.
Reproducible data:
df_test <- structure(list(Turn = c("we 're not gon na know the person",
"great answer", "it 's gon na rain"), c5 = c("PNP VBB XX0 VVG TO0 VVI AT0 NN1",
"AJ0 NN1", "PNP VBZ VVG TO0 VVI"), Turns_split = list(c("we",
"'re", "not", "gon", "na", "know", "the", "person"), c("great",
"answer"), c("it", "'s", "gon", "na", "rain")), c5_split = list(
c("PNP", "VBB", "XX0", "VVG", "TO0", "VVI", "AT0", "NN1"),
c("AJ0", "NN1"), c("PNP", "VBZ", "VVG", "TO0", "VVI")), Index = list(
4L, integer(0), 3L)), row.names = c(NA, -3L), class = "data.frame")
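A minimal base-R sketch of the manipulation described above. It assumes that Index holds either a single 1-based position or integer(0) per row, and that the tags in c5 are separated by single spaces. join_at_index is a hypothetical helper name, and the two relevant columns are re-created as plain vectors so the snippet runs on its own.

```r
# Hypothetical helper (not from any package): join the tag at 1-based
# position i with the tag that follows it using "=", keeping all other
# tags separated by single spaces.
join_at_index <- function(tags_string, i) {
  tags <- strsplit(tags_string, " ", fixed = TRUE)[[1]]
  # Leave the string unchanged when there is no usable index:
  # integer(0), NA, out of range, or the last tag (no successor to join)
  if (length(i) != 1 || is.na(i) || i < 1 || i >= length(tags)) {
    return(tags_string)
  }
  tags[i] <- paste0(tags[i], "=", tags[i + 1])  # e.g. "VVG" and "TO0" -> "VVG=TO0"
  paste(tags[-(i + 1)], collapse = " ")         # drop the successor that was merged in
}

# The c5 and Index columns of df_test, re-created as plain vectors
c5 <- c("PNP VBB XX0 VVG TO0 VVI AT0 NN1", "AJ0 NN1", "PNP VBZ VVG TO0 VVI")
Index <- list(4L, integer(0), 3L)

# mapply walks c5 and the list Index in parallel, one row at a time
result <- mapply(join_at_index, c5, Index, USE.NAMES = FALSE)
result
# Values: "PNP VBB XX0 VVG=TO0 VVI AT0 NN1", "AJ0 NN1", "PNP VBZ VVG=TO0 VVI"
```

Applied to the data frame itself, the same call would be df_test$c5 <- mapply(join_at_index, df_test$c5, df_test$Index, USE.NAMES = FALSE). mapply fits here because Index is a list column: each row's element, including integer(0), reaches the helper as is, and the helper leaves such rows untouched.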