I have to code many data.frames. For example:
tt <- data.frame(V1=c("test1", "test3", "test1", "test4", "wins", "loses"),
V2=c("someannotation", "othertext", "loads of text including the word winning for the winner and the word losing for the loser", "blablabla", "blablabla", "blablabla"))
tt
V1 V2
test1 someannotation
test3 othertext
test1 loads of text including the word winning for the winner and the word losing for the loser
test4 blablabla
wins blablabla
loses blablabla
The coding has to go into a new data.frame, and I have to code whether a runner wins or loses. If V1 is "wins", the runner wins; if V1 is "loses", the runner loses. However, a runner can also win or lose parts of a race. This is indicated by "test1" in V1 and specified by V2: if the term "winning" appears before the term "losing" in V2, the runner wins parts of the race (and vice versa).
I've tried to adapt parts of the answers from here to determine the position at which each word/string appears:
find location of character in string
The implementation looks like this:
result <- data.frame()
for(i in 1:length(tt[,1])){
if(grepl("wins", tt[i,1])) result[i,1] <- "wins"
if(grepl("loses", tt[i,1])) result[i,1] <- "loses"
if(grepl("test1", tt[i,1])&(which(strsplit(tt[i,2], " ")[[1]]=="winning")>which(strsplit(tt[i,2], " ")[[1]]=="losing"))) result[i,1] <- "loses"
if(grepl("test1", tt[i,1])&(which(strsplit(tt[i,2], " ")[[1]]=="winning")<which(strsplit(tt[i,2], " ")[[1]]=="losing"))) result[i,1] <- "wins"
}
But there is an error message for cells of column V2 that contain neither "winning" nor "losing":
Error in if (grepl("test1", tt[i, 1]) & (which(strsplit(tt[i, 2], " ")[[1]] == : argument is of length zero
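As far as I can tell, the error comes from which() returning an empty vector when the word is not found, so the comparison has length zero and if() has nothing to test. A minimal reproduction using one of the V2 values above (the variable names words, pos_win, pos_lose, cmp and cond are only for illustration):

```r
# Split a V2 value that contains neither "winning" nor "losing"
words <- strsplit("blablabla", " ")[[1]]

pos_win  <- which(words == "winning")  # integer(0): no match
pos_lose <- which(words == "losing")   # integer(0): no match

# Comparing two empty vectors gives an empty logical vector
cmp <- pos_win < pos_lose              # logical(0)

# Combining it with & keeps the length at zero,
# which is what if() complains about
cond <- TRUE & cmp
length(cond)
```

So any row whose V2 lacks one of the two words would trigger the error, regardless of what V1 contains.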
Does someone have a workaround for that problem, or even a more sophisticated solution? Any help is appreciated, thanks!
Edit
As @grrgrrbla kindly clarified, there are two ways to win: either V1 == "wins", or V2 contains the word "winning" before the word "losing". Likewise, there are two ways to lose: either V1 == "loses", or V2 contains "losing" before "winning".
My output should look like this:
result
V1
NA
NA
wins
NA
wins
loses
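One possible way to get this output without the loop is sketched below. It is not a definitive solution and rests on a few assumptions: V1 is matched exactly rather than with grepl(), only the first occurrence of each word in V2 counts, and the V2 rule applies only to rows where V1 is "test1". The names pos_win, pos_lose, both and out are made up for the sketch.

```r
tt <- data.frame(
  V1 = c("test1", "test3", "test1", "test4", "wins", "loses"),
  V2 = c("someannotation", "othertext",
         "loads of text including the word winning for the winner and the word losing for the loser",
         "blablabla", "blablabla", "blablabla"),
  stringsAsFactors = FALSE
)

# regexpr() gives the position of the first match, or -1 if there is none,
# so a missing word never produces a zero-length value
pos_win  <- as.integer(regexpr("\\bwinning\\b", tt$V2, perl = TRUE))
pos_lose <- as.integer(regexpr("\\blosing\\b",  tt$V2, perl = TRUE))

# Assumption: the V2 rule only applies to "test1" rows with both words present
both <- tt$V1 == "test1" & pos_win > 0 & pos_lose > 0

out <- rep(NA_character_, nrow(tt))
out[tt$V1 == "wins"]  <- "wins"
out[tt$V1 == "loses"] <- "loses"
out[both & pos_win < pos_lose] <- "wins"
out[both & pos_lose < pos_win] <- "loses"

result <- data.frame(V1 = out, stringsAsFactors = FALSE)
result
```

Because every comparison here is vectorised over the rows, no single row can hand if() an empty condition.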