I have a dataset with a column that contains sentence beginnings, each of which may occur:
- once
- twice
I have introduced two empty columns in it as follows:
data_copy <- data_copy %>%
mutate(unique = NA, duplication = NA)
I want to identify the rows whose sentence start occurs only once and those whose sentence start occurs twice. This is how I have proceeded:
data_sent = data_copy$sentence_start
to get all the values of the column I am interested in, and
duplicated = data_copy$sentence_start[duplicated(data_copy$sentence_start)]#%>% as.data.frame()
to find values that show repetitions.
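To make that step concrete, here is a minimal sketch of what `duplicated()` returns, using a hypothetical toy vector `x` rather than the real column. It marks only the second and later occurrences, so subsetting with it yields each repeated value, and `%in%` then flags every row carrying such a value, including the first occurrence.

```r
# Hypothetical toy vector, not taken from the dataset.
x <- c("a", "b", "a", "c")

# duplicated() is TRUE only for the second and later occurrences.
is_repeat <- duplicated(x)

# The values that occur more than once.
repeated_values <- x[is_repeat]

# TRUE for every element whose value is repeated, first occurrence included.
in_repeated <- x %in% repeated_values
```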
I have implemented the following code to assign values to the unique and duplication columns:
for(i in data_sent){
  if(i %in% duplicated){
    data_copy[,16] <- '0'
    data_copy[,25] <- '2'
  } else {
    data_copy[,16] <- '1'
    data_copy[,25] <- '1'
  }
}
The loop runs and fills the two columns, but the values seem to be overwritten: both columns end up filled entirely with 0 and 2, the values assigned in the first branch of the if statement.
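As far as I can tell, `data_copy[,16] <- '0'` assigns to the entire column on every iteration, so only the values from the last iteration survive. The result I am after could be sketched without a loop roughly as follows. This is only a sketch under assumptions: the toy `data_copy` below is hypothetical, the columns are addressed by name rather than by position, and `n_occurrences` is a helper name introduced for illustration.

```r
# Hypothetical toy data standing in for data_copy; only sentence_start matters here.
data_copy <- data.frame(
  sentence_start = c("a", "b", "a", "c"),
  stringsAsFactors = FALSE
)

# For each row, count how many times its sentence start occurs in the column.
n_occurrences <- ave(seq_along(data_copy$sentence_start),
                     data_copy$sentence_start, FUN = length)

# Assign row by row (vectorized) instead of overwriting whole columns.
data_copy$unique      <- ifelse(n_occurrences > 1, "0", "1")
data_copy$duplication <- ifelse(n_occurrences > 1, "2", "1")
```

Here rows 1 and 3 share the same sentence start, so they get unique = "0" and duplication = "2", while the other rows get "1" in both columns.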
A small extract of the dataset is here:
dput(head(dt, 5))
structure(list(cong_cond = c("congruent", "congruent", "congruent",
"congruent", "congruent"), sentence_start = c("Wojciech zranil sie mocno i spedzil noc w szpitalu pod",
"Nie chce isc do pracy, zeby wypelnic papierkowa robote, ale",
"Zacmienie Slonca nastepuje, gdy slonce i ksiezyc tworza", "Laura lubi tworzyc rekodziela i dlatego zapisala sie na",
"Altówka ma podobna budowe do skrzypiec, ale ma cieplejszy"),
presentation_mode = c("spoken", "spoken", "spoken", "spoken",
"spoken"), unique = c("0", "0", "0", "0", "0"), duplication = c("2",
"2", "2", "2", "2")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))