Remove duplicates based on next row

Question

I'm new to R. I want to remove rows from a data frame where df$x == "string" AND the next row holds the same string.

So say I have this column:

1. String - remove
2. String
3. A
4. A
5. A
6. String - remove
7. String - remove
8. String
9. A
10. A

The result I want would be:

2. String
3. A
4. A
5. A
8. String
9. A
10. A

Ronak Shah · Accepted Answer · 2019-01-11T05:24:07.053

3

We can use lead() from dplyr to remove rows where both the current row and the next row are "String".

library(dplyr)

df %>%
  filter(!(V1 == "String" & lead(V1) == "String"))

#      V1
#1 String
#2      A
#3      A
#4 String
#5      A
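
To see why the filter works, it can help to look at what lead() returns next to the original values. This is only an illustrative sketch: the vector v stands in for df$V1, and the names nxt and drop are made up for the example.

```r
library(dplyr)

# Stand-in for df$V1 (same values as the sample data).
v <- c("String", "String", "A", "A", "String", "String", "String", "A")

# lead() shifts the vector up by one position; the last element has no
# successor, so its lead value is NA.
nxt <- lead(v)

# A row is marked for removal only when it and its successor are both "String".
drop <- v == "String" & nxt == "String"

# Side-by-side view of each value, its successor, and the removal flag.
data.frame(v, nxt, drop)
```

Rows 1, 5 and 6 are flagged, which are exactly the rows the filter removes.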

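In base R, the same filter can be written as

# c(df$V1[-1], NA) shifts the column up by one, so each row is compared with the next one
df[!(df$V1 == "String" & c(df$V1[-1], NA) == "String"), , drop = FALSE]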

#      V1
#2 String
#3      A
#4      A
#7 String
#8      A
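
One edge case is worth checking: in the sample data the last row is "A", but if the last row were "String", its comparison with NA would evaluate to NA. filter() drops such rows, and base subsetting returns a row of NA instead. A possible workaround, sketched here under the assumption that the column has no missing values, is to pad with an empty string rather than NA (in dplyr, lead(V1, default = "") should have the same effect). The helper name drop_before_repeat and the data frame df2 are hypothetical.

```r
# Hypothetical helper, not part of the original answer.
# Assumes the column contains no NA values and that `value` is not "".
drop_before_repeat <- function(df, col = "V1", value = "String") {
  x <- df[[col]]
  # Pad with "" instead of NA so the last row never produces an NA comparison.
  nxt <- c(x[-1], "")
  df[!(x == value & nxt == value), , drop = FALSE]
}

# Small example whose last row is "String".
df2 <- data.frame(V1 = c("String", "String", "A", "String"),
                  stringsAsFactors = FALSE)

res <- drop_before_repeat(df2)
res
```

Here only the first row is removed; the trailing "String" has no following row and is kept.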

data

df <- structure(list(V1 = c("String", "String", "A", "A", "String", 
"String", "String", "A")), .Names = "V1", row.names = c(NA, -8L
 ), class = "data.frame")

edited Jan 11 '19 at 05:24

answered Jan 11 '19 at 05:05


Yes!! I used the base R option :) – queenElizabeth Jan 11 '19 at 17:08

score 0 · Answer 2 · answered Jan 11 '19 at 10:08

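We can build a logical index with duplicated() and rleid() and use it to subset the rows. rleid() numbers each run of consecutive identical values, so duplicated() on those run ids flags every row after the first one in a run.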

library(data.table)
setDT(df)[!(duplicated(rleid(V1)) & V1 == 'String')]
#       V1
#1: String
#2:      A
#3:      A
#4: String
#5:      A
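
Note that this index keeps the first "String" of each run, whereas the question keeps the last one. With a single column the output is identical, but the difference shows up once other columns are present. The sketch below uses a hypothetical id column to show which rows survive, and passes fromLast = TRUE to duplicated() to keep the last row of each run instead.

```r
library(data.table)

# Hypothetical two-column data; id only serves to show which rows are kept.
dt <- data.table(id = 1:8,
                 V1 = c("String", "String", "A", "A",
                        "String", "String", "String", "A"))

# rleid() gives each run of identical consecutive values its own number.
dt[, run := rleid(V1)]

# Keeps the first "String" of each run (the approach shown above).
first_kept <- dt[!(duplicated(run) & V1 == "String"), id]

# Keeps the last "String" of each run, matching the rows kept in the question.
last_kept <- dt[!(duplicated(run, fromLast = TRUE) & V1 == "String"), id]

first_kept
last_kept
```

With this data, first_kept contains ids 1, 3, 4, 5, 8 and last_kept contains ids 2, 3, 4, 7, 8.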

data

df <- structure(list(V1 = c("String", "String", "A", "A", "String", 
"String", "String", "A")), row.names = c(NA, -8L), class = "data.frame")
