I'm new to R. I want to remove rows from a data frame where df$x == "String" and the next row contains the same string.

Say I have this column:

1. String - remove
2. String
3. A
4. A
5. A
6. String - remove
7. String - remove
8. String
9. A
10. A

The result I want would be

2. String
3. A
4. A
5. A
8. String
9. A
10. A

Shubham

2 Answers

We can use lead() from dplyr to remove rows where both the current row and the next row are "String".

library(dplyr)

df %>%
  filter(!(V1 == "String" & lead(V1) == "String"))

#      V1
#1 String
#2      A
#3      A
#4 String
#5      A
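
One edge case is worth checking: lead() pads the end of the vector with NA, so if the column ends in "String" the condition evaluates to NA for the last row and filter() drops it. The following is a minimal sketch, assuming the same V1 column and a hypothetical data frame df2 that ends in "String"; passing default = "" to lead() is one possible way to keep that row, not the only one.

```r
library(dplyr)

# Hypothetical data: same layout as df, but the last value is "String"
df2 <- data.frame(V1 = c("String", "String", "A", "String"),
                  stringsAsFactors = FALSE)

# lead() pads with NA by default, so the last row's condition is NA
# and filter() silently drops it
dropped <- df2 %>%
  filter(!(V1 == "String" & lead(V1) == "String"))

# Padding with "" makes the last comparison FALSE, so the row is kept
kept <- df2 %>%
  filter(!(V1 == "String" & lead(V1, default = "") == "String"))

nrow(dropped)  # 2
nrow(kept)     # 3
```

Whether a trailing "String" should be kept depends on the data; under the rule in the question (remove only when the next row is the same string) it has no next row and so would stay.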

Using base R, we can do

df[!(df$V1 == "String" & c(df$V1[-1], NA) == "String"), , drop = FALSE]

#      V1
#2 String
#3      A
#4      A
#7 String
#8      A
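
The base R approach can also be wrapped in a small function so the column and the value are not hard-coded. This is only a sketch: drop_repeated is a hypothetical helper name, and it assumes the column is a character vector. It uses %in% instead of ==, so the NA that stands in for "the row after the last one" compares as FALSE rather than NA.

```r
# Hypothetical helper: drop rows where `col` equals `value`
# and the following row equals `value` as well
drop_repeated <- function(data, col, value) {
  x <- data[[col]]
  nxt <- c(x[-1], NA)  # value in the following row; NA after the last row
  # %in% returns FALSE for NA, so the last row is never flagged
  flagged <- x %in% value & nxt %in% value
  data[!flagged, , drop = FALSE]
}

df <- data.frame(V1 = c("String", "String", "A", "A", "String",
                        "String", "String", "A"),
                 stringsAsFactors = FALSE)

res <- drop_repeated(df, "V1", "String")
rownames(res)  # "2" "3" "4" "7" "8": original row numbers that were kept
```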

data

df <- structure(list(V1 = c("String", "String", "A", "A", "String", 
"String", "String", "A")), .Names = "V1", row.names = c(NA, -8L
 ), class = "data.frame")
Ronak Shah

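We can build a logical index with duplicated() and rleid() to subset the rows. rleid() numbers each run of consecutive equal values, and duplicated() with fromLast = TRUE flags every row of a run except the last, which is the row the question wants to keep.

library(data.table)
setDT(df)[!(duplicated(rleid(V1), fromLast = TRUE) & V1 == 'String')]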
#       V1
#1: String
#2:      A
#3:      A
#4: String
#5:      A
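
With a single column, keeping the first or the last row of each run prints the same result, so it is easier to see which row survives with a row identifier attached. A small sketch, assuming a hypothetical id column that records the original row number; it compares the default duplicated() with duplicated(..., fromLast = TRUE).

```r
library(data.table)

# Hypothetical data: V1 as above, plus an id column with the original row number
dt <- data.table(id = 1:8,
                 V1 = c("String", "String", "A", "A", "String",
                        "String", "String", "A"))

# Default: duplicated() flags every element after the first of each run,
# so the first "String" of a run is the one that stays
keep_first <- dt[!(duplicated(rleid(V1)) & V1 == "String")]

# fromLast = TRUE flags every element before the last of each run,
# so the last "String" of a run is the one that stays
keep_last <- dt[!(duplicated(rleid(V1), fromLast = TRUE) & V1 == "String")]

keep_first$id  # 1 3 4 5 8
keep_last$id   # 2 3 4 7 8
```

The second variant matches the rows listed in the question (2, 3, 4, 5, 8, ... in its numbering), which matters once the data frame has other columns that differ between the rows of a run.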

data

df <- structure(list(V1 = c("String", "String", "A", "A", "String", 
"String", "String", "A")), row.names = c(NA, -8L), class = "data.frame")
akrun