How to remove all consecutive data but keep only the first row

Question

Referring to the question "R - delete consecutive (ONLY) duplicates", I am using the same formula:

df[c(df$x[-1] != df$x[-nrow(df)],TRUE),]

But this keeps only the last value of each run, and I want the first ones. How can I change that? Thank you!

Please provide enough code so others can better understand or reproduce the problem. — Community, Jun 29 '22 at 15:22

score 0 · Accepted Answer · answered Jun 29 '22 at 19:22

Here are a few options.

First, you can use rle to get the lengths of the runs of consecutive equal values. To keep the first value of each run, start at index 1 and cumulatively add the run lengths (dropping the last one); this gives the starting index of every run.

lens <- rle(df$x)$lengths
df[cumsum(c(1, lens[-length(lens)])), ]
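To make the index arithmetic concrete, here is a small worked sketch. The data frame and its columns (x, y) are made up for illustration, since the question does not include sample data.

```r
# Hypothetical sample data: x has the runs 1,1 | 2,2,2 | 3 | 1,1
df <- data.frame(x = c(1, 1, 2, 2, 2, 3, 1, 1), y = 1:8)

# Lengths of each run of equal consecutive values: 2 3 1 2
lens <- rle(df$x)$lengths

# Start at row 1, then add every run length except the last one
# to get the first row of each run: 1 3 6 7
first_idx <- cumsum(c(1, lens[-length(lens)]))

result <- df[first_idx, ]
print(result)
```

The last run length is dropped because adding it would point one row past the end of the data.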

As an alternative, using the tidyverse you can flag each row where x differs from the previous row (the first row is always flagged) and keep only the flagged rows, which are the first rows of each run. Note that diff requires x to be numeric.

library(dplyr)

df %>%
  group_by(grp = c(TRUE, diff(x) != 0)) %>%  # TRUE where a new run starts
  filter(grp) %>%
  ungroup() %>%
  select(-grp)
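If x is not numeric, diff will not work. One possible variant, sketched here with made-up data and assuming x contains no NA values, compares each value with the previous one using dplyr's lag instead:

```r
library(dplyr)

# Hypothetical sample data with a character column
df <- data.frame(x = c("a", "a", "b", "b", "a"), y = 1:5)

result <- df %>%
  # Keep row 1, plus every row whose x differs from the previous row's x.
  # For row 1, x != lag(x) is NA, but TRUE | NA evaluates to TRUE.
  filter(row_number() == 1 | x != lag(x))

print(result)
```

This keeps rows 1, 3 and 5 of the sample data, and it avoids the temporary grouping column.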

Or with data.table you can use rleid (a function that generates a run-length type group id). duplicated() returns TRUE for every row after the first within each run, so negating it with ! keeps only the first row of each run.

library(data.table)

setDT(df)[!duplicated(rleid(x))]
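The formula from the question can also be adapted directly in base R. The sketch below uses made-up data and assumes the data frame has at least one row. The original version appends TRUE at the end, so it marks rows that differ from the next row (the last of each run); moving TRUE to the front marks rows that differ from the previous row (the first of each run).

```r
# Hypothetical sample data: x has the runs 1,1 | 2,2,2 | 3 | 1,1
df <- data.frame(x = c(1, 1, 2, 2, 2, 3, 1, 1), y = 1:8)

# Element-wise comparison of each value with its neighbour
changed <- df$x[-1] != df$x[-nrow(df)]

# Original formula: TRUE appended, keeps the LAST row of each run
keep_last <- c(changed, TRUE)

# Shifted version: TRUE prepended, keeps the FIRST row of each run
keep_first <- c(TRUE, changed)

last_rows  <- df[keep_last, ]
first_rows <- df[keep_first, ]
print(first_rows)
```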
