I have a data frame in R that is supposed to contain duplicates. However, there are some duplicates that I need to remove. In particular, I only want to remove row-adjacent duplicates and keep the rest. For example, suppose I had the data frame:
df = data.frame(x = c("A", "B", "C", "A", "B", "C", "A", "B", "B", "C"),
                y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
This results in the following data frame:
x y
A 1
B 2
C 3
A 4
B 5
C 6
A 7
B 8
B 9
C 10
In this case, I expect there to be repeating "A, B, C, A, B, C, etc.". However, it is only a problem if I see adjacent row duplicates. In my example above, that would be rows 8 and 9 with the duplicate "B" being adjacent to each other.
In my data set, whenever this occurs, the first instance is always a user error, and the second is always the correct version. In very rare cases, the duplicates might occur three (or more) times in a row. However, in every case, I want to keep only the last occurrence. Thus, following the example above, I would like the final data set to look like:
A 1
B 2
C 3
A 4
B 5
C 6
A 7
B 9
C 10
Is there an easy way to do this in R? Thank you in advance for your help!
Edit: 11/19/2014 12:14 PM EST There was a solution posted by user Akron (spelling?) that has since been deleted. I am not sure why, because it seemed to work for me.
The solution was
df = df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]
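To spell out what I think that one-liner does, here is a minimal sketch of my understanding, using my first example. The names `differs_from_next`, `keep`, and `deduped` are mine and were not part of the posted answer, and I convert x to character so the comparison does not depend on whether x is stored as a factor:

```r
df <- data.frame(x = c("A", "B", "C", "A", "B", "C", "A", "B", "B", "C"),
                 y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

n <- nrow(df)
# Compare as character so this works whether or not x is a factor.
x <- as.character(df$x)
# TRUE where a row's x differs from the x of the row right after it.
differs_from_next <- x[-1] != x[-n]
# The last row has no successor, so it is always kept.
keep <- c(differs_from_next, TRUE)
# Keep only the last row of each run of adjacent duplicates.
deduped <- df[keep, ]
deduped$y
# 1 2 3 4 5 6 7 9 10
```

So a row is dropped only when the row immediately after it has the same x, which is why only the last row of each run of adjacent duplicates survives.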
It seems to work for me, so why was it deleted? For example, it handles cases with more than 2 consecutive duplicates:
df = data.frame(x = c("A", "B", "B", "B", "C", "C", "C", "A", "B", "C", "A", "B", "B", "C"), y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
x y
1 A 1
2 B 2
3 B 3
4 B 4
5 C 5
6 C 6
7 C 7
8 A 8
9 B 9
10 C 10
11 A 11
12 B 12
13 B 13
14 C 14
> df = df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]
> df
x y
1 A 1
4 B 4
7 C 7
8 A 8
9 B 9
10 C 10
11 A 11
13 B 13
14 C 14
This seems to work?
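As a sanity check, I also tried a different route that I would expect to give the same rows: use base R's rle() to find the runs of consecutive equal values, then take the last row of each run. This is only a sketch under my assumptions; `runs`, `last_of_run`, `by_rle`, and `by_mask` are names I made up for the comparison:

```r
df <- data.frame(x = c("A", "B", "B", "B", "C", "C", "C", "A", "B", "C", "A", "B", "B", "C"),
                 y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))

# Run-length encoding: one entry per run of consecutive equal values in x.
runs <- rle(as.character(df$x))
# The cumulative sum of the run lengths is the row index where each run ends.
last_of_run <- cumsum(runs$lengths)
by_rle <- df[last_of_run, ]

# The posted one-liner, for comparison.
by_mask <- df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]

# Do both approaches keep the same y values?
identical(by_rle$y, by_mask$y)
```

If both approaches agree on the data, that would suggest the one-liner handles runs of any length, not just pairs.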