Removing duplicates only when the duplicates occur in sequence

Question

I would like to do something similar to the following, except I would only like to remove 'g' and 'g', because they are the duplicates that occur one after the other. I would also like to keep the order of the remaining elements the same.

Any help would be appreciated!

I have this cell array in MATLAB:

y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'}


The result I would like is:

ans =

'd'    'f'    'a'    'w'    'a'    'h'

Welcome to StackOverflow. Tags "vector", "duplicates" and "sequence" are worthless without the "matlab" tag. You are not looking for a vector specialist. You are looking for a MATLAB specialist. — Pascal Cuoq, Mar 16 '11 at 23:39

b3. · Accepted Answer · 2011-03-17T06:51:06.487

3

There was an error in my first answer (below) when it is used on more than one pair of duplicates (thanks, grantnz). Here's an updated version:

>> y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h' 'h' 'i' 'i' 'j'};
>> i = find(diff(char(y)) == 0);   % first index of each pair of equal neighbours (column vector)
>> y([i; i+1]) = []                % delete both members of every pair

y = 

    'd'    'f'    'a'    'w'    'a'    'j'
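The updated answer can be traced one step at a time. This is a sketch that assumes, like the answer, that every cell holds exactly one character; the names codes, steps and idx are introduced here only for illustration (idx replaces i, which shadows MATLAB's imaginary unit). Because char(y) stacks the cells as the rows of a column, diff and find return column vectors, which is why the two index lists are joined with a semicolon.

```matlab
% Assumes every cell of y holds exactly one character.
y = {'d' 'f' 'a' 'g' 'g' 'w' 'a' 'h' 'h' 'i' 'i' 'j'};

codes = char(y);           % 12x1 char column, one row per cell
steps = diff(codes);       % 11x1 double: difference between neighbouring characters
idx = find(steps == 0);    % [4; 8; 10]: left member of each equal pair
y([idx; idx+1]) = [];      % remove the left and right member of every pair

% y is now {'d' 'f' 'a' 'w' 'a' 'j'}
```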

OLD ANSWER

If your "cell vector" always contains only single-character elements, you can do the following:

>> y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'}

y = 

    'd'    'f'    'a'    'g'    'g'    'w'    'a'    'h'

>> y(find(diff(char(y)) == 0) + [0 1]) = []

y = 

    'd'    'f'    'a'    'w'    'a'    'h'

edited Mar 17 '11 at 06:51

answered Mar 16 '11 at 23:43

b3.

Have you tried your solution with more than one duplicate? I think you will get a 'Matrix dimensions must agree' error. – grantnz Mar 17 '11 at 00:23
Thanks! They all seem to work well. Would it be easy to modify this code to excise triplicates instead? Is it possible to use the diff between i, i+1 and i+2, or something like that? >> y = { 'd' 'f' 'a' 'g' 'g' 'g' 'w' 'a' 'h' 'h' 'i' 'i' 'j'}; >> i = find(diff(char(y)) == 0); >> y([i; i+1]) = [] – jess Mar 17 '11 at 17:50
@jess: Yes, this approach will work as it is for any number of consecutive duplicates. – b3. Mar 17 '11 at 18:01
Great, thanks very much! I think I am starting to see the logic. – jess Mar 17 '11 at 23:39
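A quick check of b3.'s reply on jess's triplicate example: a run of three equal elements produces two zero differences, so the index list [i; i+1] contains one position twice. A deletion such as y(k) = [] tolerates repeated indices and removes each element once, so the whole run disappears. This sketch uses the name idx and toDelete for illustration only.

```matlab
% jess's example: a run of three 'g's plus two pairs.
y = {'d' 'f' 'a' 'g' 'g' 'g' 'w' 'a' 'h' 'h' 'i' 'i' 'j'};

idx = find(diff(char(y)) == 0);   % [4; 5; 9; 11]: the 'g' run yields two entries
toDelete = [idx; idx+1];          % [4; 5; 9; 11; 5; 6; 10; 12]: index 5 appears twice
y(toDelete) = [];                 % each listed element is removed once

% y is now {'d' 'f' 'a' 'w' 'a' 'j'}
```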

Gareth McCaughan · Answer 2 · 2011-03-17T00:17:38.507

1

Look at it like this: you want to keep an element if and only if (1) it is the first element or its predecessor is different from it, and (2) it is the last element or its successor is different from it. So:

y([true ~strcmp(y(1:(end-1)),y(2:end))] & [~strcmp(y(1:(end-1)),y(2:end)) true])

or, perhaps better,

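different = ~strcmp(y(1:(end-1)),y(2:end));        % different(k) is true when y{k} and y{k+1} differ
result = y([true different] & [different true]);   % keep elements that differ from both neighbours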
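Because strcmp compares whole strings, this version is not limited to single-character cells, unlike the char/diff one-liners in the other answers. The sketch below uses multi-character strings chosen purely for illustration, including a run of three and two non-adjacent copies of 'ab' that should both survive.

```matlab
% Multi-character strings, chosen for illustration.
y = {'ab' 'cd' 'cd' 'cd' 'ef' 'ab'};

different = ~strcmp(y(1:end-1), y(2:end));    % [1 0 0 1 1]
keep = [true, different] & [different, true]; % [1 0 0 0 1 1]
result = y(keep);

% result is {'ab' 'ef' 'ab'}: the whole 'cd' run is gone, the separated 'ab's stay
```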

edited Mar 17 '11 at 00:17

answered Mar 16 '11 at 23:43

Gareth McCaughan

This solution does not quite fit the question's requirements. _Both_ elements should be removed in the case of consecutive duplicates. – b3. Mar 17 '11 at 00:03
Whoops, you're right: I misread the question. Now fixed (so I'm afraid your comment will not make sense to anyone who doesn't read this one and realise I've edited my answer; sorry). – Gareth McCaughan Mar 17 '11 at 00:14

grantnz · Answer 3 · 2011-03-17T00:29:49.923

0

This should work:

 y([ diff([y{:}]) ~= 0 true])

or slightly more compactly

 y(diff([y{:}]) == 0) = []
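A quick check of what these two lines produce on the question's data: each run of equal neighbours is collapsed to a single copy rather than removed, so one 'g' survives. The names kept and deleted are introduced here only to compare the two forms.

```matlab
% The question's original data.
y = {'d' 'f' 'a' 'g' 'g' 'w' 'a' 'h'};

% First form: keep y{k} when it differs from y{k+1}; the last element is always kept.
kept = y([diff([y{:}]) ~= 0, true]);

% Second form: delete y{k} when it equals y{k+1}.
deleted = y;
deleted(diff([deleted{:}]) == 0) = [];

% Both give {'d' 'f' 'a' 'g' 'w' 'a' 'h'}
```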

Correction: the above won't remove both duplicates. This version does:

ind = diff([y{:}]) == 0;      % true at k when y{k} equals y{k+1}
y([ind 0] | [0 ind]) = []     % delete both members of each equal pair

BTW, this works even if there are multiple duplicate sequences, e.g.:

y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h' 'h'};
ind = diff([y{:}]) == 0;

y([ind 0] | [0 ind]) = []

y = 

     'd'    'f'    'a'    'w'    'a'
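The mask in this answer can be split into its two halves to see why both members of each pair are caught: padding ind on the right marks the left member of every pair, and padding it on the left marks the right member. The names s, firstOfPair and secondOfPair are hypothetical, introduced only for this sketch, and false is used instead of 0 so the masks stay logical.

```matlab
y = {'d' 'f' 'a' 'g' 'g' 'w' 'a' 'h' 'h'};

s = [y{:}];                   % 'dfaggwahh', assumes one character per cell
ind = diff(s) == 0;           % [0 0 0 1 0 0 0 1]
firstOfPair  = [ind false];   % [0 0 0 1 0 0 0 1 0]: left member of each pair
secondOfPair = [false ind];   % [0 0 0 0 1 0 0 0 1]: right member of each pair
y(firstOfPair | secondOfPair) = [];

% y is now {'d' 'f' 'a' 'w' 'a'}
```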

edited Mar 17 '11 at 00:29

answered Mar 16 '11 at 23:46

grantnz

This solution does not remove both duplicates as required. – b3. Mar 17 '11 at 00:04
