I ran into a problem when trying to mark the first occurrence of each item in an array, with every later repeat marked False.
For example:

array = [a, b, c, b, b, a, c, a]

returns: [True, True, True, False, False, False, False, False]  

I have tried to use the np.unique function but it either returns unique values or returns indices of unique values.
Is there any function that is able to do this?

S3DEV
chrisli
  • What do you actually want the code to do, and what is your expected output? – coderoftheday Nov 14 '20 at 13:59
  • Does this answer your question? [Determining duplicate values in an array](https://stackoverflow.com/questions/11528078/determining-duplicate-values-in-an-array) – ChaddRobertson Nov 14 '20 at 14:00
  • If you want to use pandas, a Series has a `.duplicated()` function. – S3DEV Nov 14 '20 at 14:17
  • @ChaddRobertson - I think this is not a full duplicate. Only the approach and first step is the same. And it is already mentioned in the question that he got stuck at this point. – Michael Szczesny Nov 14 '20 at 14:53

2 Answers

You had a good approach with np.unique. With return_index=True it also returns the index of the first occurrence of each unique value, which is the information you need.

I extended your example with an extra value ('d') to show that this works regardless of where the unique values appear.

import numpy as np

array = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])

# i holds the index of the first occurrence of each unique value
_, i = np.unique(array, return_index=True)
res = np.zeros_like(array, dtype=bool)
res[i] = True
print(res)

Out:

[ True  True  True False False False False  True False]
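
As a minimal sketch, the same steps can be wrapped in a helper and applied to the array from the question. The name `first_occurrence_mask` is hypothetical (not a NumPy function), the question's elements are assumed to be strings, and the input is assumed to be one-dimensional:

```python
import numpy as np

def first_occurrence_mask(values):
    """Return a boolean array that is True at the first occurrence of each value.

    Hypothetical helper for illustration; assumes a one-dimensional input.
    """
    arr = np.asarray(values)
    # return_index=True yields, for each unique value, the index of its
    # first occurrence in the original array
    _, first_idx = np.unique(arr, return_index=True)
    mask = np.zeros(arr.shape, dtype=bool)
    mask[first_idx] = True
    return mask

# The array from the question, written with string elements (an assumption)
question_array = ['a', 'b', 'c', 'b', 'b', 'a', 'c', 'a']
mask = first_occurrence_mask(question_array)
print(mask.tolist())
# [True, True, True, False, False, False, False, False]
```

Note that np.unique sorts the values internally, so this approach needs elements that can be compared with each other.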
Michael Szczesny

If it’s OK to use pandas, there is a convenience function called duplicated() which can be used on a Series.

Essentially, wrap the numpy array in the Series constructor, call duplicated(), negate the result with ~, and convert the resulting boolean Series back to a numpy array.

Example:

import numpy as np
import pandas as pd

a = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])

(~pd.Series(a).duplicated(keep='first')).to_numpy()

Output:

array([ True,  True,  True, False, False, False, False,  True, False])
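
To make the logic concrete, here is a rough plain-Python sketch of what the negated duplicated(keep='first') computes: an element is flagged True only if its value has not been seen earlier. The name `first_occurrence_flags` is hypothetical, and this illustrates the idea rather than how pandas implements it; it assumes the elements are hashable.

```python
def first_occurrence_flags(values):
    """Return a list of booleans, True where a value appears for the first time.

    Hypothetical helper for illustration; assumes hashable elements.
    """
    seen = set()
    flags = []
    for value in values:
        # True only the first time this value is encountered
        flags.append(value not in seen)
        seen.add(value)
    return flags

flags = first_occurrence_flags(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])
print(flags)
# [True, True, True, False, False, False, False, True, False]
```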
S3DEV