I would like to label every row of an array with the index of the unique row it matches. Each unique row should get its own index, starting at zero. Here is an example:
import numpy as np
a = np.array([[0., 1.],
              [0., 2.],
              [0., 3.],
              [0., 1.],
              [0., 2.],
              [0., 3.],
              [0., 1.],
              [0., 2.],
              [0., 3.],
              [1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 1.],
              [1., 2.],
              [1., 3.]])
In the above array there are six unique rows:
import pandas as pd
b = pd.DataFrame(a).drop_duplicates().values
array([[ 0., 1.],
[ 0., 2.],
[ 0., 3.],
[ 1., 1.],
[ 1., 2.],
[ 1., 3.]])
Each of these unique rows corresponds to an index (0, 1, 2, 3, 4, 5). Mapping every row of array a to the index of its unique row, the result would be:
[0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5]
How can I get this result efficiently?
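For reference, here is a minimal sketch of one possible approach, assuming a NumPy version whose np.unique accepts the axis argument (1.13 or later). The helper name unique_row_ids is hypothetical and only used for illustration. np.unique returns the unique rows in sorted order, so the sketch renumbers the labels by first appearance to match the order that drop_duplicates produces.

```python
import numpy as np

def unique_row_ids(a):
    """Label each row of `a` with the index of its unique row.

    Hypothetical helper: labels start at zero and follow the order in
    which each distinct row first appears in `a`.
    """
    # np.unique sorts the unique rows lexicographically.
    # `first` holds the position of each unique row's first occurrence,
    # `inverse` maps every row of `a` to its sorted unique row.
    _, first, inverse = np.unique(
        a, axis=0, return_index=True, return_inverse=True
    )
    # Flatten defensively: some NumPy releases return a 2-D inverse here.
    inverse = np.asarray(inverse).ravel()

    # Renumber the sorted labels so they follow first-appearance order.
    order = np.argsort(first)
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)
    return rank[inverse]

# The example array from the question, built compactly.
a = np.array([[i, j] for i in (0., 1.) for _ in range(3) for j in (1., 2., 3.)])

print(unique_row_ids(a).tolist())
# [0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5]
```

For this particular array the sorted order and the first-appearance order coincide, so the renumbering step changes nothing here. It only matters when rows do not first appear in sorted order.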