TL;DR: within your function, change

```python
out = id_arr.cumsum()[np.argsort(a).argsort()]
```

into

```python
out = id_arr.cumsum()[np.argsort(a, kind='mergesort').argsort()]
```
If speed is a concern, use the solution offered by @piRSquared in the referenced post. You'll need the first three functions from that answer:
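```python
import numpy as np

arr = np.array([144, 144, 144, 144, 143, 143, 143, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93])

def dfill(a):
    # For each position in a, the index where its run of equal values starts
    n = a.size
    b = np.concatenate([[0], np.where(a[:-1] != a[1:])[0] + 1, [n]])
    return np.arange(n)[b[:-1]].repeat(np.diff(b))

def argunsort(s):
    # Inverse permutation of s
    n = s.size
    u = np.empty(n, dtype=np.int64)
    u[s] = np.arange(n)
    return u

def cumcount(a):
    n = a.size
    s = a.argsort(kind='mergesort')  # stable sort keeps ties in their original order
    i = argunsort(s)
    b = a[s]
    # Position within each run of the sorted array, mapped back to the original order
    return (np.arange(n) - dfill(b))[i]

print(cumcount(arr))  # desired output: [0 1 2 3 0 1 2 0 1 2 3 4 5 6 7 8 9]
```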
The accepted answer from the referenced post is actually incorrect: it can number the repeated occurrences of a value in the wrong order.
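The problem lies with the fact that np.argsort uses quicksort as its default sorting algorithm, and quicksort is not stable: equal elements are not guaranteed to keep their original relative order. For a stable sort, we need kind='mergesort' (see the comments by @MartijnPieters on the matter here).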
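A small illustration of what stability means here, using a made-up toy array: with kind='mergesort', the indices of equal values come out in the order those values appear in the input. The default kind='quicksort' makes no such guarantee, even if it happens to preserve the order for a particular input.

```python
import numpy as np

a = np.array([144, 143, 144, 143, 144])  # hypothetical toy input

# kind='mergesort' (or its alias kind='stable') guarantees that equal
# values keep their original relative order in the result
order = np.argsort(a, kind='mergesort')
print(order)  # [1 3 0 2 4]: the 143s (1, 3), then the 144s (0, 2, 4), each in input order
```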
So, in your slightly adjusted function we need:
```python
import numpy as np

def grp_range(a):
    count = np.unique(a, return_counts=True)[1]
    idx = count.cumsum()
    id_arr = np.ones(idx[-1], dtype=int)
    id_arr[0] = 0
    id_arr[idx[:-1]] = -count[:-1] + 1
    out = id_arr.cumsum()[np.argsort(a, kind='mergesort').argsort()]  # adjustment here
    return out
```
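To see why the cumsum trick works, here is a step-by-step trace of the intermediate arrays for a small, hypothetical input. In sorted order, every entry of id_arr is 1 except at the start of each group, where it is 0 (first group) or 1 minus the size of the previous group, so the running sum resets to 0 at every group boundary. The double argsort then gives each element its position in the stable sorted order, which moves those counts back to the original positions.

```python
import numpy as np

a = np.array([5, 3, 5, 3, 3])  # hypothetical toy input

count = np.unique(a, return_counts=True)[1]  # [3 2]: counts of 3 and of 5
idx = count.cumsum()                         # [3 5]: end (exclusive) of each group in sorted order
id_arr = np.ones(idx[-1], dtype=int)
id_arr[0] = 0
id_arr[idx[:-1]] = -count[:-1] + 1           # [0 1 1 -2 1]
sorted_counts = id_arr.cumsum()              # [0 1 2 0 1]: running count per group, sorted order
ranks = np.argsort(a, kind='mergesort').argsort()  # [3 0 4 1 2]: sorted position of each element
print(sorted_counts[ranks])                  # [0 0 1 1 2]
```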
Testing a couple of examples:
```python
# OP's example
arr = np.array([144, 144, 144, 144, 143, 143, 143, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93])
arr_result = grp_range(arr)
print(arr_result)
# [0 1 2 3 0 1 2 0 1 2 3 4 5 6 7 8 9] (correct)

# OP's example with a mixed sequence (note the added 143, 144 at arr_alt[9:11])
arr_alt = np.array([144, 144, 144, 144, 143, 143, 143, 93, 93, 143, 144, 93, 93, 93, 93, 93, 93])
arr_alt_result = grp_range(arr_alt)
print(arr_alt_result)
# [0 1 2 3 0 1 2 0 1 3 4 2 3 4 5 6 7] (correct; arr_alt_result[9:11] is [3, 4])
```
As mentioned above, the cumcount solution offered by @piRSquared gives the same results and will be faster than this.
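Since both vectorized versions should agree with the straightforward definition ("how many equal values appeared before this one"), a slow plain-Python loop makes a handy reference for checking them, for example by comparing its output with cumcount or grp_range on random arrays via np.array_equal. A minimal sketch; the name cumcount_reference is hypothetical.

```python
from collections import defaultdict

def cumcount_reference(values):
    """Hypothetical plain-Python reference: for each element, the number
    of equal elements that appeared before it."""
    seen = defaultdict(int)  # value -> occurrences seen so far
    out = []
    for v in values:
        out.append(seen[v])
        seen[v] += 1
    return out

arr_alt = [144, 144, 144, 144, 143, 143, 143, 93, 93, 143, 144, 93, 93, 93, 93, 93, 93]
print(cumcount_reference(arr_alt))
# [0, 1, 2, 3, 0, 1, 2, 0, 1, 3, 4, 2, 3, 4, 5, 6, 7]
```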
A final aside: the sequence posted is in descending order. If the same holds for the data you are actually working with, you could do something like this:
```python
import numpy as np

arr = np.array([144, 144, 144, 144, 143, 143, 143, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93])
count = np.unique(arr, return_counts=True)[1][::-1]  # from ascending to descending
out = np.concatenate(list(map(np.arange, count)), axis=0)
# out: [0 1 2 3 0 1 2 0 1 2 3 4 5 6 7 8 9]
```
or this:
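```python
from collections import Counter

arr = [144, 144, 144, 144, 143, 143, 143, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93]
# Counter keeps insertion order (Python 3.7+), so the counts follow
# the order in which each value first appears
count_dict = Counter(arr)
out = []
for n in count_dict.values():
    out.extend(range(n))
# out: [0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```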
or indeed, use the answer provided by @bpfrd.
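The np.unique shortcut silently returns wrong numbers when the data is not in descending order (arr_alt, for instance, is not). A hedged sketch that checks this assumption before taking the shortcut; the function name grouped_counts_descending is hypothetical, and it assumes a non-empty 1-D array.

```python
import numpy as np

def grouped_counts_descending(arr):
    """Hypothetical helper: the np.unique shortcut, guarded by a check that
    arr is non-increasing. Assumes a non-empty 1-D array."""
    arr = np.asarray(arr)
    # Any element larger than its predecessor breaks the descending assumption
    if np.any(arr[1:] > arr[:-1]):
        raise ValueError("arr must be sorted in descending order")
    # np.unique sorts ascending, so reverse the counts to match the data
    count = np.unique(arr, return_counts=True)[1][::-1]
    return np.concatenate([np.arange(c) for c in count])

arr = np.array([144, 144, 144, 144, 143, 143, 143, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93])
print(grouped_counts_descending(arr))  # [0 1 2 3 0 1 2 0 1 2 3 4 5 6 7 8 9]
```

Comparing adjacent elements directly (rather than using np.diff) also avoids wrap-around if the array happens to have an unsigned dtype.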