I'm trying to do a pretty simple task in Python that I have already done in Julia. It consists of taking an array of 3D elements (each a 3-element row) and building a dictionary that groups, for each unique value, the indices at which it occurs (note the array is 6,000,000 elements long). I have done this in Julia and it is reasonably fast (6 seconds) - here is the code:
function unique_ids(itr)
    # create a dictionary whose keys have the element type of itr
    d = Dict{eltype(itr), Vector}()
    # iterate through the values in itr
    for (index, val) in enumerate(itr)
        # check whether the value is already a key in the dictionary
        if haskey(d, val)
            push!(d[val], index)
        else
            # add the value as a new key if it is not in d yet
            d[val] = [index]
        end
    end
    return collect(values(d))
end
So far so good. However, when I try doing this in Python, it seems to take forever, so long that I can't even tell you how long. So the question is, am I doing something dumb here, or is this just the reality of the differences between these two languages? Here is my Python code, a translation of the Julia code.
from tqdm import tqdm

def unique_ids(row_list):
    d = {}
    for (index, val) in tqdm(enumerate(row_list)):
        # use the string form of the row as the dict key (see the note below)
        if str(val) in d:
            d[str(val)].extend([index])
        else:
            d[str(val)] = [index]
    return list(d.values())
Note that I convert each value to a string to use as the dict key in Python, since a list (or NumPy array) is unhashable and so can't be used as a dictionary key directly.
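Would switching to tuple keys make a difference? Tuples are hashable, so the string conversion shouldn't be strictly necessary - here is a rough sketch of the variant I have in mind (not yet tested on the full 6,000,000-element data):

from collections import defaultdict
from tqdm import tqdm

def unique_ids_tuple_keys(row_list):
    # same grouping as above, but keyed on a tuple of the row
    # instead of its string representation
    d = defaultdict(list)
    for index, val in tqdm(enumerate(row_list)):
        d[tuple(val)].append(index)
    return list(d.values())

The idea is just to avoid building a string for every row, but I don't know whether that is where the time is actually going.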