I have the following list of lists that contains 6 entries:
lol = [['a', 3, 1.01],
       ['x', 5, 1.00],
       ['k', 7, 2.02],
       ['p', 8, 3.00],
       ['b', 10, 1.09],
       ['f', 12, 2.03]]
Each sublist in lol contains 3 elements:
['a', 3, 1.01]
 e1  e2  e3
The list above is already sorted by e2 (i.e., the 2nd element).
I'd like to 'cluster' the above list following roughly these steps:
- Pick the lowest entry (with respect to e2) in lol as the key of the first cluster.
- Assign it as the first member of that cluster (the clusters are stored as a dictionary of lists).
- For each subsequent list, calculate the difference between its e3 and the e3 of the first member of each existing cluster.
- If the difference is within the threshold, add that list to the corresponding cluster. Otherwise, create a new cluster with the current list as its key.
- Repeat until all entries have been processed.
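To make the steps above concrete, here is a minimal sketch of how they could be written. The function name cluster_by_first_member and the anchors list are hypothetical names chosen for illustration. It also assumes two things the steps leave open: the comparison is inclusive (difference <= threshold), and when more than one cluster is close enough, the earliest-created cluster wins.

```python
def cluster_by_first_member(rows, threshold):
    """Group rows of the form [e1, e2, e3] by comparing e3 with the e3
    of the first member of each existing cluster (hypothetical helper)."""
    clusters = {}   # cluster key (e1 of first member) -> list of member e1 values
    anchors = []    # (cluster key, e3 of first member), in creation order

    # Process entries from lowest to highest e2.
    for e1, _e2, e3 in sorted(rows, key=lambda row: row[1]):
        for key, key_e3 in anchors:
            # Assumption: inclusive comparison, first matching cluster wins.
            if abs(e3 - key_e3) <= threshold:
                clusters[key].append(e1)
                break
        else:
            # No existing cluster is close enough: start a new one.
            anchors.append((e1, e3))
            clusters[e1] = [e1]
    return clusters


lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
dol = cluster_by_first_member(lol, 0.1)
print(dol)
```

On the sample data this groups a, x, b under 'a'; k, f under 'k'; and p alone under 'p'. The key point is that each entry is compared with the first member of every existing cluster, not with the entry processed just before it.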
The final result should look like this, with a threshold of 0.1 (i.e., difference <= 0.1):
dol = {'a': ['a', 'x', 'b'],
       'k': ['k', 'f'],
       'p': ['p']}
I'm stuck with the attempt below. What's the right way to do it?
import json
from collections import defaultdict

thres = 0.1
tmp_e3 = 0
tmp_e1 = "-"

lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]

dol = defaultdict(list)
for thelist in lol:
    e1, e2, e3 = thelist
    if tmp_e1 == "-":
        tmp_e1 = e1
    else:
        diff = abs(tmp_e3 - e3)
        if diff > thres:
            tmp_e1 = e1
    dol[tmp_e1].append(e1)
    tmp_e1 = e1
    tmp_e3 = e3

print(json.dumps(dol, indent=4))