Get the n-smallest values from a nested python list

Question

I have the following list:

l = [(('01001', '01003'), 4.15),
 (('01001', '01005'), 2.83),
 (('01001', '01007'), 3.32),
 (('01001', '01008'), 2.32),
 (('01001', '01009'), 9.32),
 (('01001', '01007'), 0.32),
 (('01002', '01009'), 6.83),
 (('01002', '01011'), 2.53),
 (('01002', '01009'), 6.83),
 (('01002', '01011'), 2.53),
 (('01002', '01009'), 6.83),
 (('01002', '01011'), 2.53),
 (('01003', '01013'), 20.50),
 (('01003', '01013'), 10.50),
 (('01003', '01013'), 0.50),
 (('01003', '01013'), 2.50),
 (('01003', '01013'), 20.30),
 (('01003', '01013'), 12.50),
 (('01003', '01013'), 1.50),
 (('01003', '01013'), 2.40)]

I would like to select the n-smallest values for the first element of this list ('01001', '01002', and '01003').

I was able to calculate the min value with this code:

from itertools import groupby
from statistics import mean

{k:min(v for *_, v in v) for k,v in groupby(result_map, lambda x: x[0][0])}

but I would like to get the 3 smallest values and print the second column.

The expected outcome would be a dictionary like this:

{'01001': ['01007', '01008', '01005'], '01002': ['01011', '01009', '01013'], '01003': ['01013', '01013', '01013']}

Any help would be much appreciated!

What determines which numbers go in the lists `['01007', '01008', '01005']` etc.? – Stuart, Sep 29 '20 at 14:46
your tuple values also repeat in some scenarios btw, ('01001', '01007') appears multiple times; what if such a tuple has two values lower than some other second-column value? – gold_cy, Sep 29 '20 at 14:53

score 1 · Answer 1 · answered Sep 29 '20 at 14:59


The following should work:

d={i:sorted([k[0][1] for k in l if k[0][0]==i])[:3] for i in set([i[0][0] for i in l])}

print(d)

{'01001': ['01003', '01005', '01007'], '01002': ['01009', '01009', '01009'], '01003': ['01013', '01013', '01013']}
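For readability, the comprehension above can be unpacked into a single pass over the list. This is only a sketch on a shortened sample of the question's data (the shortening is an assumption made for brevity). Note that, like the one-liner, it sorts the second labels alphabetically and ignores the numeric value, which is why '01003' rather than '01007' comes first for '01001'.

```python
from collections import defaultdict

# Shortened sample of the question's list (assumption: trimmed for brevity).
l = [(('01001', '01003'), 4.15),
     (('01001', '01005'), 2.83),
     (('01001', '01007'), 0.32),
     (('01002', '01009'), 6.83),
     (('01002', '01011'), 2.53)]

# Collect the second labels under each first label in one pass.
labels = defaultdict(list)
for (first, second), _value in l:
    labels[first].append(second)

# Sort the labels themselves (not the numeric values) and keep the first 3.
d = {first: sorted(seconds)[:3] for first, seconds in labels.items()}
print(d)
```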

answered Sep 29 '20 at 14:59

IoaTzimas


have to love one-liner completely unreadable answers. they sure do promote good practices... – gold_cy Sep 29 '20 at 15:16
Does not match the OP's expected outcome. Unclear but they seem to want to sort by the third number in each tuple. – Stuart Sep 29 '20 at 15:29
OP's outcome is not accurate as it has 01013 for 01002, which doesn't actually exist. Also, it has similar values for 01003, which means that we are not looking for unique values – IoaTzimas Sep 29 '20 at 15:32

Arty · Accepted Answer · 2020-09-29T18:07:12.470

1

{k:[e[0][1] for e in sorted(v, key = lambda x: x[1])][:n] for k,v in groupby(result_map, lambda x: x[0][0])}

The code above is your own groupby code (with result_map being your list and n the number of entries to keep), modified slightly to compute the list of the n smallest entries instead of the minimum.

From your question's example it wasn't clear whether you want repeated elements in the n-smallest list or not (the second entry, '01002': ['01011', '01009', '01013'], has no repetitions, but the third, '01003': ['01013', '01013', '01013'], does), so here is a second one-liner that solves the task without repetitions:

{k:[e[0][1] for e in sorted({f[0][1] : f for f in v}.values(), key = lambda x: x[1])][:n] for k,v in groupby(result_map, lambda x: x[0][0])}

Full version of code can be found and tried online here!
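A more readable variant of the same idea might look like the sketch below. The helper name n_smallest_by_group and its unique flag are hypothetical, and heapq.nsmallest is swapped in for the sort-and-slice used above. Two assumptions differ from the one-liners: the input is sorted by the grouping key first (groupby only merges adjacent items), and with unique=True the smallest value per second label is kept, whereas the dict-based one-liner keeps the last one seen.

```python
from heapq import nsmallest
from itertools import groupby
from operator import itemgetter


def n_smallest_by_group(pairs, n, unique=False):
    """Map each first label to the second labels of its n smallest values."""
    first_label = lambda item: item[0][0]
    result = {}
    # groupby only merges adjacent items, so sort by the grouping key first.
    for key, group in groupby(sorted(pairs, key=first_label), key=first_label):
        group = list(group)
        if unique:
            # Keep only the smallest value seen for each second label.
            best = {}
            for (_, second), value in group:
                if second not in best or value < best[second]:
                    best[second] = value
            group = [((key, second), value) for second, value in best.items()]
        # Pick the n entries with the smallest numeric value.
        smallest = nsmallest(n, group, key=itemgetter(1))
        result[key] = [item[0][1] for item in smallest]
    return result


# Shortened sample of the question's list (assumption: trimmed for brevity).
sample = [(('01001', '01005'), 2.83),
          (('01001', '01007'), 3.32),
          (('01001', '01008'), 2.32),
          (('01001', '01007'), 0.32),
          (('01002', '01009'), 6.83),
          (('01002', '01011'), 2.53),
          (('01002', '01011'), 2.53)]

print(n_smallest_by_group(sample, 2))
print(n_smallest_by_group(sample, 2, unique=True))
```

With repetitions allowed, '01002' yields '01011' twice because the value 2.53 occurs twice; with unique=True each second label appears at most once.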

edited Sep 29 '20 at 18:07

answered Sep 29 '20 at 15:14

Arty


have to love one-liner completely unreadable answers. they sure do promote good practices... – gold_cy Sep 29 '20 at 15:16
@gold_cy The original questioner's code was a one-liner too :), so I decided to keep the same style (one-liner), just improved a bit. – Arty Sep 29 '20 at 15:20
This does not provide the OP's expected outcome. They seem to want to exclude repeated values so the list for '01002' would be ['01011', '01009', '01013'] not ['01011', '01011', '01011']. – Stuart Sep 29 '20 at 15:27
@Stuart Actually the questioner's example is contradictory: the second element has no repeated elements, but the third (`'01003': ['01013', '01013', '01013']`) has repetitions, so I can't know whether he wanted repetitions or not. – Arty Sep 29 '20 at 15:41
@Stuart Just added to my answer second one-liner without repetitions so that questioner can choose the one he wants. – Arty Sep 29 '20 at 15:48

hiro protagonist · Answer 3 · 2020-09-29T16:00:55.367

a pretty explicit but straight-forward version. i iterate once only over the input list lst:

from bisect import bisect_left
from collections import defaultdict

lst = [(('01001', '01003'), 4.15),
       ...
       (('01003', '01013'), 2.40)]

maxlen = 3

ret = defaultdict(list)
val = defaultdict(list)
for ((first, second), value) in lst:
    r = ret[first]
    v = val[first]
    if not r:
        r.append(second)
        v.append(value)
    else:
        if value not in v:
            idx = bisect_left(v, value)
            r.insert(idx, second)
            v.insert(idx, value)
    if len(r) > maxlen:
        # trim to maxlen (instead of a hard-coded 3) so changing maxlen works
        ret[first] = r[:maxlen]
        val[first] = v[:maxlen]

print(ret)  # defaultdict(<class 'list'>, {
#  '01001': ['01007', '01008', '01005'], 
#  '01002': ['01011', '01009'], 
#  '01003': ['01013', '01013', '01013']})

print(val)  # defaultdict(<class 'list'>, {
#  '01001': [0.32, 2.32, 2.83], 
#  '01002': [2.53, 6.83], 
#  '01003': [0.5, 1.5, 2.4]})

where i use the defaultdict val to store the values corresponding to the result ret.

and i use the bisect module to find the insert index idx.

the design might be better if the values and the results were in the same data structure and not separated in ret and val (e.g a tuple or even a namedtuple).
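as a rough sketch of that single-structure idea (the function name smallest_per_key is hypothetical, and the sample is a shortened version of the question's list), each key could hold one sorted list of (value, second) tuples, kept in order with bisect.insort. it keeps the same rule as the loop above: a value already recorded for a key is skipped.

```python
from bisect import insort
from collections import defaultdict


def smallest_per_key(lst, maxlen=3):
    """Map each first label to its maxlen smallest (value, second) tuples."""
    best = defaultdict(list)  # first label -> sorted list of (value, second)
    for (first, second), value in lst:
        entries = best[first]
        # Skip values already recorded for this key, as the loop above does.
        if any(value == seen for seen, _ in entries):
            continue
        insort(entries, (value, second))  # keeps the list sorted by value
        del entries[maxlen:]              # trim in place to maxlen entries
    return dict(best)


# Shortened sample of the question's list (assumption: trimmed for brevity).
lst = [(('01001', '01003'), 4.15),
       (('01001', '01005'), 2.83),
       (('01001', '01007'), 3.32),
       (('01001', '01008'), 2.32),
       (('01001', '01007'), 0.32),
       (('01002', '01009'), 6.83),
       (('01002', '01011'), 2.53),
       (('01002', '01009'), 6.83)]

result = smallest_per_key(lst)
print(result)
```

the labels alone can then be read off with a comprehension such as [second for _, second in result['01001']].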
