How can I efficiently randomly select items from a dictionary that meet my requirements?

Question

At the moment, I have a large dictionary of items. This might be a little confusing, but each key maps to a different tuple of values, and those values are themselves keys in another dictionary.

I need to make sure that my random selection from the first dict covers all possible values in the second dict. I'll provide a rudimentary example:

Dict_1 = {key1: (A, C), key2: (B, O, P), key3: (R, T, A)} # and so on 

Dict_2 = {A: (1, 4, 7), B: (9, 2, 3), C: (1, 3)}  # etc

I need a random selection of Dict_1 to give me a coverage of all numbers from 1 - 10 in Dict_2 values.

At the moment, I am selecting 6 random keys from Dict_1, taking all the numbers that those letters would correspond with, and comparing that set to a set of the numbers from 1 - 10. If the selection isn't a subset of 1 - 10, select 6 more random ones and try again, until I have 1 - 10.

Now, this works, but I know it's far from efficient. How can I improve this method?

I am using Python.

Questions about improving working code are better suited for the [Code Review SE](https://codereview.stackexchange.com). You'll definitely need to show the code there. — sj95126, Dec 03 '22 at 05:07
If the selection isn't a subset of 1-10, rather than rejecting the six keys and selecting six new random keys, I suggest using an approach similar to [simulated annealing](https://en.wikipedia.org/wiki/Simulated_annealing) or [random nearest neighbour search](https://en.wikipedia.org/wiki/Nearest_neighbor_search). The idea is that instead of restarting from zero, you try to gradually improve your random solution. (1/2) — Stef, Jan 13 '23 at 14:41
(2/2) For each of your six keys, find how many numbers in 1-10 are covered by that key but not covered by another key. This tells you how "useful" each of your six keys is. Now remove the least useful key, and replace it with a new random key. — Stef, Jan 13 '23 at 14:41

Stef · Answer 1 · 2023-01-13T15:19:34.760

At the moment, I am selecting 6 random keys from Dict_1, taking all the numbers that those letters would correspond with, and comparing that set to a set of the numbers from 1 - 10. If the selection isn't a subset of 1 - 10, select 6 more random ones and try again, until I have 1 - 10.

Now, this works, but I know it's far from efficient. How can I improve this method?

Whenever your solution doesn't fully cover 1-10, you erase the whole solution and restart from scratch. This is what's inefficient.

Instead, you could use an approach inspired by simulated annealing or random nearest neighbour search. The idea is that if your solution doesn't fully cover 1-10, then instead of erasing it, you try to incrementally make it better.

One way to do this is to attribute a score to each of the six keys in your solution. This score should reflect how useful that key is in the solution; i.e., how many numbers in 1-10 are covered thanks to this key that are not already covered by another key.
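As a rough illustration of that score, here is a minimal sketch on toy data. The dictionaries and the helper names `numbers_covered` and `unique_scores` are made up for this example; they are not part of the question's data.

```python
from collections import Counter

# Toy data, made up for illustration only.
d1 = {'key1': ['A', 'C'], 'key2': ['B'], 'key3': ['A']}
d2 = {'A': [1, 4, 7], 'B': [9, 2, 3], 'C': [1, 3]}

def numbers_covered(key):
    """Set of numbers reachable from one key of d1 through d2."""
    return set().union(*(d2[c] for c in d1[key]))

def unique_scores(solution):
    """For each key, count the numbers that no other key in the solution covers."""
    covered = {k: numbers_covered(k) for k in solution}
    # How many keys of the solution cover each number.
    counts = Counter(n for nums in covered.values() for n in nums)
    return {k: sum(1 for n in nums if counts[n] == 1) for k, nums in covered.items()}

scores = unique_scores(['key1', 'key2', 'key3'])
print(scores)  # {'key1': 0, 'key2': 2, 'key3': 0}
```

Here only key2 contributes numbers (9 and 2) that nothing else covers, so key1 or key3 would be the candidate to replace. The full code further down uses a softer variant of this score, summing 1/coverage[n] so that rarely covered numbers count for more.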

Then, instead of picking six new random keys, you keep the five best keys and only pick one new random key. The solution should become incrementally better, until hopefully it covers the whole range 1-10.

import random

keylist1 = ['key{}'.format(n) for n in range(100)]
keylist2 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
cover_range = range(1,21)  # 1-20 instead of 1-10 otherwise the problem is too simple

d1 = {k: random.choices(keylist2, k=3) for k in keylist1}  # k must be passed by keyword; the second positional argument is weights
# d1 = {'key0': ['N', 'C', 'L'], 'key1': ['P', 'N', 'M'], 'key2': ['I', 'G', 'Q'], 'key3': ['F', 'M', 'R'], 'key4': ['L', 'P', 'U'], 'key5': ['V', 'Q', 'L'], 'key6': ['R', 'W', 'K'], 'key7': ['T', 'S', 'I'], 'key8': ['W', 'M', 'T'], 'key9': ['A', 'K', 'Q'], 'key10': ['J', 'I', 'L'], 'key11': ['F', 'X', 'D'], 'key12': ['L', 'J', 'B'], 'key13': ['A', 'W', 'I'], 'key14': ['L', 'R', 'Y'], 'key15': ['V', 'O', 'Z'], 'key16': ['G', 'U', 'B'], 'key17': ['R', 'G', 'S'], 'key18': ['X', 'C', 'V'], 'key19': ['S', 'F', 'Z'], 'key20': ['J', 'S', 'L'], 'key21': ['E', 'P', 'X'], 'key22': ['L', 'X', 'E'], 'key23': ['B', 'L', 'O'], 'key24': ['B', 'T', 'W'], 'key25': ['H', 'V', 'Y'], 'key26': ['J', 'T', 'C'], 'key27': ['M', 'G', 'A'], 'key28': ['I', 'E', 'P'], 'key29': ['L', 'R', 'N'], 'key30': ['V', 'J', 'B'], 'key31': ['I', 'V', 'T'], 'key32': ['E', 'N', 'W'], 'key33': ['W', 'D', 'M'], 'key34': ['E', 'Q', 'P'], 'key35': ['C', 'Z', 'A'], 'key36': ['T', 'X', 'O'], 'key37': ['B', 'D', 'J'], 'key38': ['N', 'M', 'D'], 'key39': ['E', 'B', 'A'], 'key40': ['A', 'B', 'K'], 'key41': ['Z', 'B', 'O'], 'key42': ['G', 'L', 'A'], 'key43': ['P', 'N', 'H'], 'key44': ['Z', 'W', 'M'], 'key45': ['K', 'A', 'J'], 'key46': ['O', 'B', 'L'], 'key47': ['J', 'Z', 'F'], 'key48': ['C', 'D', 'O'], 'key49': ['F', 'B', 'J'], 'key50': ['H', 'V', 'T'], 'key51': ['A', 'L', 'O'], 'key52': ['N', 'T', 'Q'], 'key53': ['F', 'N', 'D'], 'key54': ['K', 'W', 'V'], 'key55': ['A', 'M', 'E'], 'key56': ['Z', 'J', 'A'], 'key57': ['S', 'B', 'W'], 'key58': ['D', 'S', 'P'], 'key59': ['E', 'Y', 'H'], 'key60': ['C', 'S', 'Y'], 'key61': ['L', 'P', 'M'], 'key62': ['H', 'S', 'N'], 'key63': ['S', 'U', 'J'], 'key64': ['J', 'N', 'R'], 'key65': ['E', 'B', 'W'], 'key66': ['B', 'V', 'Q'], 'key67': ['K', 'V', 'L'], 'key68': ['N', 'Z', 'H'], 'key69': ['O', 'U', 'E'], 'key70': ['E', 'W', 'H'], 'key71': ['W', 'P', 'A'], 'key72': ['G', 'W', 'X'], 'key73': ['Z', 'D', 'Q'], 'key74': ['S', 'Y', 'P'], 'key75': ['C', 'A', 'I'], 'key76': ['E', 'V', 'S'], 
# 'key77': ['F', 'M', 'T'], 'key78': ['L', 'E', 'S'], 'key79': ['E', 'T', 'J'], 'key80': ['J', 'Y', 'A'], 'key81': ['I', 'F', 'G'], 'key82': ['D', 'S', 'L'], 'key83': ['F', 'E', 'P'], 'key84': ['X', 'L', 'T'], 'key85': ['H', 'U', 'M'], 'key86': ['W', 'A', 'C'], 'key87': ['Z', 'L', 'K'], 'key88': ['Y', 'N', 'X'], 'key89': ['F', 'K', 'B'], 'key90': ['Q', 'G', 'W'], 'key91': ['U', 'O', 'W'], 'key92': ['N', 'C', 'L'], 'key93': ['O', 'V', 'P'], 'key94': ['D', 'Y', 'R'], 'key95': ['S', 'K', 'I'], 'key96': ['G', 'Y', 'R'], 'key97': ['T', 'Z', 'G'], 'key98': ['C', 'A', 'Q'], 'key99': ['H', 'I', 'W']}
d2 = {c: random.sample(cover_range, 3) for c in keylist2}
# d2 = {'A': [14, 10, 17], 'B': [11, 20, 15], 'C': [11, 9, 8], 'D': [6, 18, 19], 'E': [18, 7, 1], 'F': [9, 14, 12], 'G': [17, 18, 20], 'H': [17, 12, 8], 'I': [17, 7, 5], 'J': [8, 20, 5], 'K': [17, 7, 13], 'L': [1, 18, 20], 'M': [5, 8, 18], 'N': [15, 17, 10], 'O': [16, 20, 18], 'P': [2, 18, 7], 'Q': [11, 17, 6], 'R': [3, 15, 4], 'S': [5, 15, 6], 'T': [6, 15, 20], 'U': [20, 12, 8], 'V': [20, 16, 3], 'W': [2, 16, 1], 'X': [5, 11, 1], 'Y': [2, 9, 8], 'Z': [6, 3, 16]}

import random
from collections import Counter
from itertools import chain

def random_solution():
    solution = set(random.sample(keylist1, 6))
    coverage = Counter(chain.from_iterable(d2[c] for k in solution for c in d1[k]))
    while len(coverage) < len(cover_range):
        #print(solution, '    ', sorted(coverage.keys()))
        scores = {k: sum(1/coverage[n] for n in frozenset().union(*(d2[c] for c in d1[k])) ) for k in solution}
        #print(scores)
        worst_key = min(solution, key=scores.get)
        solution.remove(worst_key)
        while len(solution) < 6:
            solution.add(random.choice(keylist1))  # in while loop just because the new random key might be one of the 5 keys we already had, if we're unlucky
        coverage = Counter(chain.from_iterable(d2[c] for k in solution for c in d1[k]))
    return solution
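To sanity-check a result, a small helper along these lines could be used. This is only a sketch: `is_full_cover` and the toy dictionaries are hypothetical names introduced here, not part of the code above.

```python
def is_full_cover(solution, d1, d2, cover_range):
    """Return True if the keys in `solution` reach every number in cover_range."""
    covered = {n for k in solution for c in d1[k] for n in d2[c]}
    return set(cover_range) <= covered

# Toy data, made up for illustration only.
toy_d1 = {'key1': ['A', 'C'], 'key2': ['B'], 'key3': ['A']}
toy_d2 = {'A': [1, 4], 'B': [2, 3], 'C': [5]}

ok = is_full_cover({'key1', 'key2'}, toy_d1, toy_d2, range(1, 6))      # True
not_ok = is_full_cover({'key2', 'key3'}, toy_d1, toy_d2, range(1, 6))  # False, 5 is missing
```

With the data above, `is_full_cover(random_solution(), d1, d2, cover_range)` should hold whenever `random_solution` returns. Note that the loop has no iteration cap, so if no set of six keys can cover the range it will not terminate; adding a maximum number of attempts may be worthwhile.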
