I have a dictionary like this:

features_id = {
     id1: [a, b, c, d],
     id2: [c, d],
     id3: [a, e, f, d, g, k],
     ...
}

I also have a list of values that I want to use to create a new dictionary. Something like this:

list_of_values = [a, c]

Goal to achieve:

I want a new dictionary like this:

new_dict = {
    id1: [a, c],
    id2: [c],
    id3: [a],
    ...
}
Pybubb
  • `new_dict = {k:[x for x in v if x in list_of_values] for k, v in features_id.items()}` – Z Li Nov 07 '22 at 22:28
  • Hi @ZLi thanks for your comment. I've already tried something like the code you have written, but I don't know if it works or not. I have a dictionary with 1M keys, and it takes too much time to compute and too much memory. Has anybody an idea less computationally expensive to write this code? – Pybubb Nov 08 '22 at 00:49
  • Use a set of values for that "in" operation, not a list. If that's not enough, try stackreview, although this might help too https://stackoverflow.com/questions/3013449/list-comprehension-vs-lambda-filter – Kenny Ostrom Nov 08 '22 at 02:06

3 Answers

For such a large dataset (1M keys) it might make sense to use pandas and numpy. I'm not sure about the speed in this case, but you can try the following:

import pandas as pd
import numpy as np

features_id = {
     'id1': ['a', 'b', 'c', 'd'],
     'id2': ['c', 'd'],
     'id3': ['a', 'e', 'f', 'd', 'g', 'k'],
     'id4': ['e', 'f', 'd', 'g', 'k']}

list_of_values = ['a', 'c']

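# Convert the lookup values to an array once, outside the per-row function
y = np.array(list_of_values)

def filt(x):
    # Keep only the elements of x that also appear in y
    x = np.array(x)
    return x[np.isin(x, y)].tolist()

# Apply the filter to every value and convert the result back to a dict
out = pd.Series(features_id).map(filt).to_dict()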

>>> out
{'id1': ['a', 'c'], 'id2': ['c'], 'id3': ['a'], 'id4': []}
SergFSM
  • Thanks for your contribution, but unluckily it takes too long to make all the computations and it's not feasible for my work. – Pybubb Nov 08 '22 at 11:42

I am writing this answer for future users who run into a similar problem.

As noted in the comments above, the solution to this question is:

set_of_values = set(list_of_values)    
new_dict = {k:[x for x in v if x in set_of_values] for k, v in features_id.items()}

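Using a set instead of a list turns each membership test (`x in set_of_values`) into a hash lookup, which takes constant time on average, instead of a linear scan of the list. This speeds up the computation a lot: in my case, with 1M+ dictionary keys, it takes a few seconds instead of minutes.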
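
To see the effect on your own machine, here is a minimal timing sketch. The data is synthetic: the sizes, the `f0`...`f999` feature names, and the helper names `filter_with_list` and `filter_with_set` are assumptions made for illustration, not the real dataset from the question. Actual timings will vary with the size of `list_of_values`.

```python
import random
import timeit

# Synthetic stand-in data (hypothetical sizes and names, not the real dataset).
random.seed(0)
alphabet = [f"f{i}" for i in range(1000)]
features_id = {f"id{i}": random.sample(alphabet, 10) for i in range(2000)}
list_of_values = random.sample(alphabet, 300)
set_of_values = set(list_of_values)

def filter_with_list():
    # Membership test scans list_of_values element by element.
    return {k: [x for x in v if x in list_of_values] for k, v in features_id.items()}

def filter_with_set():
    # Membership test is a hash lookup in set_of_values.
    return {k: [x for x in v if x in set_of_values] for k, v in features_id.items()}

# Both variants must produce the same dictionary.
same_result = filter_with_list() == filter_with_set()

t_list = timeit.timeit(filter_with_list, number=3)
t_set = timeit.timeit(filter_with_set, number=3)
print(f"same result: {same_result}")
print(f"list lookup: {t_list:.3f}s, set lookup: {t_set:.3f}s")
```

The gap grows with the length of `list_of_values`; with only two or three values to look up, the difference is likely to be small.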

Pybubb

For each item of the initial dictionary, go through every value in list_of_values and check whether it appears in that item's list. If it does, add it to the result.

The first time a key is added to the new dictionary, you have to create its list; after that, you append to the existing list.

new_dict = {}
for key, value in features_id.items():
    for val in list_of_values:
        if val in value:
            if key not in new_dict:
                new_dict[key] = [val]
            else:
                new_dict[key].append(val)
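
One thing to be aware of: this loop behaves slightly differently from the dict comprehension in the other answers. Keys with no matching value are left out entirely, and each resulting list follows the order of `list_of_values` rather than the order in the original list. A small sketch of the difference, assuming string values and an extra `id4` entry added for illustration (`setdefault` is used here as a shorter equivalent of the `if key not in new_dict` branch):

```python
features_id = {
    'id1': ['a', 'b', 'c', 'd'],
    'id2': ['c', 'd'],
    'id3': ['a', 'e', 'f', 'd', 'g', 'k'],
    'id4': ['e', 'f'],  # hypothetical entry with no matching values
}
list_of_values = ['c', 'a']  # deliberately not in the order of the original lists

# Nested-loop approach: iterate over list_of_values for every key.
loop_result = {}
for key, value in features_id.items():
    for val in list_of_values:
        if val in value:
            # Create the list on first use, then append to it.
            loop_result.setdefault(key, []).append(val)

# Comprehension approach: iterate over each original list, test against a set.
set_of_values = set(list_of_values)
comp_result = {k: [x for x in v if x in set_of_values] for k, v in features_id.items()}

print(loop_result)  # {'id1': ['c', 'a'], 'id2': ['c'], 'id3': ['a']}
print(comp_result)  # {'id1': ['a', 'c'], 'id2': ['c'], 'id3': ['a'], 'id4': []}
```

Which behavior is preferable depends on whether empty entries and the original ordering matter for your use case.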
albertopasqualetto