Populate dictionary from list in loop

Question

I have the following code, which works fine, and I was wondering how to implement the same logic using a list comprehension.

def get_features(document, feature_space):
    features = {}
    for w in feature_space:
        features[w] = (w in document)
    return features

Also, will I get any performance improvement by using a list comprehension?

The thing is that both feature_space and document are relatively big and many iterations will run.

Edit: Sorry for not making it clear at first, both feature_space and document are lists.

document is a list of words (a word may exist more than once!)
feature_space is a list of labels (features)

What is `document`? If not a set or dictionary, make it one. — Martijn Pieters, Jul 21 '16 at 16:40

Martijn Pieters · Accepted Answer · 2016-07-21T16:56:24.727

Like this, with a dict comprehension:

def get_features(document, feature_space):
    return {w: (w in document) for w in feature_space}

The features[key] = value expression becomes the key: value part at the start, and the rest of the for loop(s) and any if statements follow in nesting order.
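To illustrate the nesting order, here is a minimal sketch with two loops and a filter. The function names lengths_loop and lengths_comp and the sample data in rows are made up for this example and are not part of the question.

```python
# Hypothetical sample data: a list of rows, each a list of words.
rows = [["a", "b"], ["c", ""], ["d"]]

def lengths_loop(rows):
    # Explicit version: two nested loops and a filter.
    result = {}
    for row in rows:
        for word in row:
            if word:  # skip empty strings
                result[word] = len(row)
    return result

def lengths_comp(rows):
    # Same logic: the key: value part comes first, then the for and if
    # clauses in the same order as they are nested in the loop above.
    return {word: len(row) for row in rows for word in row if word}

print(lengths_loop(rows) == lengths_comp(rows))
```

Both functions build the same dictionary; only the position of the key: value part changes.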

Yes, this will give you a performance boost, because you've now removed all lookups of the local name features and all the dict.__setitem__ calls.

Note that you need to make sure that document is a data structure that has fast membership tests. If it is a list, convert it to a set() first, for example, to ensure that membership tests take O(1) (constant) time, not the O(n) linear time of a list:

def get_features(document, feature_space):
    document = set(document)
    return {w: (w in document) for w in feature_space}

With a set, this is now an O(K) loop instead of an O(KN) loop (where N is the size of document and K the size of feature_space). Building the set is a one-time O(N) step, so the whole function takes O(N + K) time.
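The difference between the two membership tests can be checked with a rough timing sketch like the one below. The names get_features_list and get_features_set, the generated sample data, and the sizes are assumptions chosen for illustration; actual timings depend on your data and machine.

```python
import timeit

def get_features_list(document, feature_space):
    # document stays a list: each membership test scans it, O(N) per label.
    return {w: (w in document) for w in feature_space}

def get_features_set(document, feature_space):
    # One O(N) conversion, then O(1) average-case membership tests.
    document = set(document)
    return {w: (w in document) for w in feature_space}

# Hypothetical data: 5000 words with repeats, 500 labels (half of them absent).
document = [f"word{i % 1000}" for i in range(5000)]
feature_space = [f"word{i}" for i in range(0, 2000, 4)]

# Both variants must produce the same dictionary.
assert get_features_list(document, feature_space) == get_features_set(document, feature_space)

for func in (get_features_list, get_features_set):
    seconds = timeit.timeit(lambda: func(document, feature_space), number=3)
    print(f"{func.__name__}: {seconds:.4f}s for 3 runs")
```

Duplicated words in document do not matter here, since only membership is tested, so converting to a set loses nothing.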
