I have the following code, which works fine, and I was wondering how to implement the same logic using a list comprehension.

def get_features(document, feature_space):
    features = {}
    for w in feature_space:
        features[w] = (w in document)
    return features

Also, will I get any performance improvement by using a list comprehension?

Both feature_space and document are relatively big, so the loop will run many iterations.

Edit: Sorry for not making it clear at first, both feature_space and document are lists.

  • document is a list of words (a word may exist more than once!)
  • feature_space is a list of labels (features)
Christos Baziotis

1 Answer

Like this, with a dict comprehension:

def get_features(document, feature_space):
    return {w: (w in document) for w in feature_space}

The features[key] = value expression becomes the key: value part at the start, and the rest of the for loop(s) and any if statements follow in nesting order.
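
As a minimal sketch of that rule, here is a loop with an extra if filter next to its dict comprehension equivalent. The sample data and the empty-label filter are made up for illustration and are not part of the original question:

```python
# Hypothetical sample data, for illustration only.
feature_space = ["spam", "eggs", "ham", ""]
document = ["spam", "spam", "ham"]

# Explicit loop: one for statement, one if statement, one assignment.
features_loop = {}
for w in feature_space:
    if w:  # skip empty labels (an assumed filter, just to show where `if` goes)
        features_loop[w] = (w in document)

# Dict comprehension: the key: value part comes first,
# then the for and if clauses in the same order as in the loop.
features_comp = {w: (w in document) for w in feature_space if w}

print(features_loop == features_comp)
```

Both versions build the same dictionary; only the way each key/value pair is stored differs.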

Yes, this will give you a performance boost, because you've now removed all the lookups of the local name features and the dict.__setitem__ calls.

Note that you need to make sure that document is a data structure with fast membership tests. If it is a list, convert it to a set first, so that each membership test takes O(1) (constant) time on average rather than the O(n) linear time of a list:

def get_features(document, feature_space):
    document = set(document)
    return {w: (w in document) for w in feature_space}

With a set, this is now an O(K) loop instead of an O(KN) loop (where N is the size of document and K the size of feature_space), plus a one-time O(N) cost to build the set.
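
To check the difference on your own data, a rough timing sketch along these lines may help. The function names get_features_list and get_features_set, the generated word lists, and the repeat count are all assumptions chosen for illustration; actual timings depend on your machine and input sizes:

```python
import timeit

def get_features_list(document, feature_space):
    # Membership test against the list: O(N) per lookup.
    return {w: (w in document) for w in feature_space}

def get_features_set(document, feature_space):
    # Build the set once (O(N)), then each lookup is O(1) on average.
    words = set(document)
    return {w: (w in words) for w in feature_space}

# Made-up data: 5000 words with repeats, 500 labels, about half of them absent.
document = [f"word{i % 1000}" for i in range(5000)]
feature_space = [f"word{i}" for i in range(0, 2000, 4)]

# Both versions must produce the same features.
assert get_features_list(document, feature_space) == get_features_set(document, feature_space)

t_list = timeit.timeit(lambda: get_features_list(document, feature_space), number=5)
t_set = timeit.timeit(lambda: get_features_set(document, feature_space), number=5)
print(f"list membership: {t_list:.4f}s, set membership: {t_set:.4f}s")
```

Repeated words in document do not affect the result, since only membership is tested, so converting to a set loses nothing here.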

Martijn Pieters