I want to use numpy to speed up a computation in which I check a dictionary for the presence of word pairs as keys and build a feature vector from the counts. This is what I currently do (a dummy example for clarity; the actual data is much larger):
self.bigram_freq = {"a cat": 3, "man child": 2, "pokemon team": 4}
sentences = ['a boy ran over a cat with his bike yesterday afternoon']
outputVector = []
for sentence in sentences:
    # generate the bigram pairs for this sentence and count the ones
    # that appear as keys in bigram_freq
    bigram_pairs = self.retrieve_pairs(sentence)
    dummy_dict = dict.fromkeys(self.bigram_freq, 0)
    for pair in bigram_pairs:
        if pair in self.bigram_freq:
            dummy_dict[pair] += 1
    # sort by key so every sentence gets the same column order
    feature_vector = [count for _, count in sorted(dummy_dict.items())]
    outputVector.append(feature_vector)
But because of the two nested loops, this is quite slow. I was wondering whether it could be sped up using numpy and np.where. My idea was to create an array with np.zeros and then increment the index of the ndarray corresponding to each token (a pair from bigram_pairs) that is present, but I am unable to get it working. Any help would be appreciated.
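Concretely, this is a minimal sketch of the np.zeros idea described above (index_of and vectorize are names I made up for illustration, and the bigram pairs are hard-coded in place of retrieve_pairs):

import numpy as np

bigram_freq = {"a cat": 3, "man child": 2, "pokemon team": 4}

# map each known bigram to a fixed column index; sorting keeps the
# column order identical to the sorted-dict version above
index_of = {pair: i for i, pair in enumerate(sorted(bigram_freq))}

def vectorize(bigram_pairs):
    # one zero-initialised slot per known bigram
    feature_vector = np.zeros(len(index_of), dtype=np.int64)
    for pair in bigram_pairs:
        idx = index_of.get(pair)
        if idx is not None:
            feature_vector[idx] += 1
    return feature_vector

# pairs that retrieve_pairs might produce for the example sentence
pairs = ['a boy', 'boy ran', 'ran over', 'over a', 'a cat', 'cat with']
print(vectorize(pairs))  # [1 0 0]

This builds the vector in one pass per sentence, but the inner loop is still plain Python, so I'm not sure it gains much; what I'd really like is a fully vectorised version, e.g. with np.where.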