I'm fairly new to sklearn's DictVectorizer, and am trying to write a function whose output DictVectorizer can use to produce feature names from a list of bigrams that I have formed into feature dictionaries. The input to my function is a string, and the function should return a list of the bigrams formed into dictionaries (something like this):
from typing import Dict, List, Text, Union

from sklearn.feature_extraction import DictVectorizer

def features(sentence: Text) -> List[Dict[Text, Union[Text, int]]]:
    # The feature dictionary needs to have "bigram" as its key, and the values will be the bigrams themselves.
    # bigram: a string of the form "w[i]-w[i+1]"
    # This is my bigram list (structured as described above):
    bigrams: List[Dict[Text, Union[Text, int]]] = []
    # Here is my code:
    bigrams = {'bigram': i for j in sentence for i in zip(j.split(" ")[:-1], j.split(" ")[1:])}
    return bigrams

vectorizer = DictVectorizer(sparse=False)
text = str()  # stands in for the real input string
feature_catalog = features(text)
vectorizer.fit(feature_catalog)
print(sorted(vectorizer.get_feature_names_out()))
Everything works fine until execution reaches the DictVectorizer calls; the error is raised inside the class itself. This is what I get:
AttributeError Traceback (most recent call last)
/var/folders/pl/k80fpf9s4f9_3rp8hnpw5x0m0000gq/T/ipykernel_3804/266218402.py in <module>
22 features = get_feature(text)
23
---> 24 vectorizer.fit(features)
25
26 print(sorted(vectorizer.get_feature_names()))
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/sklearn/feature_extraction/_dict_vectorizer.py in fit(self, X, y)
159
160 for x in X:
--> 161 for f, v in x.items():
162 if isinstance(v, str):
163 feature_name = "%s%s%s" % (f, self.separator, v)
AttributeError: 'str' object has no attribute 'items'
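Reading the traceback, the loop at lines 160-161 of `fit` seems to be receiving strings rather than dicts. Here is a minimal sketch of that situation without scikit-learn, assuming the function returned a single dict instead of a list of dicts. The names `returned`, `items_seen` and `error_message` are hypothetical and only used for illustration:

```python
# Hypothetical stand-in for the function's return value:
# one dict, not a list of dicts.
returned = {"bigram": ("the", "cat")}

# The traceback shows fit() doing `for x in X` and then `x.items()`.
# Looping over a dict yields its keys, so each x is the string "bigram".
items_seen = [x for x in returned]

try:
    for x in returned:
        x.items()  # a str has no .items() method
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(items_seen)
print(error_message)
```

If that reading is right, the message captured here has the same form as the one in the traceback, which would point at the shape of the return value rather than at DictVectorizer.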
Any ideas? This is ultimately going to be used as part of a larger processing effort on a corpus.
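For reference, this is a minimal sketch of the return shape described above: a list with one `{"bigram": "w[i]-w[i+1]"}` dictionary per adjacent word pair. The function name `bigram_features` and the whitespace tokenization are assumptions made for illustration, not part of the original code:

```python
from typing import Dict, List


def bigram_features(sentence: str) -> List[Dict[str, str]]:
    """Return one {"bigram": "w[i]-w[i+1]"} dict per adjacent word pair."""
    # Assumption: words are separated by whitespace.
    words = sentence.split()
    # zip pairs each word with its successor; a one-word sentence yields no pairs.
    return [{"bigram": f"{left}-{right}"} for left, right in zip(words, words[1:])]


example = bigram_features("the cat sat down")
print(example)
```

A list like this has the dict-per-sample shape that the `for x in X: ... x.items()` loop in the traceback iterates over. Going by the `"%s%s%s" % (f, self.separator, v)` line shown there and the default `=` separator, feature names would be expected to look like `bigram=the-cat`.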