
I've recently found Textacy, and while going through the API reference guide I'm running into an error with the Vectorizer. If I pass any of the options listed in the API reference, I get a TypeError: unexpected keyword argument. This happens for other options too, not just weighting.

I installed textacy using pip and I'm using Python 3 on Ubuntu. Any help is appreciated. Thanks!

vectorizer = textacy.vsm.Vectorizer(weighting='tfidf')

TypeError: __init__() got an unexpected keyword argument 'weighting'
RKB

1 Answer


I ran into the same problem. The API documentation does not reflect the Vectorizer's current keyword arguments. Instead of weighting, the Vectorizer now takes several keyword arguments that give more control over how TF-IDF is applied:

vectorizer = textacy.Vectorizer(tf_type='linear', apply_idf=True, idf_type='smooth')

tf_type='linear' uses the raw term counts as the term frequency (TF), and apply_idf=True multiplies them by the inverse document frequency (IDF). According to the comments in the repository, idf_type='smooth' adds one to each document frequency to avoid division by zero.
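To make the three options concrete, here is a minimal pure-Python sketch of the weighting they describe: raw counts as the linear TF, multiplied by a smoothed IDF. It does not call textacy at all, and the exact IDF formula is an assumption on my part (log((n_docs + 1) / (df + 1)) + 1, the convention scikit-learn uses for smooth_idf). Check vectorizers.py for what your installed version actually computes. The function names are hypothetical.

```python
import math
from collections import Counter


def smooth_idf(n_docs, doc_freq):
    # Assumed smoothing: add one to the document count and to the document
    # frequency, so doc_freq == 0 never causes a division by zero.
    return math.log((n_docs + 1) / (doc_freq + 1)) + 1.0


def tfidf(docs):
    """Weight each term in each tokenized doc by linear TF times smoothed IDF."""
    n_docs = len(docs)
    # Document frequency: number of docs in which each term appears at least once.
    doc_freqs = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        # Linear TF: the raw count of each term in this document.
        counts = Counter(doc)
        weighted.append(
            {term: tf * smooth_idf(n_docs, doc_freqs[term]) for term, tf in counts.items()}
        )
    return weighted


docs = [["cat", "sat", "cat"], ["dog", "sat"]]
for weights in tfidf(docs):
    print({term: round(w, 3) for term, w in weights.items()})
```

Here "sat" appears in every document, so its IDF is log(3/3) + 1 = 1 and its weight is just its raw count, while the rarer "cat" and "dog" are boosted.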

For more information about these options, see the comment at line 182 of vectorizers.py in the repository: https://github.com/chartbeat-labs/textacy/blob/master/textacy/vsm/vectorizers.py
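Because the accepted keyword arguments have changed between textacy releases, it can help to ask the installed copy directly instead of trusting the online reference. This is a minimal sketch using only the standard library's inspect and importlib.metadata modules (Python 3.8+). The helper name accepted_kwargs is hypothetical, and the textacy.vsm.Vectorizer path is taken from the question, so it may not exist in other releases.

```python
import inspect
from importlib import metadata


def accepted_kwargs(cls):
    """Return the named arguments that cls.__init__ accepts, excluding self."""
    params = inspect.signature(cls.__init__).parameters
    return [
        name
        for name, p in params.items()
        if name != "self"
        # Skip *args / **kwargs; keep arguments that can be passed by keyword.
        and p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                       inspect.Parameter.KEYWORD_ONLY)
    ]


if __name__ == "__main__":
    try:
        import textacy.vsm  # module path as used in the question
    except ImportError:
        print("textacy is not installed, or this version has no textacy.vsm module")
    else:
        print("textacy", metadata.version("textacy"))
        print(accepted_kwargs(textacy.vsm.Vectorizer))
```

If the printed list contains tf_type, apply_idf and idf_type but not weighting, the installed version follows the newer API. If __init__ takes **kwargs, the list is not exhaustive.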

apavel