1

I am trying to fit a Doc2Vec model on a dataframe whose first column holds the texts and whose second column holds the label (author). I found this article, https://towardsdatascience.com/multi-class-text-classification-with-doc2vec-logistic-regression-9da9947b43f4, which is really helpful. However, I am stuck on building the model:

import tqdm
cores = multiprocessing.cpu_count()
model_dbow = Doc2Vec(dm=0, vector_size=300, negative=5, hs=0, min_count=2, sample=0, workers=cores)
model_dbow.build_vocab([x for x in tqdm(train_tagged.values)])

TypeError: 'module' object is not callable

Could you please help me overcome this issue?

Before that, I also have this code:

train, test = train_test_split(df, test_size=0.3, random_state=42)
import nltk
from nltk.corpus import stopwords
def tokenize_text(text):
    tokens = []
    for sent in nltk.sent_tokenize(text):
        for word in nltk.word_tokenize(sent):
            if len(word) < 2:
                continue
            tokens.append(word.lower())
    return tokens
train_tagged = train.apply(
    lambda r: TaggedDocument(words=tokenize_text(r['text']), tags=[r.author]), axis=1)
test_tagged = test.apply(
    lambda r: TaggedDocument(words=tokenize_text(r['text']), tags=[r.author]), axis=1)

Edit: If I remove tqdm, the code works, but I am not sure whether this is acceptable. As far as I know, tqdm is a Python package that lets you create progress bars and estimate TTC (Time To Completion) for your functions and loops, so removing it should not affect the output. Is that right?

Edit2: See also the question "My Doc2Vec code, after many loops of training, isn't giving good results. What might be wrong?" for ways to improve the tutorial's code. Thanks again, @gojomo.

marc_s
  • 1
    Note separately that the online example you're copying uses an inadvisably overcomplicated & error-prone loop to call `Doc2Vec.train()` multiple times & self-manage `alpha` (poorly). In fact, if you copy its loop exactly for your model (which starts with default `alpha=0.025`), you'll wind up decrementing `alpha` into nonsensical negative values. See https://stackoverflow.com/a/62801053/130288 for more details. – gojomo Aug 09 '21 at 21:15
  • 1
    @gojomo Thank you so much! You are absolutely right! –  Aug 11 '21 at 09:11
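To make the point about `alpha` concrete, here is a small arithmetic sketch of what happens when a learning rate is decremented by hand after every training pass. The tutorial's loop is not quoted in this thread, so the step size and the number of epochs below are assumptions chosen for illustration; only the starting value of 0.025 comes from the comment above.

```python
# Sketch only: ALPHA_STEP and EPOCHS are assumed values, not quoted from the tutorial.
START_ALPHA = 0.025  # default starting alpha mentioned in the comment above
ALPHA_STEP = 0.002   # assumption: amount subtracted after each pass
EPOCHS = 30          # assumption: number of manual training passes


def alpha_schedule(start, step, epochs):
    """Return the alpha used in each pass when it is decremented manually."""
    values = []
    alpha = start
    for _ in range(epochs):
        values.append(alpha)  # alpha in effect for this pass
        alpha -= step         # manual decrement after the pass
    return values


schedule = alpha_schedule(START_ALPHA, ALPHA_STEP, EPOCHS)

# Index of the first pass that would run with a negative learning rate.
first_negative_epoch = next(i for i, a in enumerate(schedule) if a < 0)

print("first pass with negative alpha:", first_negative_epoch)
print("alpha in the final pass:", round(schedule[-1], 3))
```

Under these assumed values, more than half of the passes would run with a negative learning rate. The linked answer recommends avoiding this by calling `train()` once with the desired number of epochs and letting the library handle the decay.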

2 Answers

1

You are importing the `tqdm` module, not the `tqdm` class it contains.

Replace `import tqdm` with `from tqdm import tqdm`.

Elad Cohen
  • Okay, I will do it. If I remove tqdm, is there any problem with the output? Please see my updated question. –  Aug 09 '21 at 09:01
  • 1
    `tqdm` is just a progress visualization library; it will not change the logic of your code. – Elad Cohen Aug 09 '21 at 09:03
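To illustrate why wrapping an iterable in a progress bar leaves the results unchanged, here is a minimal sketch using a hypothetical stand-in called `with_progress`. It is not tqdm's implementation, only an assumption about the general shape of such a wrapper: it reports a running count and yields every item untouched.

```python
import sys


def with_progress(iterable, stream=sys.stderr):
    """Hypothetical progress wrapper: report a count, yield items unchanged."""
    count = 0
    for item in iterable:
        count += 1
        stream.write(f"\rprocessed {count} items")  # side effect only
        yield item                                   # item passes through as-is
    stream.write("\n")


documents = ["first text", "second text", "third text"]

wrapped = list(with_progress(documents))  # iterate with progress reporting
plain = list(documents)                   # iterate without it

print(wrapped == plain)
```

Because the wrapper only observes the items as they pass through, `build_vocab` should see the same documents whether or not the iterable is wrapped.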
0

I found this. I'm not sure about Doc2Vec, but this error in Python is about the module name.

The error TypeError: 'module' object is not callable is raised when a module name is confused with a class name. The problem is in the import line: you are importing a module, not a class. This happens because the module and the class have the same name.

If you have a class MyClass in a file called MyClass.py, then you should write:

from MyClass import MyClass

Source: http://net-informations.com/python/iq/typeerror.htm
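The same mistake can be reproduced without any third-party package. As a sketch, the standard library's `datetime` module contains a class that is also named `datetime`, which mirrors the `tqdm` situation in the question; the variable names below are only illustrative.

```python
import datetime  # binds the name "datetime" to the module, not the class

try:
    datetime(2021, 8, 9)  # calling the module itself fails
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Option 1: reach the class through the module.
via_module = datetime.datetime(2021, 8, 9)

# Option 2: import the class directly, which rebinds the name to the class.
from datetime import datetime

direct = datetime(2021, 8, 9)

print(via_module == direct)
```

Either option works; the direct import is the shorter fix when the rest of the code already calls the bare name, as the question's code does with `tqdm(...)`.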

Dharman
H Sa