Hello, my programmer friends. I'm doing my first NLP project, which computes and displays the TF-IDF of 5 documents. Here's part of the code:

def IDF(corpus , unique_words):
    idf_dict = {}
    N = len(corpus)
    for i in unique_words:
        count = 0
        for sen in corpus:
            if i in sen.split():
                count = count+1
            idf_dict[i] = (math.log((1 + N) / (count+1))) + 1
    return idf_dict

def fit(whole_data):
    unique_words = set()
    if isinstance(whole_data, (list,)):
        for x in whole_data:
            for y in x.split():
                if len(y)<2:
                    continue
                unique_words.add(y)
            unique_words = sorted(list(unique_words))
            vocab = {j:i for i,j in enumerate(unique_words)}
    Idf_values_of_all_unique_words = IDF(whole_data,unique_words)
    return vocab, Idf_values_of_all_unique_words
vocabulary, idf_of_vocabulary = fit(corpus)

The call to IDF on line 22 gives me a NameError. Is it about where the function is positioned?

desertnaut
Parsa
  • It would be nice if you showed the line numbers in the code. There is no IDF on line 11 now. – Park Jul 18 '22 at 06:02
  • Do you think that `vocab` and `unique_words` are **always** defined inside `fit`? Or that `corpus` exists when you call `fit`? – DeepSpace Jul 18 '22 at 06:10
  • Actually, it was all about putting the function in the right place. I moved the `IDF` function inside the `fit` function and it works fine. Thanks, everyone. – Parsa Jul 18 '22 at 13:19
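
On the "positioning" point discussed in the comments: Python looks up a function name when the call actually runs, not when the calling function is defined. A minimal sketch of that behavior follows; the names `caller` and `helper` are hypothetical and exist only for illustration.

```python
def caller():
    # `helper` is looked up only when this line executes,
    # not when `caller` itself is defined.
    return helper()

# Calling before `helper` has been defined raises NameError.
try:
    caller()
    early_call_failed = False
except NameError:
    early_call_failed = True

def helper():
    return "ok"

# Once the `def helper` statement has run, the same call succeeds.
late_call_result = caller()
```

So the order of the `def` statements in the file matters less than whether each `def` has already executed (for example, in an earlier notebook cell) by the time the call happens.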

1 Answer

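import math

def fit(whole_data):
    # IDF is defined inside fit, so it always exists by the time fit calls it.
    def IDF(whole_data, unique_words):
        idf_dict = {}
        N = len(whole_data)
        for i in unique_words:
            # Count the documents that contain the word i.
            count = 0
            for sen in whole_data:
                if i in sen.split():
                    count = count + 1
            # Smoothed IDF, assigned once per word after counting is done.
            idf_dict[i] = (math.log((1 + N) / (count + 1))) + 1
        return idf_dict

    # Fail early instead of reaching the return with `vocab` undefined.
    if not isinstance(whole_data, list):
        raise TypeError("whole_data must be a list of strings")

    unique_words = set()
    for x in whole_data:
        for y in x.split():
            # Skip single-character tokens.
            if len(y) < 2:
                continue
            unique_words.add(y)
    unique_words = sorted(unique_words)
    vocab = {j: i for i, j in enumerate(unique_words)}

    Idf_values_of_all_unique_words = IDF(whole_data, unique_words)
    return vocab, Idf_values_of_all_unique_words

# corpus is assumed to be a list of document strings defined earlier.
vocabulary, idf_of_vocabulary = fit(corpus)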

Just like that! With IDF defined inside fit, the name is guaranteed to exist by the time fit calls it.
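
Nesting is not the only option. As a sketch under stated assumptions, the same smoothed IDF formula can stay in a module-level function, as long as its `def` has run before `fit` is called. The names `compute_idf`, `fit_vocabulary`, and `sample_corpus` below are hypothetical stand-ins, not part of the original code.

```python
import math

def compute_idf(corpus, unique_words):
    """Smoothed IDF per word: log((1 + N) / (1 + doc_count)) + 1."""
    n_docs = len(corpus)
    # Tokenize each document once instead of once per word.
    tokenized = [set(doc.split()) for doc in corpus]
    idf = {}
    for word in unique_words:
        doc_count = sum(1 for tokens in tokenized if word in tokens)
        idf[word] = math.log((1 + n_docs) / (1 + doc_count)) + 1
    return idf

def fit_vocabulary(corpus):
    """Return (word -> index mapping, word -> IDF mapping) for a list of strings."""
    if not isinstance(corpus, list):
        raise TypeError("corpus must be a list of strings")
    # Keep tokens of length >= 2, as in the original code.
    unique_words = sorted({w for doc in corpus for w in doc.split() if len(w) >= 2})
    vocab = {word: index for index, word in enumerate(unique_words)}
    return vocab, compute_idf(corpus, unique_words)

# Hypothetical two-document corpus, used only to exercise the functions.
sample_corpus = [
    "this is the first document",
    "this document is the second document",
]
vocabulary, idf_of_vocabulary = fit_vocabulary(sample_corpus)
```

Here a word that appears in every document, such as "document", gets the minimum IDF of 1.0, while rarer words score higher.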

Parsa