I'm new to Python and I'm having trouble with a function that has been discussed many times already, most recently in the question "Extract Topic Scores for Documents LDA Gensim Python problem of sorting tuples".

I've done what's suggested in the answer:

import pandas as pd

def format_topics_sentences(ldamodel=lda_model, corpus=corpus, texts=data1):
    # Init output
    final = []
    # Get main topic in each document
    for i, row_list in enumerate(ldamodel[corpus]):
        row = row_list[0] if ldamodel.per_word_topics else row_list
        row = sorted(row, key=lambda x: x[1], reverse=True)
        # Get the Dominant topic, Perc Contribution and Keywords for each document
        for j, (topic_num, prop_topic) in enumerate(row):
            if j == 0:  # => dominant topic
                wp = ldamodel.show_topic(topic_num)
                topic_keywords = ", ".join([word for word, prop in wp])
                lists1 = int(topic_num), round(prop_topic, 4), topic_keywords
                final.append(lists1)
            else:
                break
    sent_topics_df = pd.DataFrame(final, columns=['Dominant_Topic', 'Perc_Contribution', 'Topic_Keywords'])
    contents = pd.Series(texts)
    sent_topics_df = pd.concat([sent_topics_df, contents], axis=1)

    return sent_topics_df


df_topic_sents_keywords = format_topics_sentences(ldamodel=optimal_model, corpus=corpus, texts=texts)

# Format
df_dominant_topic = df_topic_sents_keywords.reset_index()
df_dominant_topic.columns = ['Document_No', 'Dominant_Topic', 'Topic_Perc_Contrib', 'Keywords', 'Text']

# Show
df_dominant_topic.head(10)

However, I get this error: `AttributeError: 'LdaMallet' object has no attribute 'per_word_topics'`

Can someone suggest a correction to the code?

Fedor
nazeli
  • It looks like that code is for the results of Gensim's native LDA modelling (i.e. `LdaModel`). If you want to use Mallet for LDA modelling, with Gensim as a wrapper around it (which is what gives you the `LdaMallet` object), then have a look at the documentation for the classes: you'll need to replace the `LdaModel` functions in this code with their `LdaMallet` equivalents. See https://radimrehurek.com/gensim/models/ldamodel.html, https://radimrehurek.com/gensim_3.8.3/models/wrappers/ldamallet.html – slothrop Jun 15 '23 at 10:52
  • It's worth noting that the Mallet wrapper has been removed from more recent versions of Gensim. So for a sustainable approach, consider either using native Gensim models, or using a different wrapper like https://github.com/maria-antoniak/little-mallet-wrapper. See https://stackoverflow.com/questions/62581874/gensim-ldamallet-vs-ldamodel, https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4#15-removed-third-party-wrappers – slothrop Jun 15 '23 at 10:55
  • At first I tried the original function from the source article https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/ (Step 18), which is meant for LdaMallet. The problem there is the `append` call, which is deprecated and which I haven't managed to replace effectively. Thanks for the help! – nazeli Jun 15 '23 at 20:24
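
One possible correction, given here as a minimal sketch and not as a fix tested against Mallet: read `per_word_topics` with `getattr` so that a model without that attribute (such as `LdaMallet`) falls back to `False`, and collect the rows in a plain list so that no deprecated `append` on a DataFrame is needed. The function name `dominant_topics` is hypothetical. The sketch assumes, as the code in the question does, that `ldamodel[corpus]` yields a list of `(topic_id, proportion)` pairs per document and that `ldamodel.show_topic(topic_id)` returns `(word, weight)` pairs.

```python
def dominant_topics(ldamodel, corpus):
    """Return one (topic_id, proportion, keywords) tuple per document.

    Hypothetical helper: works with any model object that supports
    ldamodel[corpus] and ldamodel.show_topic(topic_id).
    """
    # LdaMallet has no per_word_topics attribute, so default to False
    # instead of raising AttributeError.
    per_word = getattr(ldamodel, "per_word_topics", False)

    rows = []
    for doc_result in ldamodel[corpus]:
        # With per_word_topics=True the document topics are the first element.
        doc_topics = doc_result[0] if per_word else doc_result

        if not doc_topics:
            # Keep a placeholder so rows stay aligned with the input texts.
            rows.append((None, 0.0, ""))
            continue

        # The dominant topic is the pair with the largest proportion.
        topic_num, prop_topic = max(doc_topics, key=lambda pair: pair[1])
        keywords = ", ".join(word for word, _ in ldamodel.show_topic(topic_num))
        rows.append((int(topic_num), round(float(prop_topic), 4), keywords))

    return rows
```

The returned list can then be passed to `pd.DataFrame(rows, columns=['Dominant_Topic', 'Perc_Contribution', 'Topic_Keywords'])` and joined with `pd.Series(texts)` using `pd.concat(..., axis=1)`, as in the question.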
