Questions tagged [mallet]

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

From Mallet's website:

MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.

In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers.

In addition to sophisticated Machine Learning applications, MALLET includes routines for transforming text documents into numerical representations that can then be processed efficiently. This process is implemented through a flexible system of "pipes", which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors.
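The "pipes" idea described above can be illustrated with a minimal sketch in plain Python. This is not MALLET's API: the function names, the `run_pipes` helper, and the tiny stopword set are all hypothetical and exist only to show how small, single-purpose steps chain together to turn text into a count vector.

```python
import re
from collections import Counter

# Hypothetical stopword set for illustration only; MALLET ships its own lists.
STOPWORDS = {"the", "a", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set."""
    return [token for token in tokens if token not in STOPWORDS]


def to_counts(tokens):
    """Convert a token sequence into a bag-of-words count vector."""
    return Counter(tokens)


def run_pipes(data, pipes):
    """Feed the output of each pipe into the next one, in order."""
    for pipe in pipes:
        data = pipe(data)
    return data


PIPES = [tokenize, remove_stopwords, to_counts]
counts = run_pipes("The cat and the hat in the cat house", PIPES)
print(counts)
```

Each step handles one task, so steps can be reordered or swapped without touching the others, which is the design the description attributes to MALLET's pipe system.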

321 questions
0
votes
2 answers

Python Gensim Mallet

I am trying to apply LDA for topic modeling using the Mallet wrapper of Gensim on Python. The code that I am running is as follows: MALLET_PATH = 'C:/mallet-2.0.8/bin/mallet' lda_mallet = gensim.models.wrappers.LdaMallet(mallet_path=MALLET_PATH,…
0
votes
1 answer

Which version of the Java JDK should I be using with MALLET?

I regularly use MALLET for topic modeling in the classes that I teach. Running MALLET requires users to have the Java Development Kit installed. I currently have JDK 8 update 241 installed on my main computer, and I know that MALLET works properly…
Brian Croxall
  • 249
  • 2
  • 8
0
votes
1 answer

How to fix this error: returned non-zero exit status 1 in Mallet?

Please help me with the following error. I have tried many things to fix it, without success. The code: MALLET_PATH = './Mallet/bin/mallet' def topic_model_coherence_generator(corpus, texts, dictionary, start_topic_count=2, end_topic_count=10,…
B612
  • 53
  • 6
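
A "non-zero exit status 1" from the wrapper only says that the external MALLET process failed, not why. As a rough diagnostic sketch (not a confirmed fix for this question), the checks below look for a few setup problems before the wrapper is called. The function name `check_mallet_setup` is hypothetical, and treating a missing `MALLET_HOME` as a problem is an assumption based on other questions in this list that set it explicitly.

```python
import os
import shutil


def check_mallet_setup(mallet_path):
    """Return a list of human-readable setup problems (empty if none found)."""
    problems = []
    # The wrapper shells out to this launcher, so it must exist on disk.
    if not os.path.isfile(mallet_path):
        problems.append(f"MALLET launcher not found at: {mallet_path}")
    # MALLET is a Java program, so a Java runtime must be reachable.
    if shutil.which("java") is None:
        problems.append("No 'java' executable found on PATH")
    # Assumption: some setups (notably on Windows) rely on MALLET_HOME.
    if not os.environ.get("MALLET_HOME"):
        problems.append("MALLET_HOME environment variable is not set")
    return problems


for problem in check_mallet_setup("./Mallet/bin/mallet"):
    print(problem)
```

An empty result does not guarantee that MALLET will run; running the same MALLET command directly in a terminal usually shows the underlying Java error message.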
0
votes
1 answer

Python Mallet LDA Errno 2 No such file or directory

I saved an LdaMallet model. First I trained it: mallet_path = 'mallet-2.0.8/bin/mallet' ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, id2word=id2word, num_topics=14) Then I saved the model using the save…
0
votes
0 answers

Return nonzero for LdaMallet

My coworker and I have the exact same code, using the same libraries, yet his code works and mine doesn't. We've gotten stuck trying to figure out what is wrong. Any help would be greatly appreciated. The code and error are below. Code: import…
AMS
  • 1
0
votes
1 answer

Java Mallet LDA keyword distributions

I have used the Java MALLET API for topic modelling with LDA. The API produces the following results: topic : keyword1 (count), keyword2 (count) For example topic 0 : file (12423), test (3123) ... topic 1 : class (2415), test (314) ... Is it right that topic…
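
The question text is cut off, so the following is only one possible reading of it. Assuming each count is the number of tokens of that keyword assigned to the topic, the counts can be turned into relative weights by dividing by the topic's total. The sketch uses the numbers from the excerpt; `normalize_counts` is a hypothetical helper, and because only the listed keywords are included, the result is not the full topic-word distribution.

```python
def normalize_counts(keyword_counts):
    """Scale raw keyword counts so they sum to 1 within a topic."""
    total = sum(keyword_counts.values())
    if total == 0:
        return {word: 0.0 for word in keyword_counts}
    return {word: count / total for word, count in keyword_counts.items()}


# Counts copied from the excerpt above; the elided keywords are not included.
topics = {
    0: {"file": 12423, "test": 3123},
    1: {"class": 2415, "test": 314},
}

weights = {topic: normalize_counts(counts) for topic, counts in topics.items()}
for topic, topic_weights in weights.items():
    print(topic, topic_weights)
```

Normalizing per topic makes it easier to compare a shared keyword such as "test" across topics whose total counts differ widely.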
0
votes
1 answer

Gensim Mallet: Output does not have terms for a few topics

Below is the output that I get using the Gensim Mallet wrapper. From this SO link I understood that LL/token means "model's log-likelihood divided by the total number of tokens". 1) However, for a few topics (1, 8, 11, etc.) I do not see any terms at…
Hackerds
  • 1,195
  • 2
  • 16
  • 34
0
votes
0 answers

Topic modeling in LdaMallet

I wrote the following code, but it produces the following error. Please guide me. from gensim.models.wrappers import LdaMallet import os os.environ.update({'MALLET_HOME':r'C:/mallet'}) mallet_path = 'C:/mallet/bin/mallet' ldamallet =…
MeisaM
  • 1
  • 1
0
votes
1 answer

Topic Modeling with Mallet - topic keys output parameter

I have a follow-up question to the one asked here: Mallet topic modeling - topic keys output parameter I hope I can still get a more detailed explanation of this subject because I have trouble understanding these numbers in the output files. What…
0
votes
0 answers

How to predict action by processing multiple free texts in Java

I have a multi-column data set as follows Id Summary Component Description Labels Action id1 free-text-11 free-text-12 free-text-13 label1, label2 action1 id2 free-text-11 free-text-22 …
Anindya Chatterjee
  • 5,824
  • 13
  • 58
  • 82
0
votes
0 answers

Mallet: "Error occurred during initialization of VM. Could not reserve enough space for 3145728KB object heap"?

The java.lang.OutOfMemoryError: Java heap space error is resolved by raising the heap limit to Xmx3G, but now I am getting "Error occurred during initialization of VM. Could not reserve enough space for object heap". What should I do? I am trying to execute…
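
As a quick sanity check on the numbers in this error, the sketch below converts the heap size from the title into gibibytes. It only does the arithmetic; `kb_to_gib` is a hypothetical helper, and the code does not inspect the JVM in any way.

```python
def kb_to_gib(kilobytes):
    """Convert a size in KB (1024 bytes each) to GiB."""
    return kilobytes / (1024 * 1024)


# Heap size taken from the error message in the question title.
requested_gib = kb_to_gib(3145728)
print(f"Requested heap: {requested_gib} GiB")
```

The figure matches the Xmx3G setting, so the JVM is failing to reserve the full 3 GiB it was asked for. One commonly reported cause, which cannot be confirmed from this excerpt, is a 32-bit JVM or too little free memory on the machine.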
0
votes
1 answer

Lda Mallet returned non-zero exit status 1

I am trying to code an LDA Mallet model. I ran this a couple of months ago and it ran fine, but it no longer works. There have been other posts on the same subject, but the solutions have not helped me. Can anyone figure out what is wrong in my code…
eyama
  • 303
  • 2
  • 6
0
votes
0 answers

MALLET: Topic inference when training data was pruned

I am trying to use MALLET to first train a topic model and then use the inferencer from that model on a set of new documents. From two other threads here and on the MALLET mailing list, I've gathered that it is important to ensure compatibility of…
0
votes
1 answer

How do I pass a file path containing spaces to the Gensim LDA Mallet wrapper?

I am attempting to use Gensim's Mallet wrapper. When I run the following code: import os import gensim os.environ.update({ 'MALLET_HOME': r":C\Users\me\OneDrive - My Company\Documents\Projects\Current\mallet-2.0.8" }) lda_mallet…
Clade
  • 966
  • 1
  • 6
  • 14
0
votes
1 answer

How can I create a MALLET instance from a feature-value object?

I have a JSON object like {"f1": 2.1, "f2": 3.2, "f3": 1234.12, "label": "GOOD"}. I want to convert it into a MALLET instance.
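
This does not build a MALLET instance through the Java API, which is what the question asks for. It is a sketch of one indirect route: rewriting the JSON record as an SVMLight-style line (`label feature:value ...`) in a preprocessing step. It assumes that MALLET's SVMLight import accepts string labels and string feature names; `json_to_svmlight_line` and its `label_key` parameter are hypothetical.

```python
import json


def json_to_svmlight_line(raw, label_key="label"):
    """Turn one JSON feature-value record into 'label name:value ...' text."""
    record = json.loads(raw)
    # Separate the class label from the numeric features.
    label = record.pop(label_key)
    # Sort feature names so the output is stable across runs.
    features = " ".join(
        f"{name}:{value}" for name, value in sorted(record.items())
    )
    return f"{label} {features}"


line = json_to_svmlight_line(
    '{"f1": 2.1, "f2": 3.2, "f3": 1234.12, "label": "GOOD"}'
)
print(line)
```

Writing one such line per record produces a text file that can then be imported in bulk, which keeps the JSON parsing out of the Java code.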