Questions tagged [mallet]

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

From Mallet's website:

MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.

In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers.

In addition to sophisticated Machine Learning applications, MALLET includes routines for transforming text documents into numerical representations that can then be processed efficiently. This process is implemented through a flexible system of "pipes", which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors.
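The "pipes" idea described above can be sketched in plain Java. This is only an illustration of chaining small text-processing steps (tokenize, remove stopwords, count); it does not use MALLET's own classes, and the class name `PipeSketch`, its method names, and the stopword list are all hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Standalone illustration of a pipe-style chain; these are NOT MALLET classes.
final class PipeSketch {

    // Hypothetical stopword list, chosen only for this example.
    static final Set<String> STOPWORDS = Set.of("the", "a", "of", "and");

    // Step 1: lowercase the text and split it into word tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 2: drop every token that appears in the stopword list.
    static List<String> removeStopwords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOPWORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Step 3: turn the token sequence into per-word counts,
    // keeping words in order of first appearance.
    static Map<String, Integer> count(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Runs the three steps in sequence, each consuming the previous output.
    static Map<String, Integer> run(String text) {
        return count(removeStopwords(tokenize(text)));
    }

    public static void main(String[] args) {
        // Prints {bark=2, tree=1, dog=1}
        System.out.println(run("The bark of the tree and the bark of a dog"));
    }
}
```

Each step takes the previous step's output as its input, which is the property that lets such stages be swapped or reordered independently. MALLET's real pipes operate on its own instance objects rather than on bare strings and lists.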

321 questions
1 vote, 1 answer

OutOfMemoryError with Mallet CRF classifier

The classifier frequently fails with OutOfMemoryError. Please advise. We have a UIMA pipeline that invokes 5 model jars (based on Mallet CRF) of around 30MB each. -Xms is set to 2G and -Xmx is set to 4G. Are there any guidelines/benchmarks on setting…
Tilak
1 vote, 1 answer

empty topics in Mallet LDA topic modeling

When I run Mallet LDA with a higher number of topics (e.g. T > 300), I get topics with no topic words at all. Why is that happening? Is this a bug in Mallet? I'm using mallet 2.0.7 on a ubuntu 14.04…
samsamara
1 vote, 1 answer

how to extract topical key phrases using mallet

I have imported the file into Mallet. Now I want to model topics from the imported data and store them in a text file from which I can read those topics. Can anyone help with writing the commands for topic extraction, as I typed command…
Kanwal
1 vote, 1 answer

Convert Java serialization data into a readable file ? linux

I am learning mallet and I am trying out the example. So, I ran this command bin/mallet import-dir --input sample-data/web/* --output web.mallet from the link http://mallet.cs.umass.edu/import.php The output I got is a file named web.mallet but it…
rohitsakala
1 vote, 1 answer

Mallet topic modelling, labelling topics

I have a corpus of articles in a single document, and I am applying the topic modelling algorithm from MALLET so that I can later build a search function that lets the user find articles relevant to their input. The algorithm I'm using is…
deadpixels
1 vote, 1 answer

Text Classification using MALLET

I'm new to using Mallet. I usually use WEKA for classification, and now I'm trying to use Mallet for text classification. In Weka, there are attributes (such as word length or top-n word occurrence) that we choose ourselves and make the .arff file.…
kaylak
1 vote, 0 answers

Mallet training dataset with any class specified in Mallet API

I have a dataset of 15000 words as comma-separated values. I want to train Mallet so that later tag extraction returns results based on the trained dataset. I need a sample piece of code to train my data using…
1 vote, 2 answers

MALLET Topic Modeling: Inconsistent Estimations

I'm using MALLET to train a ParallelTopicModel. After training, I get a TopicInferencer, take a sentence, run it through the inferencer 15 times, and check the results. I'm finding that for some topics, the estimation is different each time and not…
kk415kk
1 vote, 0 answers

Gensim LdaMallet division error

I'm trying to replicate the tutorial for the Mallet wrapper in gensim. http://radimrehurek.com/2014/03/tutorial-on-mallet-in-python/ When I fit the model with model = models.LdaMallet(mallet_path, corpus, num_topics=10, id2word=corpus.dictionary) I…
Artturi Björk
1 vote, 0 answers

Which iterator should I use to create instances from feature value pairs (Mallet api)?

I am trying to run LDA to generate some topics from txt files like the following one: Document1 label1 forest=3.4 tree=5 wood=2.85 hammer=1 colour=1 leaf=1.5 Document2 label2 forest=10 tree=5 wood=2.75 hammer=1 colour=4 leaf=1 Document3 label3…
1 vote, 1 answer

Using Mallet for Naive Bayes classification: How and where are Alphabets set up?

I am trying to use the MALLET machine-learning library in a project for word sense disambiguation. My feature vectors consist of a fixed-size token window of x tokens to the left and right of the target token. The MALLET training instances are…
martin_wun
1 vote, 0 answers

How to get SVM in MALLET

I have been using MALLET for some time now and I want to train on my data using an SVM classifier. Is there a way to get SVM in MALLET? I followed the instructions at SVM on MALLET but it didn't help much. Thank you in advance.
Denzil
1 vote, 1 answer

In MALLET Java API, why can't the Input2CharSequence pipe feed into the CharSequenceLowercase() pipe?

When I try to use these pipes successively, I get the error: Exception in thread "main" java.lang.IllegalArgumentException: CharSequenceLowercase expects a String, found a class java.lang.StringBuffer I don't see any pipes available in MALLET to fix…
pjshap
1 vote, 1 answer

What is estimate function in topic modeling using mallet library

I'm new to topic modeling and I'm trying to use the Mallet library, but I have a question. I'm using the simple parallel threaded implementation of LDA to find topics for some instances. My question is: what is the estimate function in ParallelTopicModel? I have…
Jimmysnn
1 vote, 1 answer

R mallet error in jcall:java.lang.NoSuchMethodException: No suitable method for the given parameters

I am using mallet in R. It was working fine until I installed devtools. After that I started getting the following error, which I had never seen before. Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.NoSuchMethodException: No…
add-semi-colons