Questions tagged [mallet]

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

From Mallet's website:

MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.

In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers.

In addition to sophisticated Machine Learning applications, MALLET includes routines for transforming text documents into numerical representations that can then be processed efficiently. This process is implemented through a flexible system of "pipes", which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors.
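
The "pipes" idea described above can be illustrated without MALLET itself. The following is a minimal sketch in plain Java that assumes only the standard library. The class name PipeSketch, its method names, and the tiny stopword list are hypothetical and are not MALLET's API (MALLET's own pipe classes live in the cc.mallet.pipe package). It only shows the three steps named in the description chained together: tokenizing a string, removing stopwords, and converting the token sequence into counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative stand-in for a chain of "pipes"; this is NOT MALLET's API.
class PipeSketch {
    // Hypothetical, deliberately tiny stopword list (an assumption for the demo).
    static final Set<String> STOPWORDS = Set.of("the", "a", "of", "and");

    // Step 1: split a string into lowercase tokens of letters and digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 2: drop every token that appears in the stopword list.
    static List<String> removeStopwords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOPWORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Step 3: turn the token sequence into per-word counts (sorted by word).
    static Map<String, Integer> toCounts(List<String> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Run the three steps in order, each consuming the previous step's output.
    static Map<String, Integer> process(String text) {
        return toCounts(removeStopwords(tokenize(text)));
    }

    public static void main(String[] args) {
        // Prints {model=1, topic=2}
        System.out.println(process("The topic of the model and the topic"));
    }
}
```

Each step takes the previous step's output as its input, so a step can be swapped or dropped without touching the others, which is the property the description attributes to pipes.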

321 questions
0 votes, 0 answers

Memory leak in Mallet Alphabet?

I'm using Mallet 2.0.8 for topic modelling. The application loops over a set of documents and calculates a topic model for each of the documents (no information is to be shared among the passes or to be aggregated afterwards). Each pass constructs a…
user2043553
0 votes, 1 answer

MALLET - How to pass a CSV file containing word counts to Naïve Bayes in MALLET?

I have created a CSV file that contains a label and word frequencies, e.g.
0, 4.0, 0.0, 0.0, 1.0, 0.0
0, 0.0, 1.0, 2.0, 0.0, 0.0
1, 1.0, 0.0, 0.0, 0.0, 3.0
where index zero holds the label (0 or 1). My question is,…
Rajani
0 votes, 1 answer

What does a negative empirical likelihood mean in MALLET's HLDA?

I am using MALLET to train a hierarchical LDA model. However, when calculating the empirical likelihood using: double empiricalLikelihood = hlda.empiricalLikelihood(1000, testing); I get a negative number. How can I interpret the meaning of…
Raniem
0 votes, 1 answer

Why does Mallet text classification output the same value 1.0 for all test files?

I am learning the Mallet text classification command lines. The output values estimated for the different classes are all the same, 1.0. I do not know where I went wrong. Can you help? mallet version: E:\Mallet\mallet-2.0.8RC3 //there is a txt file about…
Dylan
0 votes, 0 answers

Does HLDA in Mallet return a word-topic distribution?

I am trying to generate a taxonomy of extracted terminology using topic models. Therefore, I had to use Hierarchical Latent Dirichlet Allocation. However, after getting the topic tree, I would like to annotate the topics, but I am unable to produce the…
Raniem
0 votes, 1 answer

Mallet HierarchicalLDATUI throws NullPointerException for certain files

In the past few days, I have started using Mallet. I am specifically interested in running a hierarchical topic model, like HLDA or HPAM. When importing the sample data files and running them using the cc.mallet.topics.tui.HierarchicalLDATUI class,…
MrDeal
0 votes, 1 answer

Mallet: "Alphabets don't match" exception when building a model a second time in one program

I have explored Mallet and it works well. I am trying to build a model twice within one program and am hitting an exception. My code is as follows: List commands = new ArrayList(); commands.add("--input input.mallet…
Hammad Hassan
0 votes, 1 answer

How to add word-level features to Mallet SimpleTagger?

I have been going through this blog post, which contains a SimpleTagger example. It says: Given an input file "sample" as follows:
CAPITAL Bill noun
slept non-noun
here non-noun
where all but the last token on each line is a binary…
Dawny33
0 votes, 1 answer

Different topic distributions for the same data with mallet topic modeling

I am using Mallet topic modeling and I have trained a model. Right after the training, I print the topic distribution for one of the documents of the training set and save it. Then, I try the same document as the test set and pass it through the…
user1419243
0 votes, 1 answer

ParallelTopicModel - Thread option changes results significantly

I am currently using the ParallelTopicModel for topic modeling, but I've encountered some strange behavior. When I set a different number of threads for the model, I get different results, which should not happen if I'm right. The implementation we've…
bunzJ
0 votes, 1 answer

How to use the Mallet API for topic modelling

Has anyone here successfully used the Mallet API for topic modelling? I find it difficult to understand; even now I don't know how to import my txt as the data. Do you know any good source to learn about the code? I don't find…
0 votes, 1 answer

Getting instances and topic sequences of all documents in Mallet

I'm working on topic modeling with the Mallet library. My data set is at the path filePath, and csvIterator seems to read the data, because model.getData() has about 27000 rows, which matches my dataset. I wrote a loop that prints instances and topic sequences of…
NASRIN
0 votes, 1 answer

How to get mallet to load all tokens from a line without a label?

I'm trying to perform topic modeling on a dataset that's in a whitespace-delimited file, with no label. I can't get Mallet to load all the tokens. I'm using version 2.0.8 on Linux and Mac. As a test for the issue, I created a file with the one…
Adair
0 votes, 1 answer

Pick a Topic Model

I am new to topic modeling and kind of confused. I have run MALLET various times with different values for the number of topics. So how do I know which one to choose for further analysis? I know that there are papers out there dealing with…
Pixie
0 votes, 0 answers

Pachinko Modeling in Mallet

I am experimenting with the Pachinko topic model in Mallet, and am having trouble getting it working. When it prints out the topics at each update, they are all the same. This occurs when I use both the default alpha and beta values, and when I use…
Harry Baker