Questions tagged [mallet]

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

From Mallet's website:

MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.

In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers.

In addition to sophisticated Machine Learning applications, MALLET includes routines for transforming text documents into numerical representations that can then be processed efficiently. This process is implemented through a flexible system of "pipes", which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors.
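The "pipes" idea described above can be sketched in plain Java. This is only an illustration of the concept, assuming a hypothetical class `PipeSketch` and a made-up stopword list; it does not use MALLET's own pipe classes or reproduce their behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Illustrative only: mimics the idea of chained "pipes" with the standard library.
// PipeSketch and STOPWORDS are hypothetical names, not part of MALLET.
final class PipeSketch {
    // Tiny stopword list assumed for the example.
    static final Set<String> STOPWORDS = Set.of("the", "and", "a", "of");

    // Stage 1: lowercase the text and split it into letter-only tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2: drop tokens that appear in the stopword list.
    static List<String> removeStopwords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOPWORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Stage 3: turn the token sequence into a token -> count map.
    static Map<String, Integer> count(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Chain the three stages so each one consumes the previous stage's output.
    static Map<String, Integer> run(String text) {
        Function<String, List<String>> first = PipeSketch::tokenize;
        return first.andThen(PipeSketch::removeStopwords)
                    .andThen(PipeSketch::count)
                    .apply(text);
    }
}
```

Calling `PipeSketch.run("The cat and the hat, the cat")` passes the string through all three stages in order; each stage handles one distinct task, which is the design the description attributes to pipes.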

321 questions
0
votes
1 answer

Mallet stops working for large data sets?

I am trying to use LDA Mallet to assign my tweets to topics, and it works perfectly well when I feed it with up to 500,000 tweets, but it seems to stop working when I use my whole data set, which is about 2,500,000 tweets. Do you have any solutions…
Mike Sal
0
votes
1 answer

Scala future not writing to Resource directory unless terminated

I have an Akka server that requests the mallet file (some output) from an actor. However, the mallet actor code performs several steps in which files are read and modified, and new files are created and saved in the resource directory a couple of times. I need…
danD
0
votes
0 answers

File created in Scala code is not usable until the program terminates

I have Scala code in which I create a new file in the resource directory and then do some work with that new file (in my case, splitting the file): logger.debug("Run training process...") InferTopics.main(("--input " + tmpDir +…
danD
0
votes
3 answers

Saved Gensim LdaMallet model not working in different console

I am training an LdaMallet model in Python and saving it. I am also saving the training dictionary so that I can use it to create a corpus for unseen documents later. If I perform every action (i.e. train a model, save trained model, load saved model, infer…
0
votes
1 answer

Mallet API in Scala Akka throws error: Request timeout encountered for request [GET /mallet Empty]

Please pardon me if the question sounds naive, as I am pretty new to Akka. I am trying to use the Mallet API from Scala with Akka in a REST API, but I get the error "Request timeout encountered". Below is the snapshot of my…
danD
0
votes
1 answer

subprocess.CalledProcessError when trying to run Mallet with Gensim

I'm trying to do topic modeling with Gensim and Mallet. When I locate the mallet_path and then try to assign it to gensim, I get the error subprocess.CalledProcessError: returned non-zero exit status 1, and I am prompted to update Java…
Sennheiser
0
votes
1 answer

make Mallet topic-modeling stable

I'm using the mallet topic-modeling tool and am having difficulty making it stable (the topics that I get do not seem very logical). I worked with your tutorial and that one:…
Daniel Juravski
0
votes
1 answer

Mallet topic modeling: How to deactivate lowercasing?

I'm conducting a topic modeling experiment with Mallet on German texts. Since German nouns begin with an uppercase letter, I want to keep this feature. Does anyone know how to deactivate lowercasing?
eric24629
0
votes
1 answer

Sequence Tagging in batch with Mallet cmd prompt

I have tested the SimpleTagger for Sequence Tagging on mallet's cmd prompt interface. I would now like to train over many files and run tests in batches. Is it also possible to do this on mallet's command prompt? I want to get some hint on the…
spaniard81
0
votes
1 answer

Convert Decision tree from text 2 visual

I have a decision tree output in a 'text' format which is very hard to read and interpret. There are a ton of pipes and indentation levels to follow through the tree, nodes, and leaves. I was wondering if there are tools out there where I can feed in a decision tree like…
sharp
0
votes
2 answers

how to predict topics for a batch of documents with mallet

I am using mallet from a Scala project. After training the topic models and getting the inferencer file, I tried to assign topics to new texts. The problem is that I get different results with different calling methods. Here are the things I tried: creating…
yang
0
votes
0 answers

Mallet TokenSequenceRemoveStopwords trouble reading file

I'm trying to use Mallet for Topic Modelling. So here's my code: { ArrayList pipeList = new ArrayList(); // Lowercase everything pipeList.add(new CharSequenceLowercase()); // Unicode letters, underscore, and hashtag …
Amanda
0
votes
1 answer

Java Exception during topic training in Mallet

I have the following mallet command (for v 2.0.8 (May 3,2016)) under Linux 2.6.32-696.18.7.el6.x86_6 and Java SE Runtime Environment (build 1.7.0_05-b06): bin/mallet train-topics --input html/$1/topic --num-topics $1 \ --output-doc-topics result …
0
votes
1 answer

Mallet with Gensim: file-not-found

I am trying to get LdaMallet in gensim working, but I get the following error: 'C:\...\AppData\Local\Temp\eb09f5_state.mallet.gz' not found. The code: ldamallet = gensim.models.\ wrappers.LdaMallet(mallet_path, corpus=corpus, …
user9165100
0
votes
1 answer

Usage of indicator functions as features in Sequential Models

I am currently using Mallet for training a sequential model using CRF. I have understood how to provide features (that solely depend on input sequence) to the mallet package. Based on my understanding, in mallet, we have to compute all the values of…
Nakamura