Questions tagged [mallet]

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

From Mallet's website:

MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.

In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers.

In addition to sophisticated Machine Learning applications, MALLET includes routines for transforming text documents into numerical representations that can then be processed efficiently. This process is implemented through a flexible system of "pipes", which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors.
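
As a rough illustration of the "pipes" idea described above, here is a minimal plain-Java sketch that chains the three steps named in the text: tokenizing, stopword removal, and conversion to counts. It deliberately does not use MALLET's own classes. The class name `PipeSketch`, its methods, and the tiny stopword list are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustration only: a hand-rolled pipeline, not MALLET's pipe API.
final class PipeSketch {
    // Hypothetical stopword list, kept tiny for the example.
    static final Set<String> STOPWORDS = Set.of("the", "a", "of", "and", "is");

    // Step 1: split raw text into lowercase word tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 2: drop tokens that appear in the stopword list.
    static List<String> removeStopwords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOPWORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Step 3: turn the token sequence into per-token counts
    // (insertion order is preserved so the output is stable).
    static Map<String, Integer> toCounts(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Run the three steps in order, each consuming the previous step's output.
    static Map<String, Integer> run(String text) {
        return toCounts(removeStopwords(tokenize(text)));
    }

    public static void main(String[] args) {
        // Prints {topic=2, model=1}
        System.out.println(run("The topic of the model is the topic."));
    }
}
```

Each step takes the previous step's output as its input, which is the property that lets such stages be swapped or reordered independently.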

321 questions
0 votes, 0 answers

Mallet ML Library printing out same result for different instances

I was wondering why the Mallet classification model gives the same output even though my instances are completely different from one another. I have changed the code in CSV2Classify so that it only prints the top 10 labels and their confidence scores. I…
Long Le Minh
0 votes, 1 answer

Mallet SimpleTagger different number of predicates

I was trying the SimpleTagger tutorial provided here. I've run the exact same commands as provided on the page i.e. java -cp "class:lib/mallet-deps.jar" cc.mallet.fst.SimpleTagger --train true --model-file nouncrf sample and java -cp…
iamwhoiam
0 votes, 1 answer

Function of typeTopicCounts in topic modeling implementation of mallet API

I am trying to understand how the LDA topic model is implemented in the MALLET API. In the ParallelTopicModel class I can see a 2D int array called typeTopicCounts, which is initialized in the buildInitialTypeTopicCounts() method through some bitwise…
Sumanta
0 votes, 1 answer

Mallet Document Classification - Reduce Vocabulary Size

I trained a maxent document classification model with Mallet and it turned out to be 130MB which is too large for the instance I wish to run it on. I was wondering if there was a way to potentially reduce the vocabulary size of the model such that…
user1893354
0 votes, 1 answer

How does mallet set its default hyperparameters for LDA i.e. alpha and beta?

I have one question to ask about Mallet topic modelling. How does it set its default hyperparameters for LDA i.e. alpha and beta?
0 votes, 1 answer

Training and Testing data structure : Mallet Classifier

I am trying to use the Mallet Naive Bayes classifier API. I have modeled the training set and test set as follows. Training: [ID] [Label] [Data]. Testing: [ID] [ ] [Data]. Below is the code I have used: public static void main(String[]…
Betafish
0 votes, 1 answer

How to get the cosine similarity between two documents in MALLET?

I have an LDA topic model trained using MALLET, and I want to compute the cosine similarity between two documents, but I'm not sure which of MALLET's output files I should compute the cosine from. My cosine similarity function is working…
higz555
0 votes, 1 answer

error during importing mallet from tethne in python

In my Python project I added: from tethne.model.corpus import mallet. However, when I run the project I see these errors in my PyCharm console: Traceback (most recent call last): File…
brelian
0 votes, 1 answer

Change order of columns in topic distribution file in MALLET

MALLET generates a tab-separated file with the topic distribution of each document by using the --output-doc-topics parameter while training the topic model. It kind of looks like this: doc# filename topic# weight 0 …
phly
0 votes, 1 answer

How to read a topic model trained in command line into a Java class?

So I have a trained model that was created through the command line with MALLET. I want to somehow import this trained model into a Java class. I looked through MALLET's API documentation and came across their ParallelTopicModel class but couldn't find…
higz555
0 votes, 1 answer

Mallet is not recognized as internal or external command

I am using Windows 7. I installed Mallet and it works perfectly when I go to the Mallet directory. However, I am using some Python software that calls it (https://github.com/uwgraphics/VEP_TMScripts) and I get the above-referenced error. How do I…
tom
0 votes, 1 answer

How to use array of doubles as feature vector in Mallet

From what I've seen in the documentation and various examples, the typical workflow with data in Mallet requires you to work with a feature list that you usually obtain by passing your data through "pipes" while iterating over them with some sort of iterator.…
dkaras
0 votes, 1 answer

Text Classification/Document Classification with Sequence Tagging with Mallet

I have documents arranged in folders as classes called categories. For a new input (such as a question asked), I have to identify its category. What is the best way to do this using MALLET? I've gone through multiple articles about this, but…
0 votes, 1 answer

Unable to understand the HLDA Output in MALLET

Below is a snippet of my code: HierarchicalLDA hlda = new HierarchicalLDA(); hlda.initialize(instances, instances, 5, new Randoms()); hlda.estimate(1000); hlda.printState(new PrintWriter(new File("Data.txt"))); I am unable to understand the meaning…
0 votes, 1 answer

how to run topic model on 20000 documents at once?

I have 20000 news documents to run topic modeling on; I want to see the topic dynamics and evolution across the documents. I tried to use the following batch script for topic modeling with Mallet, but it does not work. #!/bin/bash for filename in…
Jason