Questions tagged [mallet]

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

From the MALLET website:

MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.
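To make the classification idea concrete, here is a minimal multinomial Naive Bayes sketch in plain Java. It is an illustration under stated assumptions (documents already tokenized into lists of strings, add-one smoothing, a hypothetical class name `TinyNaiveBayes`), not MALLET's own trainer or API.

```java
import java.util.*;

// Hypothetical minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
// Plain-Java illustration only; it does not use MALLET's classification classes.
final class TinyNaiveBayes {
    private final Map<String, Integer> docCounts = new HashMap<>();               // documents per label
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // word counts per label
    private final Map<String, Integer> totalWords = new HashMap<>();              // token total per label
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Adds one labelled, already-tokenized document to the counts.
    void train(String label, List<String> tokens) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    // Returns the label maximizing log P(label) + sum_i log P(token_i | label).
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            double denom = totalWords.getOrDefault(label, 0) + vocabulary.size();
            for (String t : tokens) {
                score += Math.log((counts.getOrDefault(t, 0) + 1) / denom);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Working in log space avoids underflow when many small word probabilities are multiplied together.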

In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers.
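A rough sketch of how an HMM tagger picks the single best label sequence: Viterbi decoding in plain Java. The class name, the probability-array layout, and the toy numbers in the note below are assumptions for illustration; MALLET's own HMM and CRF classes are not used here.

```java
// Hypothetical Viterbi decoder for a discrete HMM (plain Java, not MALLET's API).
final class Viterbi {
    // start[s]    = P(first state = s)
    // trans[p][s] = P(next state = s | current state = p)
    // emit[s][o]  = P(observation o | state s)
    // obs         = observation indices; returns the most likely state index per position.
    static int[] decode(double[] start, double[][] trans, double[][] emit, int[] obs) {
        int n = obs.length, k = start.length;
        if (n == 0) return new int[0];
        double[][] score = new double[n][k]; // best log-probability of any path ending in state s at step i
        int[][] back = new int[n][k];        // predecessor state on that best path
        for (int s = 0; s < k; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int i = 1; i < n; i++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double cand = score[i - 1][p] + Math.log(trans[p][s]);
                    if (cand > best) { best = cand; arg = p; }
                }
                score[i][s] = best + Math.log(emit[s][obs[i]]);
                back[i][s] = arg;
            }
        }
        // Pick the best final state, then follow the backpointers to recover the path.
        int[] path = new int[n];
        double best = Double.NEGATIVE_INFINITY;
        for (int s = 0; s < k; s++) {
            if (score[n - 1][s] > best) { best = score[n - 1][s]; path[n - 1] = s; }
        }
        for (int i = n - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];
        }
        return path;
    }
}
```

For example, with two equally likely start states, "sticky" transitions (0.8 stay, 0.2 switch), state 0 emitting observation 0 with probability 0.9 and state 1 emitting observation 1 with probability 0.9, decoding the observations 0, 0, 1, 1 returns the state path 0, 0, 1, 1.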

In addition to sophisticated Machine Learning applications, MALLET includes routines for transforming text documents into numerical representations that can then be processed efficiently. This process is implemented through a flexible system of "pipes", which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors.
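The "pipes" idea can be approximated in plain Java as a chain of functions, each consuming the previous stage's output. This is a sketch only: the class and stage names and the tiny stopword list are hypothetical, and the code loosely mirrors (rather than uses) MALLET pipe classes such as `CharSequence2TokenSequence` and `TokenSequenceRemoveStopwords`.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical plain-Java "pipe" chain: tokenize -> remove stopwords -> count vector.
final class PipeSketch {
    // Stage 1: lowercase, then split on anything that is not a letter or digit.
    static final Function<String, List<String>> TOKENIZE = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t); // split can yield a leading empty string
        }
        return tokens;
    };

    // Stage 2: drop stopwords (a tiny illustrative list, not MALLET's built-in one).
    static final Set<String> STOPWORDS = Set.of("the", "a", "an", "and", "of", "to", "is");
    static final Function<List<String>, List<String>> REMOVE_STOPWORDS = tokens -> {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOPWORDS.contains(t)) kept.add(t);
        }
        return kept;
    };

    // Stage 3: turn the token sequence into counts (word -> frequency), sorted by word.
    static final Function<List<String>, Map<String, Integer>> TO_COUNTS = tokens -> {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : tokens) counts.merge(t, 1, Integer::sum);
        return counts;
    };

    // The full pipeline is simply the composition of the stages.
    static final Function<String, Map<String, Integer>> PIPELINE =
        TOKENIZE.andThen(REMOVE_STOPWORDS).andThen(TO_COUNTS);
}
```

For example, `PipeSketch.PIPELINE.apply("The cat and the hat; the CAT!")` yields `{cat=2, hat=1}`. Keeping each stage separate means a stage can be swapped or reordered without touching the others, which is the point of a pipe design.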

321 questions
0 votes, 1 answer

MALLET Dirichlet parameter higher than 1

I've been using MALLET to perform topic modeling (LDA). I tried to discover 20 topics in a dataset. The outcome is the following (the list of keywords is not important for this question): 0 0.05013 list_of_topic_keywords_0 1 0.06444…
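For context on the question above: the Dirichlet distribution only requires each concentration parameter to be positive, so nothing restricts a value to at most 1. Assuming the second column of the topic-keys output is the per-topic parameter α_k (with K topics, here K = 20), the standard density and the expected topic proportion are:

```latex
\mathrm{Dir}(\theta \mid \alpha)
  = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
  \qquad \alpha_k > 0,
\qquad
\mathbb{E}[\theta_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j}
```

Under that assumption, a larger α_k relative to the other topics' values means topic k is expected to take a larger share of each document.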
0 votes, 1 answer

Using Mallet on Cygwin

I've been using Cygwin on Windows for a POSIX environment. When using the MALLET toolkit, however, I run into problems finding the classes. For example, running $ bin/mallet import-file gives: Error: Could not find or load main class…
Peter O
0 votes, 2 answers

Cannot run MALLET TopicModel

I am trying to run MALLET's topic modelling but got the following error: Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file. Perhaps the 'resources' directories weren't copied into the 'class'…
Ashkan
0 votes, 1 answer

MALLET Java: get the probability distribution of a document collection

I would like to get a single probability distribution for a collection of documents, because I need to compute the KL divergence. Is this possible? In this example: http://mallet.cs.umass.edu/topics-devel.php with the method…
Enzo
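One simple way to get a single distribution for a document collection is to average the per-document topic proportions; two such collection-level vectors can then be compared with the KL divergence. A minimal sketch, assuming the document-topic proportions have already been exported into a `double[][]` (one row per document, each row summing to 1). The averaging choice and the class name `TopicDistributions` are assumptions, not the method from the linked MALLET page.

```java
// Hypothetical helpers for collection-level topic distributions (plain Java, no MALLET calls).
final class TopicDistributions {
    // Averages per-document topic proportions into one distribution for the collection.
    // Assumes at least one document and that every row has the same number of topics.
    static double[] average(double[][] docTopics) {
        int k = docTopics[0].length;
        double[] mean = new double[k];
        for (double[] doc : docTopics) {
            for (int t = 0; t < k; t++) {
                mean[t] += doc[t] / docTopics.length;
            }
        }
        return mean;
    }

    // KL(p || q) = sum_t p[t] * ln(p[t] / q[t]); terms with p[t] == 0 contribute 0.
    // If q[t] == 0 where p[t] > 0, the divergence is infinite.
    static double klDivergence(double[] p, double[] q) {
        double kl = 0.0;
        for (int t = 0; t < p.length; t++) {
            if (p[t] == 0.0) continue;
            if (q[t] == 0.0) return Double.POSITIVE_INFINITY;
            kl += p[t] * Math.log(p[t] / q[t]);
        }
        return kl;
    }
}
```

KL divergence is asymmetric, so `klDivergence(p, q)` and `klDivergence(q, p)` generally differ; if a symmetric measure is needed, something like the Jensen-Shannon divergence is a common alternative.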
0 votes, 2 answers

Topic Modelling and finding similarity in topics

Problem statement: I have a corpus of about 20k documents. I need to apply topic modelling to find similar documents and then analyze those similar documents to find how they differ from each other. Q: Could anyone suggest any Topic…
user3421622
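For the document-similarity question, one common option (swapped in here for illustration, not taken from the question) is to compare documents by the Hellinger distance between their topic proportions. A sketch assuming a hypothetical `double[][]` with one row of topic proportions per document; the class name `TopicSimilarity` is also an assumption.

```java
// Hypothetical nearest-neighbour lookup over document-topic vectors (plain Java).
final class TopicSimilarity {
    // Hellinger distance between two discrete distributions:
    // 0 for identical distributions, 1 for distributions with disjoint support.
    static double hellinger(double[] p, double[] q) {
        double sum = 0.0;
        for (int t = 0; t < p.length; t++) {
            double d = Math.sqrt(p[t]) - Math.sqrt(q[t]);
            sum += d * d;
        }
        return Math.sqrt(sum / 2.0);
    }

    // Index of the document closest to docTopics[query], excluding the query itself;
    // returns -1 if there is no other document.
    static int mostSimilar(double[][] docTopics, int query) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < docTopics.length; i++) {
            if (i == query) continue;
            double d = hellinger(docTopics[query], docTopics[i]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }
}
```

Unlike KL divergence, the Hellinger distance is symmetric and stays finite when a topic has zero weight in one document, which makes it convenient for ranking neighbours.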
0 votes, 1 answer

MALLET API: get consistent results

I am new to LDA and MALLET, and I have the following question. I ran MALLET LDA from the command line and, by setting --random-seed to a fixed value, got consistent results across multiple runs of the algorithm. However, I did try…
Uno
0 votes, 1 answer

Mallet SimpleTagger FileNotFoundException: c:\mallet-2.0.7 (Access is denied)

I tried running MALLET from the Windows command prompt, following the examples in the documentation exactly and also the solution in this post, but I keep getting this error. What could be the problem? c:\>java -cp…
DevEx
0 votes, 0 answers

Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default-cli) on project mallet

When first running the example from the MALLET project, I got the following error. My NetBeans installation has Maven, and I also read the error instructions below, but I couldn't fix the problem. What should I do? Failed to execute goal…
0 votes, 1 answer

Error while importing a txt file into MALLET

I have been having trouble importing some txt files into MALLET. I keep getting: Exception in thread "main" java.lang.IllegalStateException: Line #39843 does not match regex: and line #39843 reads: 24393584 |Title Validation of a Danish…
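One possible cause of a "does not match regex" error, sketched under assumptions: if the import uses the default line regex (assumed here to be `^(\S*)[\s,]*(\S*)[\s,]*(.*)$`), then Java's `.` does not match characters that `java.util.regex` treats as line terminators, such as U+0085, U+2028 and U+2029. A line containing one of them fails to match even though it looks like ordinary text. The excerpt is truncated, so this is a hypothesis rather than a diagnosis, and the class name `ImportLineCheck` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical checker for lines that would fail the (assumed) default import regex.
final class ImportLineCheck {
    // Assumed default line regex: group 1 = name, group 2 = label, group 3 = data.
    static final Pattern DEFAULT_LINE =
        Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    static boolean matches(String line) {
        return DEFAULT_LINE.matcher(line).matches();
    }

    // Replaces characters that Java regex treats as line terminators (stray CR,
    // U+0085, U+2028, U+2029) with a space so that ".*" can reach the end of the line.
    static String sanitize(String line) {
        return line.replaceAll("[\\r\\u0085\\u2028\\u2029]", " ");
    }
}
```

Running `matches` over each line of the input file is one way to locate offending lines before importing; cleaning them with `sanitize` (or an equivalent preprocessing step) is a workaround, not a MALLET option.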
0 votes, 1 answer

SimpleTagger based on CRF with MALLET

I want to run the SimpleTagger class in MALLET from Eclipse. I only need to know the order of the arguments to pass as input. This link explains each argument but not their order (args[0], args[1], etc.). In addition, do you have an idea about…
Marwa Louati
0 votes, 1 answer

Mallet SimpleTagger Classpath

I am going to use MALLET SimpleTagger for sequence tagging. However, I have a problem setting the classpath. As I have seen here (classpath), I must be able to use java -cp to set the classpath. I followed the instructions here (I am sure that I…
user1419243
0 votes, 1 answer

How to import a file into MALLET for topic modelling

I want to use MALLET for topic modelling and I have a question. My data is in a single file with one instance per line, but I didn't include any label or instance name, so each line starts with the text. Is it required to have those labels or instance names?
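Assuming the default import line regex `^(\S*)[\s,]*(\S*)[\s,]*(.*)$`, with name, label and data taken from groups 1, 2 and 3, a line that starts directly with text would have its first two words consumed as the instance name and label. The sketch below shows that split and a hypothetical workaround that prefixes each line with a generated name and a placeholder label; the class name `LineFormat` and the label `X` are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of how the (assumed) default regex splits an import line.
final class LineFormat {
    // Assumed default line regex: group 1 = name, group 2 = label, group 3 = data.
    static final Pattern DEFAULT_LINE = Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    // Returns {name, label, data}, or null if the line does not match.
    static String[] split(String line) {
        Matcher m = DEFAULT_LINE.matcher(line);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    // Hypothetical workaround: prefix a generated name and a placeholder label "X".
    static String withNameAndLabel(int index, String text) {
        return "doc" + index + " X " + text;
    }
}
```

For example, `split("topic models are fun")` yields the name `topic`, the label `models` and the data `are fun`, while the prefixed line `doc0 X topic models are fun` keeps the whole sentence as data.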
0 votes, 3 answers

bin/mallet train-topics gives different results on every run

When I run the command bin\mallet train-topics --input input.tutorial.mallet --num-topics 40 --num-iterations 100 --optimize-interval 50 --optimize-burn-in 200 --output-state input.gz --output-topic-keys inputkeys.txt --output-doc-topics…
NAVEED
0 votes, 2 answers

Topic modeling with MALLET

I'm currently getting started with topic modeling as a beginner, and I was thinking of using MALLET as a tool to help me understand this area. My problem is that I'd like to train a model on, let's say, 1000 documents, to construct a model and using the…
JudyJiang
0 votes, 1 answer

Building article Classifier - NLTK/ Scikit-learn/ Other NLP implementations

For my current project I have to build a topic modeling or classification utility that will process thousands of articles and classify them into various topics (perhaps 40 to 50 topics to start with). For example, it'll go over database technologies…
whosthr