I want to use MALLET for topic modelling, and I have a question. My data is in a single file with one instance per line, but I didn't include any label or instance name, so each line starts directly with the text. Are those labels or instance names required?
I am not sure exactly what you want.
On Windows, I put all my data in a folder such as "D:\Data\test1". The "test1" folder contains a number of .txt files, and each file is one instance.
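Then I run

    bin\mallet import-dir --input D:\Data\test1 --output test1.mallet --keep-sequence --remove-stopwords --extra-stopwords extra.txt

to import the files into test1.mallet, which you then pass to train-topics to build the topic model. Because import-dir treats each file as one instance, you do not have to put names or labels into the text yourself.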
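For comparison, if you keep everything in one file and use import-file instead, the MALLET documentation describes the default format as one instance per line: the first token is the instance name, the second is the label, and the rest of the line is the text. The sketch below mimics that split in Python, assuming the default `--line-regex` is `^(\S*)[\s,]*(\S*)[\s,]*(.*)$` (check `bin\mallet import-file --help` on your version). `parse_line` is a hypothetical helper for illustration only.

```python
import re

# Assumed default of MALLET's import-file --line-regex:
# group 1 = instance name, group 2 = label, group 3 = text.
DEFAULT_LINE_REGEX = re.compile(r"^(\S*)[\s,]*(\S*)[\s,]*(.*)$")


def parse_line(line):
    """Split one line the way the assumed default import-file settings would."""
    m = DEFAULT_LINE_REGEX.match(line)
    return {"name": m.group(1), "label": m.group(2), "data": m.group(3)}


# A line that really has a name and a label:
print(parse_line("doc1 sports the match ended in a draw"))
# {'name': 'doc1', 'label': 'sports', 'data': 'the match ended in a draw'}

# A line that starts directly with the text: the first two words are consumed.
print(parse_line("the match ended in a draw"))
# {'name': 'the', 'label': 'match', 'data': 'ended in a draw'}
```

Labels are not needed for topic modelling itself, but with the default import-file settings each line should start with some name and label token (dummy values such as a line number and a placeholder word are enough); otherwise the first two words of every document end up as its name and label instead of its text.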
I hope this helps. By the way, you can generate the separate .txt files with a Word or Excel macro.
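If you would rather script the split than write a macro, here is a minimal Python sketch, assuming a UTF-8 input file with one document per line. The function name `split_lines_to_files` and the `docNNNNNN.txt` file naming are hypothetical choices, not part of MALLET.

```python
from pathlib import Path


def split_lines_to_files(input_path, output_dir):
    """Write each non-empty line of input_path to its own .txt file in output_dir.

    Returns the number of files written.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(input_path, encoding="utf-8") as f:
        for line in f:
            text = line.strip()
            if not text:
                continue  # skip blank lines so they don't become empty instances
            count += 1
            # Zero-padded names keep the files in their original line order.
            (out / f"doc{count:06d}.txt").write_text(text + "\n", encoding="utf-8")
    return count
```

For example, `split_lines_to_files("data.txt", r"D:\Data\test1")` fills the folder, after which the import-dir command can be run on it unchanged.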

flyingmouse