Is it possible to tokenize text in Mallet into n-grams of size 1 and 2, so that the features include both single words (unigrams) and pairs of adjacent words (bigrams)?
These are the commands I have used so far:
bin\mallet import-dir --input sample-data\web\en --output sample.txt --keep-sequence-bigrams --remove-stopwords
bin\mallet train-topics --input sample.txt --num-topics 20 --optimize-interval 10 --output-doc-topics sample_composition.txt --output-topic-keys sample_keys.txt
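To make the goal concrete, here is a minimal sketch in plain Python (not Mallet code) of the kind of token output I am hoping for: every single word, plus every pair of adjacent words. The function name `ngrams_1_to_2`, the underscore joiner, and the sample sentence are just my own illustration and do not reflect how Mallet represents bigrams internally.

```python
def ngrams_1_to_2(tokens):
    """Return all unigrams followed by all bigrams of adjacent tokens."""
    unigrams = list(tokens)
    # Pair each token with its right-hand neighbour to form bigrams.
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return unigrams + bigrams


sample = "topic models find themes".split()
print(ngrams_1_to_2(sample))
# ['topic', 'models', 'find', 'themes', 'topic_models', 'models_find', 'find_themes']
```

So for a document with four words I would expect four unigram features and three bigram features.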
Thank you in advance.