
I have 20,000 news documents that I want to run topic modeling on.

I want to see the topic dynamics and evolution across the documents. I tried the following batch script, which uses MALLET for topic modeling, but it does not work.

#!/bin/bash
for filename in /Users/JasonDou/code/internet_finance/bydocafterseg2; do
    ./bin/mallet import-dir --input /Users/JasonDou/code/internet_finance/bydocafterseg2/159047443.txt  --output bydoc-input.mallet --keep-sequence --remove-stopwords
done
Rahul
Jason

1 Answer


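You are missing an asterisk after the directory path, so the loop runs only once, with the directory itself as its single value. The loop body should also use "$filename" rather than a hard-coded file: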

#!/bin/bash
for filename in "/Users/JasonDou/code/internet_finance/bydocafterseg2/"*; do
    [ -e "$filename" ] || continue
    ./bin/mallet import-dir --input "$filename" \
      --output bydoc-input.mallet --keep-sequence --remove-stopwords
done

The above will iterate over each file in bydocafterseg2. To restrict it to .txt files only, change the pattern to: "bydocafterseg2/"*".txt"
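
To see the difference the asterisk makes without involving MALLET, here is a minimal sketch that runs the three loop forms against a throwaway directory. The directory, the file names (a.txt, b.txt, notes.md) and the counter variables are made up for illustration and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical demo directory with two .txt files and one other file.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/notes.md"

# No asterisk: the loop runs once, and $filename is the directory itself.
no_glob_count=0
for filename in "$demo_dir"; do
    no_glob_count=$((no_glob_count + 1))
done

# Trailing asterisk: the shell expands the pattern to every entry in the directory.
all_count=0
for filename in "$demo_dir/"*; do
    [ -e "$filename" ] || continue   # skip the literal pattern if nothing matched
    all_count=$((all_count + 1))
done

# Asterisk plus suffix: only entries ending in .txt are matched.
txt_count=0
for filename in "$demo_dir/"*".txt"; do
    [ -e "$filename" ] || continue
    txt_count=$((txt_count + 1))
done

echo "no glob: $no_glob_count, all files: $all_count, .txt only: $txt_count"
rm -rf "$demo_dir"
```

The `[ -e "$filename" ] || continue` guard matters when the directory is empty: by default bash leaves an unmatched pattern as a literal string, and the guard skips it. Also note that every iteration of the script above names the same output file, bydoc-input.mallet, so, assuming MALLET overwrites an existing output file, only the last import would remain.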

Andreas Louv