I am new to Hadoop and Behemoth. I followed the tutorial at https://github.com/DigitalPebble/behemoth/wiki/tutorial to generate a Behemoth corpus from a text document, using the following command:
sudo bin/hadoop jar /home/madhumita/behemoth/core/target/behemoth-core-*-job.jar com.digitalpebble.behemoth.util.CorpusGenerator -i /home/madhumita/Documents/testFile -o /home/madhumita/behemoth/testGateOpCorpus
I get the following error:
ERROR util.CorpusGenerator: Input does not exist : /home/madhumita/Documents/testFile
This happens every time I run the command, even though I have checked with gedit that the path is correct. I searched online for similar issues but could not find any. Any ideas as to why this might be happening? If the .txt file format is not acceptable, what is the required file format?
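
For reference, here is a minimal sketch of a check that might help narrow down the path problem. It assumes the -i path is resolved on the local filesystem, which I have not confirmed; if it is resolved against HDFS instead, the equivalent check would be `bin/hadoop fs -ls` on the same path. The helper name `check_local_path` is hypothetical and is not part of Hadoop or Behemoth.

```shell
# Hypothetical helper (not part of Hadoop or Behemoth):
# report whether a local path is a directory, a regular file, or missing.
check_local_path() {
  path="$1"
  if [ -d "$path" ]; then
    echo "directory"
  elif [ -f "$path" ]; then
    echo "file"
  else
    # Also reached when the real name differs, e.g. it carries an
    # extension that was left out of the path being tested.
    echo "missing"
  fi
}

# Check the input path exactly as it was passed to -i.
check_local_path /home/madhumita/Documents/testFile
```

If this prints "missing", listing the parent directory with `ls -l /home/madhumita/Documents/` shows the exact file names, including any extension. Running the same check under sudo would also show whether the path is visible to the user that actually runs the job.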