I am trying to build a text classifier using MALLET. The data set is fairly large, so I am looking for a way to run the "import" task on multiple threads, because loading the data takes a long time. A few questions:
Is there a way to parallelize the process manually by splitting the data, importing each part separately, and then joining the results? I know I can run the imports in parallel and get multiple MALLET input files, but can I combine those files before training the classifier?
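To make the manual approach concrete, here is a minimal sketch of the splitting step only, assuming the data is in the one-instance-per-line text format that `import-file` reads. The function name `split_into_shards` and the `shard_N.txt` file naming are hypothetical placeholders for illustration, not part of MALLET.

```python
from pathlib import Path


def split_into_shards(input_path, num_shards, out_dir):
    """Split a one-instance-per-line file into num_shards files.

    Lines are dealt out round-robin, so each shard gets roughly the
    same number of instances. The shard file names are hypothetical.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shard_paths = [out_dir / f"shard_{i}.txt" for i in range(num_shards)]
    handles = [p.open("w", encoding="utf-8") for p in shard_paths]
    try:
        with open(input_path, encoding="utf-8") as src:
            for line_no, line in enumerate(src):
                # Each line is one instance, so it is never split across shards.
                handles[line_no % num_shards].write(line)
    finally:
        for handle in handles:
            handle.close()
    return shard_paths
```

Each shard could then be passed to its own `bin/mallet import-file --input ... --output ...` process. Whether the resulting files can be merged afterwards is exactly what I am unsure about.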
Does MALLET itself parallelize the import if multiple threads are available on the machine?
Thanks for the help!