I'm trying to work with Cascading to create and execute complex data processing workflows on a local Hadoop cluster.
I want to create a TF-IDF vector so that I can apply machine learning algorithms such as Naive Bayes to it using the Apache Spark framework.
The problem is that after I build the jar and launch it using the following commands, the program freezes. Here is the log file.
You can find the sources here. The related source code is in part6.
Thanks!