After reading several posts on SO and the Apache sites, I ended up with the following JARs on my build path:
tika-app-1.10.jar
poi-3.13.jar
poi-examples-3.13.jar
poi-excelant-3.13.jar
poi-ooxml-3.13.jar
poi-ooxml-schemas-3.13.jar
poi-scratchpad-3.13.jar
openxml4j-1.0-beta.jar
xmlbeans-2.6.jar
Despite having these, I cannot seem to parse .doc and .docx files using Tika, although PDF and JPEG files work fine. Why would it not work properly for Office documents when I have all the dependencies listed above?
The relevant stack trace is also posted here