I am using Apache Tika 2.6.x with the JVM options -XX:MaxMetaspaceSize=200M -Xss512K -XX:MaxDirectMemorySize=64M for the code below. Processing time is very high (around a minute) for a docx file of 2 MB or larger that contains only plain text. The same code handles 2 MB CSV, PPTX and other files efficiently, returning a response in under 5 seconds. Is any additional configuration needed? Any suggestions would be appreciated, thanks.
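import java.nio.file.{Files, Paths}
import org.apache.tika.extractor.EmbeddedDocumentExtractor
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.{AutoDetectParser, ParseContext}
import org.apache.tika.sax.BodyContentHandler
val parser = new AutoDetectParser()
val writer = Files.newBufferedWriter(Paths.get("MyFile")) // extracted text is written here
val handler = new BodyContentHandler(writer)
val context = new ParseContext()
// some code logic for embedded images (imageChecker is my custom EmbeddedDocumentExtractor)
context.set(classOf[EmbeddedDocumentExtractor], imageChecker)
parser.parse(TikaInputStreamObj, handler, new Metadata(), context)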