I need to know how to configure Apache Tika.
Right now we use it to parse our HTML files, and then we run searches against the parsed text that the Tika parser returns.
Issue: Apache Tika merges the text from different divs and does not include a space between them.
For example, if we have divs like the ones below:
<div>Girish</div><div>Kumar</div>
The parsed content would look like
GirishKumar
but I want it as
Girish (space) Kumar
How can I configure Apache Tika so that it inserts a space after every div?
Right now we have the Apache Tika JAR installed on one of our servers, and we make calls to it to get the parsed response back.
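To make the desired behavior concrete, here is a minimal sketch that uses only the JDK's SAX API, not Tika itself. The class and method names (DivSpacingSketch, SpacingHandler, extract) are made up for illustration, and the input is assumed to be well-formed XHTML. Tika reports parsed content to an org.xml.sax.ContentHandler, so a handler along these lines might be usable with it, but I have not verified that against our server setup.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: plain JDK SAX parsing, no Tika classes involved.
class DivSpacingSketch {

    /** Collects character data and appends one space whenever a div closes. */
    static class SpacingHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The default SAX parser is not namespace-aware, so qName holds the tag name.
            if ("div".equalsIgnoreCase(qName)) {
                text.append(' ');
            }
        }

        String text() {
            // Trim the space left behind by the last closing div.
            return text.toString().trim();
        }
    }

    /** Parses well-formed XHTML and returns its text with divs separated by spaces. */
    static String extract(String xhtml) throws Exception {
        SpacingHandler handler = new SpacingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract("<body><div>Girish</div><div>Kumar</div></body>"));
    }
}
```

With the two divs from the example above, this produces Girish and Kumar separated by a single space, which is the output I am hoping to get from Tika.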