
I need to know how to configure Apache Tika so that it keeps text from adjacent HTML elements separated.

Right now we use it to parse our HTML files, and then we run searches against the parsed text that the Apache Tika parser returns.

Issue: Apache Tika merges the text from different divs and does not put a space between them.

For example, if we have divs like these:

<div>Girish</div><div>Kumar</div>

The parsed content would look like

GirishKumar

but I want it as

Girish (space) Kumar

How can I configure Apache Tika so that it inserts a space after every div?

Right now we have the Apache Tika JAR installed on one of our servers, and we call it to get the parsed response back.
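
One direction that might work, sketched here under assumptions: Tika parsers report their output as SAX events to an `org.xml.sax.ContentHandler`, so a handler that appends a space whenever a `div` closes would give the output I want. The class and method names below (`DivSpacingDemo`, `DivSpacingHandler`, `extractText`) are made up for illustration. To keep the example self-contained, it drives the handler with the JDK's own SAX parser on a well-formed snippet rather than with Tika itself. Whether our server setup lets us plug in a handler like this is something I have not verified.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DivSpacingDemo {

    /** Hypothetical handler: collects character data and appends one space whenever a div closes. */
    static class DivSpacingHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // localName is empty when the parser is not namespace-aware, so fall back to qName.
            String name = (localName == null || localName.isEmpty()) ? qName : localName;
            if ("div".equalsIgnoreCase(name)) {
                text.append(' ');
            }
        }

        String getText() {
            // Trim the trailing space left by the last closing div.
            return text.toString().trim();
        }
    }

    /** Parses a well-formed XHTML fragment with the JDK SAX parser and returns the spaced text. */
    static String extractText(String xhtml) throws Exception {
        DivSpacingHandler handler = new DivSpacingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        // prints "Girish Kumar"
        System.out.println(extractText("<body><div>Girish</div><div>Kumar</div></body>"));
    }
}
```

Since `DefaultHandler` implements `ContentHandler`, the same handler could in principle be passed to a Tika parser's `parse` call when Tika is used in-process. Nested divs would produce consecutive spaces, which may need collapsing before indexing.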

Girish kumar