
I am passing a PDF document to Apache Tika. The document contains paragraphs that are wrapped across several lines, like these:

(iii) 50% of Text Text Text Text Text Text Text Text Text 
Text Text Text Text. 

Text Text Text Text Text Text Text 1 Text Text Text Text Text 
Text Text. 

Tika returns the text with the same line breaks as in the PDF file.

But the expected output is:

(iii) 50% of Text Text Text Text Text Text Text Text Text Text Text Text Text. 

Text Text Text Text Text Text Text 1 Text Text Text Text Text Text Text.

I want each paragraph to be exported as a single line, rather than keeping the line breaks of the input file.

I am making the call to Tika in this way:

private Tika tika = new Tika();
String content = tika.parseToString(file);

I receive the content of the file in the content variable.
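
As a fallback, I could post-process the returned string myself. Below is a minimal sketch of that idea in plain Java. It is not a Tika feature: `ParagraphJoiner` and `joinWrappedLines` are names I made up, and it assumes that paragraphs in the extracted text are separated by at least one blank line, which may not hold for every PDF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache Tika.
final class ParagraphJoiner {

    // Collapses the wrapped lines of each paragraph into a single line.
    // Assumption: paragraphs are separated by one or more blank lines.
    static String joinWrappedLines(String content) {
        // Normalize Windows and old Mac line endings to "\n".
        String normalized = content.replace("\r\n", "\n").replace('\r', '\n');

        // A blank line (possibly containing spaces) marks a paragraph boundary.
        String[] paragraphs = normalized.split("\\n\\s*\\n");

        List<String> joined = new ArrayList<>();
        for (String paragraph : paragraphs) {
            String trimmed = paragraph.trim();
            if (trimmed.isEmpty()) {
                continue; // skip leading or stray empty chunks
            }
            // Replace each remaining line break, plus surrounding spaces, with one space.
            joined.add(trimmed.replaceAll("\\s*\\n\\s*", " "));
        }

        // Keep one blank line between paragraphs, as in the expected output.
        return String.join("\n\n", joined);
    }
}
```

With this, the call would become `String content = ParagraphJoiner.joinWrappedLines(tika.parseToString(file));`. I would still prefer a setting in Tika itself, because this heuristic breaks when the extracted text has no blank line between paragraphs.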

Is there any configuration through which I can make this happen?

mshikher