I am passing a PDF document to Apache Tika. The document contains paragraphs like these:
(iii) 50% of Text Text Text Text Text Text Text Text Text
Text Text Text Text.
Text Text Text Text Text Text Text 1 Text Text Text Text Text
Text Text.
The extracted text keeps the same line breaks as the PDF file.
But the output I expect is:
(iii) 50% of Text Text Text Text Text Text Text Text Text Text Text Text Text.
Text Text Text Text Text Text Text 1 Text Text Text Text Text Text Text.
I want each paragraph exported on a single line, rather than wrapped the way it is in the input file.
I am calling Tika like this:
private Tika tika = new Tika();
String content = tika.parseToString(file);
The extracted text is returned in the content variable.
Is there any configuration through which I can make this happen?
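Independent of any Tika configuration, one possible workaround is to post-process the string that parseToString returns. The sketch below is only a heuristic under stated assumptions: it treats a blank line, or a line ending in ".", "?" or "!", as the end of a paragraph, and joins every other line with the one that follows. The class ParagraphJoiner and the method joinWrappedLines are hypothetical names, not part of Tika.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache Tika.
final class ParagraphJoiner {

    // Joins hard-wrapped lines into one line per paragraph.
    // Assumption: a paragraph ends at a blank line or at a line whose
    // last character is '.', '?' or '!'. Real PDFs may violate this.
    static String joinWrappedLines(String text) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String rawLine : text.split("\\R")) { // \R matches any line break
            String line = rawLine.trim();
            if (line.isEmpty()) {
                flush(current, paragraphs); // blank line closes the paragraph
                continue;
            }
            if (current.length() > 0) {
                current.append(' '); // replace the original line break with a space
            }
            current.append(line);
            if (endsSentence(line)) {
                flush(current, paragraphs);
            }
        }
        flush(current, paragraphs); // keep trailing text that has no terminator
        return String.join("\n", paragraphs);
    }

    private static void flush(StringBuilder current, List<String> paragraphs) {
        if (current.length() > 0) {
            paragraphs.add(current.toString());
            current.setLength(0);
        }
    }

    private static boolean endsSentence(String line) {
        char last = line.charAt(line.length() - 1);
        return last == '.' || last == '?' || last == '!';
    }
}
```

It would be applied to the existing call as ParagraphJoiner.joinWrappedLines(tika.parseToString(file)). A sentence that happens to end exactly at a wrapped line inside a paragraph would be split early, so the terminator rule may need tuning for a given document.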
` tags?
– Gagravarr Jun 12 '20 at 04:37
tags one by one?
– mshikher Jun 16 '20 at 07:14