
I used Apache Tika to extract text from a PDF that contains tabular data. In the result, the values from different columns of the same row are merged together.

Before extracting:

| Column A | Column B |
| -------- | -------- |
| 1 | saikiran |
| 2 | pavan |

The table above is the data I am trying to extract.

After extracting, this is the result:

saikiran1 pavan2

I expect the result to look like this instead:

1 saikiran 2 pavan

  • Please provide enough code so others can better understand or reproduce the problem. – Community Mar 19 '23 at 17:15
  • The code I used:

        import java.io.File;
        import java.io.IOException;

        import org.apache.tika.Tika;
        import org.apache.tika.exception.TikaException;

        public class TikaExtraction {
            public static void main(final String[] args) throws IOException, TikaException {
                // Assume sample.txt is in your current directory
                File file = new File("sample.txt");
                Tika tika = new Tika();
                // parseToString detects the file type and returns the plain-text content
                String filecontent = tika.parseToString(file);
                System.out.println("Extracted Content: " + filecontent);
            }
        }

    – Sai Kiran Yalagandhala Mar 24 '23 at 12:51
