
I am having trouble getting consistent results from the iText text parser. This is the code:

public void parsePdf(String pdf) throws IOException {
    PdfReader reader = new PdfReader(pdf);
    // Region of the page to extract text from
    Rectangle rect = new Rectangle(370, 280, 380, 613);
    RenderFilter filter = new RegionTextRenderFilter(rect);
    // Only text inside the region reaches the location-based strategy
    TextExtractionStrategy strategy =
        new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
    String s = PdfTextExtractor.getTextFromPage(reader, 1, strategy);
    reader.close();
    System.out.println(s);
}

I am creating PDFs with report manager. The templates for the two types of files are different, but the position of the field that I want to extract is the same.
I am using LocationTextExtractionStrategy. The rectangle points to the position that I want to parse. When printed on paper, the field in question is in the same position in both documents, so my guess is that it should parse the same, but that is not the case. The first document gives me the expected result, but when I parse the second one with the same rectangle coordinates, I get text that is two lines above the expected place. Hope this is a better explanation.
I set up the templates in report manager so that the target field is at the same position, with the same font size, spacing, and document header in both PDFs, as is evident when they are printed out, but when parsed I get a two-line offset.

  • This is very vague. You only tell us that *there is a difference in a result.* You neither explain which difference nor share sample PDFs in question to reproduce. – mkl Oct 24 '14 at 13:47
  • edited my question, sorry for a vague first post @mkl – caniaskyouaquestion Oct 24 '14 at 13:55
  • You seem to think that because the printed copies have the data at the same position, the coordinates of that position must coincide. That is not necessarily true; the PDFs may have the origin of their coordinate systems at different positions. Inspect the crop box values or provide the PDFs in question for analysis. – mkl Oct 24 '14 at 14:05
  • thanks @mkl I used a bigger rectangle and parsed the string and worked around it. You are right, my initial assumption may not be valid. – caniaskyouaquestion Oct 28 '14 at 14:18
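
To illustrate the point made in the comments, here is a minimal sketch in plain Java of how a crop box whose lower-left corner is not at (0, 0) shifts a region that was measured on the visible page. The `RegionShift` class, the `toUserSpace` helper, and both crop box values are hypothetical and chosen only for illustration; they are not taken from the PDFs in the question and are not part of iText. With iText 5, the real crop box of the first page can presumably be read with `reader.getCropBox(1)` and compared between the two files.

```java
import java.util.Arrays;

// Hypothetical helper, not part of iText: shifts a region measured on the
// visible page into the coordinate system the page content is drawn in.
public class RegionShift {

    /** All boxes are {llx, lly, urx, ury} in PDF units (1/72 inch). */
    public static float[] toUserSpace(float[] visibleRegion, float[] cropBox) {
        float dx = cropBox[0]; // x of the crop box lower-left corner
        float dy = cropBox[1]; // y of the crop box lower-left corner
        return new float[] {
            visibleRegion[0] + dx, visibleRegion[1] + dy,
            visibleRegion[2] + dx, visibleRegion[3] + dy
        };
    }

    public static void main(String[] args) {
        float[] region = {370, 280, 380, 613}; // rectangle from the question
        float[] cropA = {0, 0, 595, 842};      // assumed: crop box starts at the origin
        float[] cropB = {0, 24, 595, 866};     // assumed: crop box starts 24 units higher

        // Same spot on paper, different coordinates for the extraction filter
        System.out.println(Arrays.toString(toUserSpace(region, cropA)));
        System.out.println(Arrays.toString(toUserSpace(region, cropB)));
    }
}
```

Under these assumed values, the second document would need a rectangle shifted up by 24 units to cover the same printed spot, which is roughly the kind of offset that shows up as text from a couple of lines away. Enlarging the rectangle and picking the field out of the extracted string, as described in the last comment, sidesteps the problem without knowing the exact offset.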
