
I used the following code to extract text from a particular region of a PDF. I want to get only the bold text in that region.

Rectangle rect = new Rectangle(0, 0, 250, 250);
RenderFilter filter = new RegionTextRenderFilter(rect);
fontBasedTextExtractionStrategy strategy = new fontBasedTextExtractionStrategy();
strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter); // Throws an error.

To start with, would creating a new class called fontBasedTextExtractionStrategy, instead of using SimpleTextExtractionStrategy, help? Something like the following:

public class fontBasedTextExtractionStrategy implements TextExtractionStrategy {
    private String text;

    @Override
    public void beginTextBlock() {
    }

    @Override
    public void renderText(TextRenderInfo renderInfo) {
        text = renderInfo.getText();

        System.out.println(renderInfo.getFont().getFontType());

        System.out.print(text);
    }

    @Override
    public void endTextBlock() {
    }

    @Override
    public void renderImage(ImageRenderInfo renderInfo) {
    }

    @Override
    public String getResultantText() {
        return text;
    }
}

But then, how do I call it properly?

raka

1 Answer


Please take a look at the ParseCustom example. In this example, we create a custom RenderFilter (not a TextExtractionStrategy):

class FontRenderFilter extends RenderFilter {
    public boolean allowText(TextRenderInfo renderInfo) {
        String font = renderInfo.getFont().getPostscriptFontName();
        return font.endsWith("Bold") || font.endsWith("Oblique");
    }
}

This filter lets through only text whose PostScript font name ends with Bold or Oblique.
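Font names in real PDFs do not always end in exactly "Bold": a name such as "Arial-BoldMT" would not match endsWith("Bold"), and subsetted fonts carry a six-letter tag such as "ABCDEF+" in front of the name. The following is a minimal sketch of a more tolerant name check, written in plain Java so it can be tried without iText. The class FontNames and its methods are hypothetical helpers, not part of iText, and matching on the substring "bold" is only a heuristic based on naming conventions; a font may be bold without saying so in its name.

```java
import java.util.Locale;

// Hypothetical helper, not part of iText: heuristics on PostScript font names.
final class FontNames {
    private FontNames() {}

    // Removes a subset tag (six uppercase letters followed by '+'), if present.
    static String stripSubsetTag(String name) {
        if (name.length() > 7 && name.charAt(6) == '+') {
            for (int i = 0; i < 6; i++) {
                char c = name.charAt(i);
                if (c < 'A' || c > 'Z') {
                    return name; // not a subset tag, leave the name untouched
                }
            }
            return name.substring(7);
        }
        return name;
    }

    // Returns true if the name (without subset tag) contains "bold", ignoring case.
    static boolean looksBold(String postscriptName) {
        if (postscriptName == null) {
            return false;
        }
        String base = stripSubsetTag(postscriptName).toLowerCase(Locale.ROOT);
        return base.contains("bold");
    }
}
```

Inside allowText, one could then return FontNames.looksBold(renderInfo.getFont().getPostscriptFontName()) instead of the endsWith checks, assuming this looser matching suits the documents at hand.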

This is how you use this filter:

public void parse(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    Rectangle rect = new Rectangle(36, 750, 559, 806);
    RenderFilter regionFilter = new RegionTextRenderFilter(rect);
    FontRenderFilter fontFilter = new FontRenderFilter();
    TextExtractionStrategy strategy = new FilteredTextRenderListener(
            new LocationTextExtractionStrategy(), regionFilter, fontFilter);
    System.out.println(PdfTextExtractor.getTextFromPage(reader, 1, strategy));
    reader.close();
}

As you can see, we create a FilteredTextRenderListener that takes two filters, a RegionTextRenderFilter and our self-made filter based on the font.
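To illustrate how several filters combine, here is a simplified model in plain Java. It is not iText code: Chunk, ChunkFilter and FilteredCollector are hypothetical stand-ins for TextRenderInfo, RenderFilter and FilteredTextRenderListener. The sketch assumes, as I understand the iText 5 behaviour, that a piece of text reaches the delegate only when every filter allows it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TextRenderInfo: a piece of text with font and position.
final class Chunk {
    final String text;
    final String fontName;
    final float x;
    final float y;

    Chunk(String text, String fontName, float x, float y) {
        this.text = text;
        this.fontName = fontName;
        this.x = x;
        this.y = y;
    }
}

// Hypothetical stand-in for RenderFilter.
interface ChunkFilter {
    boolean allow(Chunk chunk);
}

// Hypothetical stand-in for FilteredTextRenderListener plus its delegate.
final class FilteredCollector {
    private final ChunkFilter[] filters;
    private final List<String> accepted = new ArrayList<>();

    // Varargs: any number of filters, of any type implementing ChunkFilter.
    FilteredCollector(ChunkFilter... filters) {
        this.filters = filters;
    }

    // Keeps the chunk only if no filter rejects it (logical AND).
    void render(Chunk chunk) {
        for (ChunkFilter filter : filters) {
            if (!filter.allow(chunk)) {
                return;
            }
        }
        accepted.add(chunk.text);
    }

    String result() {
        return String.join("", accepted);
    }
}
```

With a region filter and a font filter passed together, a bold chunk outside the rectangle and a regular chunk inside it are both dropped; only bold text inside the rectangle is collected.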

Bruno Lowagie
  • The constructor FilteredTextRenderListener(LocationTextExtractionStrategy, RenderFilter, ParseCustom.FontRenderFilter) is undefined – Naval Kishor Jha Aug 01 '16 at 14:10
  • @NavalKishorJha As you can see, the answer to this question was accepted and upvoted. It is safe to assume that it is correct. If it doesn't work for you, you are probably using either a version of iText that is too old (in that case you need to upgrade) or too recent (in that case minor changes are needed). As you fail to tell us which version you are using, nobody can help you. – Bruno Lowagie Aug 01 '16 at 15:28
  • I have iText version 5.5.9. Can you tell us which iText version you used for the API above? – Naval Kishor Jha Aug 01 '16 at 16:08
  • I checked against this release: https://github.com/itext/itextpdf/releases/tag/5.5.9 and this source file: https://github.com/itext/itextpdf/blob/develop/itext/src/main/java/com/itextpdf/text/pdf/parser/FilteredTextRenderListener.java – Naval Kishor Jha Aug 01 '16 at 16:16
  • The constructor you require is clearly there: `FilteredTextRenderListener(TextExtractionStrategy delegate, RenderFilter... filters)`. `LocationTextExtractionStrategy` is derived from `TextExtractionStrategy`, and you are using two `RenderFilter` objects as extra parameters. Hence your allegation that the constructor `FilteredTextRenderListener(LocationTextExtractionStrategy, RenderFilter, ParseCustom.FontRenderFilter)` is undefined doesn't make sense. – Bruno Lowagie Aug 01 '16 at 16:38
  • Also: you are hijacking a question that is more than two years old, abusing the comments of an accepted answer to ask a new question. You should post a new question. – Bruno Lowagie Aug 01 '16 at 16:39