
How do I extract text from a PDF file using Aspose.PDF in Java? I'm looking for this functionality in the Aspose API (are there no code samples?).

Edit:
Requirement:

Let's say a PDF contains the following text at arbitrary locations, along with other data:

First Name: John
Last Name: Doe
City: New York
Phone: (999)-999-9999

Note: I could easily get these values if they were form fields of the PDF file. Here, they are plain text at arbitrary locations, not separate fields.

The values John, Doe, New York and (999)-999-9999 change for each document.

I should be able to search for First Name, Last Name, City and Phone so that the search also returns the value that follows each label.

Any suggestions?
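One possible approach, sketched under assumptions: first extract the page text as a single `String` (for example with Aspose's `TextAbsorber`, which the asker mentions in the comments), then use a `java.util.regex` capturing group to pull out the rest of the line after each label. This assumes each label and its value end up on the same line of the extracted text; if the PDF layout places other text on that line, the pattern will need adjusting. The class and method names below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelValueParser {
    // Hypothetical pattern: one of the labels from the question, a colon,
    // optional spaces/tabs, then everything up to the end of the line.
    private static final Pattern LABEL_VALUE = Pattern.compile(
            "(First Name|Last Name|City|Phone):[ \\t]*([^\\r\\n]+)");

    // Maps each label found in the text to the (trimmed) value after it.
    public static Map<String, String> parse(String pageText) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = LABEL_VALUE.matcher(pageText);
        while (m.find()) {
            values.put(m.group(1), m.group(2).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Stand-in for text that would come from the PDF
        String text = "Some other data\nFirst Name: John\nLast Name: Doe\n"
                + "City: New York\nPhone: (999)-999-9999\nMore data";
        System.out.println(parse(text));
    }
}
```

Because the value is captured with a variable-length group rather than a fixed `{n}` count, names and cities of any length are returned, and the label itself is left out of the result.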

intruder

1 Answer


@intruder, you can use regular expressions to retrieve the required text strings. Aspose.PDF for Java accepts a regular expression in `TextFragmentAbsorber` when regex search is enabled through `TextSearchOptions`. Please try the following code:

```java
import com.aspose.pdf.*;

public class ExtractMatchingText {
    public static void main(String[] args) {
        Document pdfDocument = new Document("source.pdf");
        // Matches text like 1999-2000 (four digits, a hyphen, four digits)
        TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("\\d{4}-\\d{4}");
        // Passing true enables regular-expression search
        TextSearchOptions textSearchOptions = new TextSearchOptions(true);
        textFragmentAbsorber.setTextSearchOptions(textSearchOptions);
        pdfDocument.getPages().accept(textFragmentAbsorber); // search every page
        TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
        for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
            System.out.println("Text :- " + textFragment.getText());
        }
    }
}
```
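Before running a pattern against a PDF, it can help to check what it matches on a sample string. The sketch below does that with plain `java.util.regex`, assuming Aspose interprets these simple expressions the same way. It shows that the example pattern `\d{4}-\d{4}` finds `1999-2000` but not the phone number from the question, which needs its own pattern. `PatternCheck` and `findAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternCheck {
    // Collects every substring of text that matches regex, in order.
    public static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "Season 1999-2000, Phone: (999)-999-9999";
        // Same expression as in the answer: four digits, hyphen, four digits
        System.out.println(findAll("\\d{4}-\\d{4}", sample));
        // A pattern written for the (999)-999-9999 phone format
        System.out.println(findAll("\\(\\d{3}\\)-\\d{3}-\\d{4}", sample));
    }
}
```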

I work with Aspose as a Developer Evangelist.

  • I achieved it by using `TextAbsorber`. Is there a more efficient way to do it? – intruder Apr 04 '18 at 13:40
  • Your suggestion works only if we know the exact length of the value, but names and cities are not of fixed length. For example, if I search for `First Name`, it should return `John`; instead, it returns `First Name` again. – intruder Apr 04 '18 at 13:53
  • @intruder, you can refine the regular expression; the one in the code above is only an example. Please send me your source PDF, code and expected output, and I will investigate your scenario in my environment. – Imran Rafique Apr 05 '18 at 16:13
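Following the suggestion to refine the regular expression, one option is a pattern with a variable-length value, such as `First Name:\s*\S.*`, so that names and cities of any length match. This is an assumption: it has not been checked against Aspose's regex engine, including whether a match can run past the end of a line. The matched fragment text would still start with the label (which explains why `First Name` shows up in the output), so a capturing group can strip it afterwards. The helper below is a hypothetical sketch of that second step using `java.util.regex`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FragmentValue {
    // Hypothetical search pattern to pass to the absorber: the label,
    // a colon, optional whitespace, then the rest of the line.
    static final String FIRST_NAME_SEARCH = "First Name:\\s*\\S.*";

    // Strips a known label and its colon from matched text and returns
    // only the value after it, or null if the label is not present.
    public static String valueAfterLabel(String fragmentText, String label) {
        Pattern p = Pattern.compile(Pattern.quote(label) + "\\s*:\\s*(.+)");
        Matcher m = p.matcher(fragmentText);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(valueAfterLabel("First Name: John", "First Name"));
        System.out.println(valueAfterLabel("City: New York", "City"));
    }
}
```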