-1

I am using this function of the iTextSharp library to extract PDF text line by line:

PdfTextExtractor.GetTextFromPage(reader, page);

...but I need a line break (ENTER) at the end of every line of the PDF. Even an empty row should be read as an empty line.

Jesse
shailendra
  • 2
    `PdfTextExtractor.GetTextFromPage` **does** put end-of-line markers at the end of every line it recognizes (cf. the method `GetResultantText` of the `LocationTextExtractionStrategy`: `sb.Append('\n');`). That being said there generally *is no **end of line** or **row** in a PDF!* Therefore, if iText's heuristics for *interpreting such concepts into the PDF page content* don't work for you, you may need a custom `TextExtractionStrategy` implementation. If you need help with that, please give more details, especially what you get, what you want, and a sample PDF illustrating your issue. – mkl May 06 '13 at 08:59
  • +1 for @mkl: There is no such thing as 'a line' in a PDF, nor is there such a thing as 'ENTER'. Content is added at absolute positions; it isn't organized in lines. – Bruno Lowagie May 06 '13 at 10:18
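To illustrate what the comments above mean by "heuristics", here is a minimal Java sketch of how lines can be rebuilt from positioned text, including blank rows. It does not use or implement any iText type: `Chunk`, `LineRebuilder`, `rebuild`, and the `lineHeight` parameter are hypothetical names introduced only for this example. It also assumes a single column of horizontal text with a known, constant row spacing, which real PDFs often violate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LineRebuilder {
    // Hypothetical stand-in for a piece of text drawn at an absolute position.
    static final class Chunk {
        final float x;      // horizontal start position
        final float y;      // baseline; PDF y coordinates grow upward
        final String text;

        Chunk(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Rebuilds text from chunks. lineHeight is the assumed distance between
    // the baselines of two consecutive rows.
    static String rebuild(List<Chunk> chunks, float lineHeight) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        // Top of the page first (largest y), then left to right.
        sorted.sort(Comparator
                .comparingInt((Chunk c) -> -Math.round(c.y))
                .thenComparingDouble(c -> c.x));

        StringBuilder sb = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : sorted) {
            if (prev != null) {
                if (Math.round(prev.y) == Math.round(c.y)) {
                    // Same rounded baseline: treat as the same line.
                    // Simplification: always separate chunks with one space.
                    sb.append(' ');
                } else {
                    // One '\n' per row of vertical distance, so skipped rows
                    // show up as empty lines.
                    int rows = Math.max(1, Math.round((prev.y - c.y) / lineHeight));
                    for (int i = 0; i < rows; i++) {
                        sb.append('\n');
                    }
                }
            }
            sb.append(c.text);
            prev = c;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk(10, 688, "Next"));
        chunks.add(new Chunk(60, 700, "world"));
        chunks.add(new Chunk(10, 700, "Hello"));
        chunks.add(new Chunk(10, 664, "After gap"));
        System.out.println(rebuild(chunks, 12f));
    }
}
```

With a row spacing of 12, the jump from baseline 688 to 664 spans two rows, so the sketch emits one empty line before "After gap". In a real custom strategy the positions would come from the library's render callbacks, and the row spacing would have to be estimated, for example from the font size.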

2 Answers

4

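Read the page text into a string variable, then split it on the newline character, e.g.:

String page = PdfTextExtractor.getTextFromPage(reader, 2);

String[] s1 = page.split("\n");

(This is the Java form; `split` takes a `String`, not a `char`. In C# with iTextSharp the equivalents are `PdfTextExtractor.GetTextFromPage(reader, 2)` and `page.Split('\n')`.)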
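One detail matters for the "empty row" part of the question: in Java, `String.split` with no limit argument drops trailing empty strings, so blank lines at the end of the page text would be lost. The sketch below passes a negative limit to keep them. The class and method names (`PageLines`, `splitKeepingEmpty`) are hypothetical, and the sample string only stands in for the result of `getTextFromPage`. It can only preserve empty lines that already exist in the extracted text.

```java
import java.util.Arrays;
import java.util.List;

final class PageLines {
    // Splits extracted page text into lines and keeps empty lines,
    // including trailing ones.
    static List<String> splitKeepingEmpty(String pageText) {
        // "\r?\n" accepts both Unix and Windows line endings;
        // the limit -1 stops split from discarding trailing empty strings.
        return Arrays.asList(pageText.split("\r?\n", -1));
    }

    public static void main(String[] args) {
        // Stand-in for: PdfTextExtractor.getTextFromPage(reader, 2)
        String page = "first line\n\nthird line\n";
        List<String> lines = splitKeepingEmpty(page);
        for (int i = 0; i < lines.size(); i++) {
            System.out.println(i + ": [" + lines.get(i) + "]");
        }
    }
}
```

For the sample string this yields four entries: the two text lines, the empty line between them, and a final empty entry for the text after the last newline.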
adebayo
0

Please go through the following links:

  1. http://api.itextpdf.com/index.html?com/itextpdf/text/pdf/parser/PdfTextExtractor.html

  2. Stack Overflow question: iTextSharp text extraction

Community
Vaibhav Jain
  • 1
    Welcome to Stack Overflow! Whilst this may theoretically answer the question, [it would be preferable](http://meta.stackexchange.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – JJJ May 06 '13 at 06:12
  • Concerning the stack overflow link: Please make clear that you indeed want to refer to the answers making use of the `PdfTextExtractor` class. – mkl May 06 '13 at 08:47