
I use iTextSharp to extract text line by line from PDF files, and it works very well.

I am now working with engineering drawings and need to identify the lines of text that lie inside the boundaries of circles in my files.


My files can contain many non-overlapping circles, each of which typically has two or three lines of text inside its boundary.

Does anyone know if this is possible using iTextSharp?

Here is my existing code:

' sbTXT is assumed to be a StringBuilder declared elsewhere; filePath is the path of the PDF.
Try
    Using reader As New PdfReader(filePath)
        For intPages As Integer = 1 To reader.NumberOfPages
            ' Separate pages with a line break, but do not add one before the first page.
            If intPages > 1 Then sbTXT.Append(Environment.NewLine)
            ' Extract the page text using the location-based strategy.
            sbTXT.Append(Trim(PdfTextExtractor.GetTextFromPage(reader, intPages, New LocationTextExtractionStrategy())))
        Next
    End Using
Catch ex As Exception
    MsgBox("There was an error extracting text from the file", vbInformation, "Error Extracting Text")
End Try
GoodJuJu
  • I don't think that iTextSharp has the facility to analyse raster images. Is the text in the drawings raster or actual text? Does it (iTextSharp) perform any optical character recognition? (The text does look very similar to a standard plotter font). – Paul Feb 07 '17 at 15:40
  • The text is actual searchable text inside the pdf. iTextSharp has no problem extracting the text. I need iTextSharp to somehow identify whether the text resides inside a circle. If it does then I need to have a different processing method for the text that differs from text outside of the circles. – GoodJuJu Feb 07 '17 at 15:53
  • Are the circles drawn using a background bitmap or vector graphics? Please share representative examples PDFs. – mkl Feb 07 '17 at 16:03
  • The circles are vector graphics... – GoodJuJu Feb 07 '17 at 16:06
  • I did find [this post](http://stackoverflow.com/questions/7344437/vector-graphics-with-itextsharp), which would suggest that iTextSharp does not process images in the same way. You may need to process the document itself and extract the locations of the vector graphics objects then compare the positioning of the returned text with the locations of the graphics objects. If you're doing this, however, you may as well just extract the text yourself too! – Paul Feb 07 '17 at 16:11
  • Thanks Paul, I think you are right. I will need to return all the circle objects and then compare the coordinates with the coordinates of the text. Then I need to figure out a way of identifying whether the text fits inside the circle boundary. – GoodJuJu Feb 09 '17 at 10:03
  • @GoodJuJu Hi, this is a fairly old thread, but we have been dealing with something similar, with a very similar kind of diagram. I'm wondering whether this turned out to be feasible for you. Did you find a way? Thanks! – Bruno Sendras Sep 11 '20 at 20:24
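
The last step discussed in the comments above (deciding whether a piece of text fits inside a circle boundary) comes down to a plain geometric test. Below is a minimal sketch of that test only, not a confirmed solution from this thread. It assumes the circle centre and radius and the bounding rectangle of each text line have already been obtained from the PDF by some other means, all in the same page coordinate system. The module name `CircleContainment`, the function `RectInsideCircle` and the sample coordinates are hypothetical and are not part of iTextSharp.

```vbnet
Imports System

Module CircleContainment

    ' Hypothetical helper: returns True when the whole text rectangle
    ' (left, bottom, right, top) lies inside or on the circle (cx, cy, radius).
    ' Assumes left <= right and bottom <= top, in page units.
    Function RectInsideCircle(left As Double, bottom As Double,
                              right As Double, top As Double,
                              cx As Double, cy As Double,
                              radius As Double) As Boolean
        ' A rectangle is inside a circle exactly when its farthest corner is.
        ' The farthest corner has the largest horizontal and vertical offsets
        ' from the centre.
        Dim dx As Double = Math.Max(Math.Abs(left - cx), Math.Abs(right - cx))
        Dim dy As Double = Math.Max(Math.Abs(bottom - cy), Math.Abs(top - cy))

        ' Compare squared distances to avoid taking a square root.
        Return dx * dx + dy * dy <= radius * radius
    End Function

    Sub Main()
        ' Made-up sample values: a circle centred at (100, 100) with radius 50.
        ' Text line well inside the circle.
        Console.WriteLine(RectInsideCircle(80, 90, 120, 105, 100, 100, 50))  ' True
        ' Text line that extends beyond the right edge of the circle.
        Console.WriteLine(RectInsideCircle(130, 90, 170, 105, 100, 100, 50)) ' False
    End Sub

End Module
```

Testing the farthest corner is a deliberately strict choice: text that only partly overlaps a circle is treated as being outside it. If the text boxes in real drawings touch or slightly cross the circle outline, testing only the centre of the rectangle may be a more forgiving alternative.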
