I am processing a PDF with PDFBox and the iTextSharp DLL so that I can get the coordinates of the text that lies within a rectangle. I get the rectangle coordinates from iTextSharp, which uses a coordinate system with its origin at the lower left. I get the page text from PDFBox, which uses a coordinate system with its origin at the upper left. I need help converting the coordinates from lower left to upper left.
Updating my question
Pardon me if my question was unclear or did not give full information.
Let me try to give more details from the start.
I am working on a tool that receives a PDF in which a rectangle has been drawn using the drawing markup tools in the comments section. I read the rectangle coordinates using iTextSharp:
PdfDictionary pageDict = pdReader.GetPageN(page_no);
PdfArray annotArray = pageDict.GetAsArray(PdfName.ANNOTS);
where pdReader is a PdfReader.
The page text, along with its coordinates, is extracted using PDFBox. I have created a class, pdfBoxTextExtraction, that processes the text and coordinates and returns the text with its llx, lly, urx, ury line by line (please note: line by line, not sentence by sentence).
So I want to extract the text that lies within the rectangle coordinates. I got stuck because the rectangle coordinates returned by iTextSharp (llx, lly, urx, ury) have their origin at the lower left, whereas the text coordinates returned by PDFBox have their origin at the upper left. Then I realised I need to adjust the y-axis so that the origin moves from lower left to upper left. For that, I got the height of the page and the height of the crop box:
iTextSharp.text.Rectangle mediabox = reader.GetPageSize(page_no);
iTextSharp.text.Rectangle cropbox = reader.GetCropBox(page_no);
I did some basic adjustment:
lly=mediabox.Top - lly
ury=mediabox.Top - ury
In some cases this adjustment worked, whereas for some PDFs I needed to do the adjustment against the crop box:
lly=cropbox.Top - lly
ury=cropbox.Top - ury
And on some PDFs neither adjustment worked.
All I need is help in adjusting the rectangle coordinates so that I get the text within the rectangle.
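To make the adjustment concrete, here is a minimal sketch of the y-axis flip described above, written as plain arithmetic with no iTextSharp or PDFBox calls. The names TopLeftBox, RectFlip, ToTopLeft and Contains are hypothetical, introduced only for illustration. Which box to measure from (media box or crop box) is an assumption passed in as refLeft and refTop, and page rotation is not handled. Note that after the flip the old ury becomes the smaller y value, so the top and bottom edges swap roles.

```csharp
using System;

// Hypothetical helper for illustration; not part of iTextSharp or PDFBox.
public struct TopLeftBox
{
    // Origin at the upper left, y grows downward, so Top <= Bottom.
    public double Left, Top, Right, Bottom;

    // True when 'inner' lies inside this box, allowing a small tolerance.
    public bool Contains(TopLeftBox inner, double tol)
    {
        return inner.Left >= Left - tol && inner.Right <= Right + tol
            && inner.Top >= Top - tol && inner.Bottom <= Bottom + tol;
    }
}

public static class RectFlip
{
    // Converts llx, lly, urx, ury (origin at lower left, y grows upward)
    // into a box measured from the upper-left corner of a reference box.
    // refLeft and refTop are the left x and top y of that reference box in
    // lower-left coordinates (ASSUMPTION: e.g. mediabox.Left / mediabox.Top
    // or cropbox.Left / cropbox.Top, whichever the text coordinates use).
    public static TopLeftBox ToTopLeft(double llx, double lly, double urx, double ury,
                                       double refLeft, double refTop)
    {
        return new TopLeftBox
        {
            // Min/Max guards against a rectangle whose corners are not normalized.
            Left = Math.Min(llx, urx) - refLeft,
            Right = Math.Max(llx, urx) - refLeft,
            // The upper edge (larger y from the bottom) becomes the smaller y from the top.
            Top = refTop - Math.Max(lly, ury),
            Bottom = refTop - Math.Min(lly, ury)
        };
    }
}
```

For example, on a page whose reference box has its top at 792 and its left at 0, a rectangle with llx=100, lly=200, urx=300, ury=250 maps to Left=100, Top=542, Right=300, Bottom=592. A text line whose upper-left-origin box falls between those values would then pass Contains.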
I think I have found the adjustments for the y-axis and the code is running properly. Currently I am testing on various PDFs.
Will post the adjustments once testing is done – RAHIL KAZI Jan 02 '15 at 11:01