I am trying to develop an executable that compares two PDF files. I need to get all the differences between a master PDF and a generated one, such as the position and size of images and text, the text content, and as much other data as possible.

Here is my test with iTextSharp to try to get the data I need:

    Document doc = new Document();
    PdfReader reader = new PdfReader(@"C:\tmp\PDFFlattener\Input\Interior_30087_068x1XWX6516_M_210x297_FLA_68_1_5873301420.pdf");

    for (int page = 1; page <= reader.NumberOfPages; page++)
    {
        var currentPage = reader.GetPageN(page);

        PdfDictionary pageDico = currentPage.GetAsDict(PdfName.RESOURCES);
        PdfDictionary objectDico = pageDico.GetAsDict(PdfName.XOBJECT);
        foreach (var item in objectDico)
        {

            PdfName imgRef = item.Key;
            PRStream stream = (PRStream)objectDico.GetAsStream(imgRef);
            PdfName subType = stream.GetAsName(PdfName.SUBTYPE);
            PdfName coords = stream.GetAsName(PdfName.COORDS);
            PdfName width = stream.GetAsName(PdfName.WIDTH);
            PdfName xyz = stream.GetAsName(PdfName.XYZ);
        }
    }

All of the `GetAsName` lookups except the one for SUBTYPE return null.

Is it possible to get the X and Y position of an XObject? I have also tried with ABCPdf, but I get the same result.

Thanks for reading,

jibhey
  • That's a very broad question. You are practically asking the audience on Stack Overflow to develop a product for you. Surely you understand that's not what Stack Overflow is for. The code sample you added looks more like an excuse ("Look, I did some effort!") than that it looks like code that serves the intended purpose. – Bruno Lowagie Dec 07 '17 at 15:39
  • This is not the case. Thank you for your condescension. I don't think it's too broad to ask how to get the position of an element in a PDF file. That is my only question. – jibhey Dec 07 '17 at 16:18
  • I have edited my post; I hope it's clearer. I never asked anyone to tell me how to develop a PDF comparator. – jibhey Dec 07 '17 at 16:21
  • You are asking for the X and Y coordinate of a Form XObject, *forgetting* to ask for scaling and rotation values. Form XObjects are positioned on a page by defining a transformation matrix. Since we're working in a 2-dimensional space, that transformation matrix consists of 6 values, not the 2 you are asking for. The transformation matrix you are looking for is stored inside the content stream. Look for the `Do` operator (responsible for adding XObjects) and the `cm` operator (responsible for changing the transformation matrix). – Bruno Lowagie Dec 07 '17 at 17:08
  • It's up to you whether to interpret this technically correct comment as condescending (because it proves that your efforts so far were rather poor) or helpful (although it means that you have plenty of work to do: you'll need to write a parser that interprets the content of your `stream` object). Given the amount of work involved in writing such a parser, I hope that you'll eventually agree that your question is too broad. Note that if it was my intention to be condescending, I would have cast a down-vote, which I didn't (yet; but your remark almost made me want to). – Bruno Lowagie Dec 07 '17 at 17:11
  • It's the first time I have worked with PDF in this way. I had not understood that objects are positioned with a matrix. I have run a lot of tests since Monday with ABCPdf and iText, and the sample I pasted is only one of them. The documentation for ABCPdf and iText doesn't explain the matrix system (or I didn't find it), and with your explanations I understand that my question is too broad. Thanks for the clarification, and sorry for the mistake. – jibhey Dec 07 '17 at 17:56
  • The information is in [ISO-32000](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf), in the old [iText in Action](https://www.manning.com/books/itext-in-action-second-edition) book, the recent [iText 7 jump-start tutorial](https://developers.itextpdf.com/content/itext-7-jump-start-tutorial/chapter-2-adding-low-level-content), and questions such as [How to get the co-ordinates of an image?](https://developers.itextpdf.com/content/best-itext-questions-stackoverview/content-parsing-extraction-and-redaction-text/itext7-how-get-co-ordinates-image) – Bruno Lowagie Dec 07 '17 at 18:25
  • @jibhey iText does explain the matrix system, you just didn't find it. There are some possibilities: 1. You didn't look for it 2. You didn't look good enough 3. You didn't know what to look for 4. You knew where to look and what to look for, but still couldn't find it. 1-3 is something nobody can help you with, but for item 4, with information that is known to exist (see Bruno's comment), then your feedback is valuable. You MUST tell me why you didn't find it. That feedback can be used to improve the iText website. – Amedee Van Gasse Dec 07 '17 at 23:36
  • I worked with ABCPdf9 and tried just this little test with iText. I didn't read the iText documentation from the beginning, only what Google searches turned up. If you want an improvement for the iText documentation, make it more readable for visually impaired people like me :) I think your documentation is good too, but I have a lot of work and let myself get carried along by it (skipping proper study of the documentation). I think Bruno's comments will help me find my way to the solution. – jibhey Dec 08 '17 at 10:16
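
To illustrate the `cm` and `Do` operators mentioned in the comments, here is a minimal sketch in plain C# that walks an already decoded content stream and records the current transformation matrix (CTM) at each `Do`. It does not use iTextSharp: `XObjectLocator`, `Multiply` and `Find` are hypothetical names, and the content string in `Main` is made up. The whitespace tokenizer is a simplifying assumption; a real parser also has to handle strings, arrays, comments and inline images. For a Form XObject the form's own `/Matrix` and `/BBox` entries apply on top of the CTM reported here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of iTextSharp or ABCPdf.
public static class XObjectLocator
{
    // Multiplies two PDF matrices given as [a b c d e f];
    // the result applies m first, then n.
    public static double[] Multiply(double[] m, double[] n)
    {
        return new[]
        {
            m[0] * n[0] + m[1] * n[2],
            m[0] * n[1] + m[1] * n[3],
            m[2] * n[0] + m[3] * n[2],
            m[2] * n[1] + m[3] * n[3],
            m[4] * n[0] + m[5] * n[2] + n[4],
            m[4] * n[1] + m[5] * n[3] + n[5]
        };
    }

    // Returns each XObject name painted with Do, paired with the CTM in effect.
    public static List<KeyValuePair<string, double[]>> Find(string content)
    {
        var found = new List<KeyValuePair<string, double[]>>();
        var ctm = new double[] { 1, 0, 0, 1, 0, 0 };   // identity matrix
        var saved = new Stack<double[]>();             // graphics state stack (q / Q)
        var numbers = new List<double>();              // numeric operands seen so far
        string lastName = null;                        // last name operand, e.g. /Im0
        var separators = new[] { ' ', '\t', '\r', '\n' };

        foreach (var token in content.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            double value;
            if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
            {
                numbers.Add(value);
                continue;
            }
            if (token[0] == '/')
            {
                lastName = token;
                continue;
            }

            // Anything else is treated as an operator.
            if (token == "q")
            {
                saved.Push(ctm);
            }
            else if (token == "Q" && saved.Count > 0)
            {
                ctm = saved.Pop();
            }
            else if (token == "cm" && numbers.Count >= 6)
            {
                // cm concatenates its six operands in front of the current CTM.
                double[] m = numbers.GetRange(numbers.Count - 6, 6).ToArray();
                ctm = Multiply(m, ctm);
            }
            else if (token == "Do" && lastName != null)
            {
                found.Add(new KeyValuePair<string, double[]>(lastName, ctm));
            }

            // Operands belong to the operator just handled.
            numbers.Clear();
            lastName = null;
        }
        return found;
    }

    public static void Main()
    {
        // Made-up content stream: paint /Im0 scaled to 200 x 100 at (50, 700).
        string content = "q 200 0 0 100 50 700 cm /Im0 Do Q";
        foreach (var hit in Find(content))
        {
            double[] m = hit.Value;
            Console.WriteLine(hit.Key + ": x=" + m[4] + " y=" + m[5]
                + " a=" + m[0] + " b=" + m[1] + " c=" + m[2] + " d=" + m[3]);
        }
    }
}
```

For an unrotated image XObject, `e` and `f` give the lower-left corner and `a` and `d` give the drawn width and height, because an image is painted into the unit square and then mapped by the CTM. This is also why the XObject's dictionary holds no position: the same image can be painted several times with different matrices.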

0 Answers