
I have a problem using PoDoFo to modify a PDF document. If you have time, please help me solve it!

I found the PoDoFo source at http://podofo.sourceforge.net/download.html and compiled it on Windows 7 x86. The library is very powerful.

But I ran into trouble after making a small change to the example "helloworld.cpp" so that it modifies an existing PDF document and saves it under another file name.

When I pass in a local PDF file (saved from a Word document through the Windows COM interface of Office Word 2007), the new file is written successfully, but the added text is flipped vertically, and its Y position is mirrored vertically as well.

(Someone said that in such a situation you have to deal with the fact that the existing content may have changed the graphics state, e.g. changed the current transformation matrix. Maybe he is right, but I can't find the functions for changing the graphics state or the current transformation matrix.)

This is a screenshot. I don't know why the output text is vertically flipped:

screenshot

The odd thing is that it works well when I pass in the document "output.pdf" created by the "helloworld" example.

If you have time, please help me solve it. Thank you very much!

My changed code looks like this:

#define MEMDOCUMENT 1 // macro switch  
void HelloWorld( const char* pszFilename ) 
{
    /*
     * PdfStreamedDocument is the class that can actually write a PDF file.
     * PdfStreamedDocument is much faster than PdfDocument, but it is only
     * suitable for creating/drawing PDF files and cannot modify existing
     * PDF documents.
     *
     * The document is written directly to pszFilename while being created.
     */
#if MEMDOCUMENT
     PdfMemDocument document( pszFilename ); // open an existing local PDF document
#else
     PdfStreamedDocument document( pszFilename ); // create a new PDF document
#endif
    /*
     * PdfPainter is the class which is able to draw text and graphics
     * directly on a PdfPage object.
     */
    PdfPainter painter;

    /*
     * This pointer will hold the page object later. 
     * PdfSimpleWriter can write several PdfPage's to a PDF file.
     */
    PdfPage* pPage;

    /*
     * A PdfFont object is required to draw text on a PdfPage using a PdfPainter.
     * PoDoFo will find the font using fontconfig on your system and embed truetype
     * fonts automatically in the PDF file.
     */     
    PdfFont* pFont;

    try {
        /*
         * The PdfDocument object can be used to create new PdfPage objects.
         * The PdfPage object is owned by the PdfDocument and will also be deleted automatically
         * by the PdfDocument object.
         *
         * You have to pass only one argument, i.e. the page size of the page to create.
         * There are predefined enums for some common page sizes.
         */
#if MEMDOCUMENT
        pPage = document.GetPage(0); //get the first page and modify it
#else
        pPage = document.CreatePage( PdfPage::CreateStandardPageSize( ePdfPageSize_A4 ) );
#endif
        /*
         * If the page cannot be created because of an error (e.g. ePdfError_OutOfMemory )
         * a NULL pointer is returned.
         * We check for a NULL pointer here and throw an exception using the RAISE_ERROR macro.
         * The raise error macro initializes a PdfError object with a given error code and
         * the location in the file in which the error occurred and throws it as an exception.
         */
        if( !pPage ) 
        {
            PODOFO_RAISE_ERROR( ePdfError_InvalidHandle );
        }

        /*
         * Set the page as drawing target for the PdfPainter.
         * Before the painter can draw, a page has to be set first.
         */
        painter.SetPage( pPage );

        /*
         * Create a PdfFont object using the font "Arial".
         * The font is found on the system using fontconfig and embedded into the
         * PDF file. If Arial is not available, a default font will be used.
         *
         * The created PdfFont will be deleted by the PdfDocument.
         */
        pFont = document.CreateFont( "Arial" );

        /*
         * If the PdfFont object cannot be allocated return an error.
         */
        if( !pFont )
        {
            PODOFO_RAISE_ERROR( ePdfError_InvalidHandle );
        }

        /*
         * Set the font size
         */
        pFont->SetFontSize( 18.0 );

        /*
         * Set the font as default font for drawing.
         * A font has to be set before you can draw text on
         * a PdfPainter.
         */
        painter.SetFont( pFont );

        /*
         * You could set a different color than black to draw
         * the text.
         *
         * SAFE_OP( painter.SetColor( 1.0, 0.0, 0.0 ) );
         */

        /*
         * Actually draw the line "Hello World!" on to the PdfPage at
         * the position 2cm,2cm from the top left corner. 
         * Please remember that PDF files have their origin at the 
         * bottom left corner. Therefore we subtract the y coordinate 
         * from the page height.
         * 
         * The position specifies the start of the baseline of the text.
         *
         * All coordinates in PoDoFo are in PDF units.
         * You can also use PdfPainterMM which takes coordinates in 1/1000th mm.
         *
         */

        painter.SetTransformationMatrix(1,0,0,-1,0,pPage->GetPageSize().GetHeight());

        painter.DrawText( 56.69, pPage->GetPageSize().GetHeight() - 56.69, "Hello World!" );

        painter.DrawText( 56.69, pPage->GetPageSize().GetHeight() - 96.69, "Hello World!" );

        /*
         * Tell PoDoFo that the page has been drawn completely.
         * This is required to optimize drawing operations inside PoDoFo
         * and has to be done whenever you are done with drawing a page.
         */
        painter.FinishPage();

        /*
         * Set some additional information on the PDF file.
         */
        document.GetInfo()->SetCreator ( PdfString("examplahelloworld - A PoDoFo test application") );
        document.GetInfo()->SetAuthor  ( PdfString("Dominik Seichter") );
        document.GetInfo()->SetTitle   ( PdfString("Hello World") );
        document.GetInfo()->SetSubject ( PdfString("Testing the PoDoFo PDF Library") );
        document.GetInfo()->SetKeywords( PdfString("Test;PDF;Hello World;") );

        /*
         * The last step is to close the document.
         */

#if MEMDOCUMENT
        document.Write("outputex.pdf"); //save page change
#else
        document.Close(); 
#endif


    } catch ( const PdfError & ) {
        /*
         * All PoDoFo methods may throw exceptions.
         * Make sure that painter.FinishPage() is called,
         * or you will get an assert in its destructor.
         */
        try {
            painter.FinishPage();
        } catch( ... ) {
            /*
             * Ignore errors this time
             */
        }

        // Rethrow the original exception without copying it
        throw;
    }
}
c.t2008
  • You should provide us with more information about the actual problem and what you expected. Also what exactly did you change? – DomTomCat Jun 01 '16 at 10:45
  • As far as I can see the main difference to the original sample is that your code (with `#define MEMDOCUMENT 1`) edits an existing PDF by appending to the existing first page content instead of creating a new PDF and creating the content of a new page. In such a situation you have to deal with the fact that the existing content may have changed the graphics state, e.g. changed the current transformation matrix to flip things upside-down, and those changes impact your actions, e.g. your text is also flipped upside-down. – mkl Jun 01 '16 at 13:33
  • @mkl I guess something is different, but I don't know how to change the graphics state or the current transformation matrix to flip things upside-down... – c.t2008 Jun 02 '16 at 06:04
  • Usually one does that by prepending a save-graphics-state operator and appending a restore-graphics-state operator to the existing content. I don't know how that is done in podofo, though. – mkl Jun 02 '16 at 06:36
  • *The odd is it work well when I passed the document “output.pdf" created by example "helloworld"* - That's not odd but completely logical: The helloworld example does not fool around with the transformation matrix, so there is no impact of a changed transformation matrix. Your other source PDF has fooled around with it, presumably to have the coordinate system origin in the upper left and increasing **y** coordinates downwards, and that impacts you in the manner you observe. – mkl Jun 02 '16 at 08:48
  • You should make a PoDoFo feature request for a `PdfPainter::SetPage` overload which encloses the existing content in save-graphics-state and restore-graphics-state. – mkl Jun 02 '16 at 09:11
  • @mkl, thank you very much! Following what you said, I found the function that changes the transformation matrix; the problem was caused by a reflection. The "helloworld" example finally outputs the PDF document correctly! – c.t2008 Jun 02 '16 at 11:17
  • @c.t2008 Great! You may want to post your solution including some code as an answer and accept it, so that people with the same issue may find it easily. – mkl Jun 02 '16 at 11:30

2 Answers

For those struggling to see why this happens, it is due to this command at the top of every page (in this example the pages are A4 sized) that flips the content along the y axis:

1 0 0 -1 0 841 cm

This seems to be very common, existing in PDFs produced by more than one program according to my observations. There are also many PDFs that don't contain this at all. I suspect it is exclusively due to commit 1e07ce in cairo 1.15.4, see https://cairographics.org/releases/ChangeLog.cairo-1.15.4.

The tricky part is that this command comes before any q (save graphics state) / Q (restore graphics state) pair, so it is not possible to return to a known transform with a simple Q. In other words, the only way to get back to a known transform is to parse the page content stream and see which transform is set before the first q. Once this transform is known, its inverse can be applied before any new content is overlaid onto the existing content.

To parse the page and get the transform before any q:

// Needs <cstring>, <iostream>, <vector> and <podofo/podofo.h>
PoDoFo::PdfPage* page = ...;
PoDoFo::PdfContentsTokenizer tokenizer(page);
const char* token = NULL;
PoDoFo::PdfVariant param;
PoDoFo::EPdfContentsType type;
std::vector<PoDoFo::PdfVariant> params;
double tf_a = 1,    tf_c = 0,   tf_e = 0;
double tf_b = 0,    tf_d = 1,   tf_f = 0;
            //0          //0         //1

while(tokenizer.ReadNext(type, token, param)){

    //Command
    if(type == PoDoFo::ePdfContentsType_Keyword){

        //First Save at page, we assume that it will eventually be paired with enough Restores to go back to the current transform
        if(strcmp(token, "q") == 0)
            break;

        //Transform before first q, must apply the inverse when overlaying dots
        else if(strcmp(token, "cm") == 0){
            if(params.size() == 6){
                tf_a = params[0].GetReal();
                tf_b = params[1].GetReal();
                tf_c = params[2].GetReal();
                tf_d = params[3].GetReal();
                tf_e = params[4].GetReal();
                tf_f = params[5].GetReal();
                invertTransform(tf_a, tf_b, tf_c, tf_d, tf_e, tf_f);
            }
            else
                std::cout << "Warning! Found transform before first q at page with wrong number of arguments!" << std::endl;
        }
        else
            std::cout << "Warning! Unrelated command at page before first q: " << token << std::endl;

        params.clear();
    }

    //Parameter for command
    else if(type == PoDoFo::ePdfContentsType_Variant)
        params.push_back(param);
}

where invertTransform() is a small utility function:

void invertTransform(double& a, double& b, double& c, double& d, double& e, double& f){
    double m_11 = a,    m_12 = c,   m_13 = e;
    double m_21 = b,    m_22 = d,   m_23 = f;
         //m_31 = 0.0,  m_32 = 0.0, m_33 = 1.0;
    double det = m_11*(/*m_33**/m_22 /*- m_32*m_23*/) - m_21*(/*m_33**/m_12/* - m_32*m_13*/) /*+ m_31*(m_23*m_12 - m_22*m_13)*/;
    if(std::fabs(det) < 1e-10){ // std::fabs from <cmath>; plain abs() may resolve to the int overload
        a = 1;  c = 0;  e = 0;
        b = 0;  d = 1;  f = 0;
          //0     //0     //1
    }
    else{
        double det_1 = 1.0/det;
        a = det_1*( /*m_33**/m_22 /*- m_32*m_23*/); c = det_1*(-/*m_33**/m_12 /*+ m_32*m_13*/); e = det_1*( m_23*m_12 - m_22*m_13);
        b = det_1*(-/*m_33**/m_21 /*+ m_31*m_23*/); d = det_1*( /*m_33**/m_11 /*- m_31*m_13*/); f = det_1*(-m_23*m_11 + m_21*m_13);
          //det_1*( m_32*m_21 - m_31*m_22)              det_1*(-m_32*m_11 + m_31*m_12)              det_1*( m_22*m_11 - m_21*m_12)
    }
}
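
As a sanity check on the inversion, the sketch below is a compact, self-contained variant of the same formula (the Matrix struct and inverse() are names chosen here for illustration; they are not part of PoDoFo or of the code above). It uses the same identity fallback for a singular matrix:

```cpp
#include <cmath>

// "a b c d e f" in the same order as the PDF cm operator.
struct Matrix { double a, b, c, d, e, f; };

// Inverse of the affine matrix [a c e; b d f; 0 0 1].
// Falls back to identity when the matrix is (nearly) singular.
Matrix inverse(const Matrix& m) {
    const double det = m.a * m.d - m.b * m.c;
    if (std::fabs(det) < 1e-10)
        return { 1, 0, 0, 1, 0, 0 };
    return {  m.d / det,                      // a
             -m.b / det,                      // b
             -m.c / det,                      // c
              m.a / det,                      // d
             (m.c * m.f - m.d * m.e) / det,   // e
             (m.b * m.e - m.a * m.f) / det }; // f
}
```

For the flip 1 0 0 -1 0 841 this yields 1 0 0 -1 0 841 again, because a vertical flip is its own inverse. A matrix with scaling, such as 2 0 0 -1 10 841, inverts to 0.5 0 0 -1 -5 841.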

Then, the inverse transform (simply identity if there was no cm before the first q) can be applied and things can be drawn on the page:

PoDoFo::PdfPainter painter;
painter.SetPage(page);
painter.Save();
painter.SetTransformationMatrix(tf_a, tf_b, tf_c, tf_d, tf_e, tf_f);

/* painter.Draw...() */

painter.Restore();
painter.FinishPage();

Of course, this entire solution assumes that there is at most a single cm transform, and no other transforms, before the first q.

Another, much simpler solution would be to put one q before everything in the stream and put one Q after, followed by the desired content, but I'm not sure if it's straightforward to do with PoDoFo.
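
At the level of the raw content stream, that q/Q wrapping could look like the sketch below. It only manipulates text: wrapAndAppend() is a hypothetical helper, and reading the existing stream from the page and writing the result back with PoDoFo is not shown here. It also assumes the existing content has balanced q/Q operators of its own.

```cpp
#include <string>

// Hypothetical helper working on raw content-stream text only.
// Encloses the existing page content in q ... Q so that whatever it did to
// the graphics state (e.g. "1 0 0 -1 0 841 cm") is undone, then appends the
// new content, which starts from the default state again.
std::string wrapAndAppend(const std::string& existing,
                          const std::string& overlay) {
    return "q\n" + existing + "\nQ\n" + overlay;
}
```

For example, wrapping "1 0 0 -1 0 841 cm" followed by the original drawing operators means the flip stays confined between q and Q, so text appended afterwards is no longer mirrored.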

Ayberk Özgür

Thanks to mkl; with his help the question has been resolved.

The problem is caused by a reflection (vertical flip). PoDoFo's PdfPainter lets you set the transformation matrix, so you can change it before adding text or lines to the PDF document.

Add code like this:

        painter.SetTransformationMatrix(1,0,0,-1,0,pPage->GetPageSize().GetHeight()); // set Reflection effect
        painter.Save();

        painter.DrawText( 56.69, pPage->GetPageSize().GetHeight() - 56.69, "Hello World!" );

        painter.DrawText( 56.69, pPage->GetPageSize().GetHeight() - 96.69, "Hello World!" );
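
Why a second flip helps can be checked by concatenating the two matrices. The sketch below is plain C++ with illustrative names (Matrix, concat), not PoDoFo API, and it assumes the page content starts with exactly the same flip that is applied again here; a different existing matrix would not be cancelled by this hard-coded one.

```cpp
// "a b c d e f" in the same order as the PDF cm operator.
struct Matrix { double a, b, c, d, e, f; };

// Concatenate two matrices in the PDF (row-vector) convention:
// the result applies `first`, then `second`.
Matrix concat(const Matrix& first, const Matrix& second) {
    return { first.a * second.a + first.b * second.c,
             first.a * second.b + first.b * second.d,
             first.c * second.a + first.d * second.c,
             first.c * second.b + first.d * second.d,
             first.e * second.a + first.f * second.c + second.e,
             first.e * second.b + first.f * second.d + second.f };
}
```

Concatenating 1 0 0 -1 0 H with itself gives 1 0 0 1 0 0, the identity, for any page height H, which is why the text comes out upright again.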
c.t2008