
This is a question specifically about the PoDoFo library.

How do I access the /Differences array entry in the Encoding dictionary of a Font resource?

After I read the font name from the Tf operator, I can get the font through PoDoFo::PdfPage::GetFromResources. However, while the PdfFont class has PoDoFo::PdfFont::GetEncoding, I cannot see how you would get to the /Differences array from there.

From the PDF specification (I'm only concerned with Type 1 fonts):

Encoding

(Optional) A specification of the font’s character encoding if different from its built-in encoding. The value of Encoding shall be either the name of a predefined encoding (MacRomanEncoding, MacExpertEncoding, or WinAnsiEncoding, as described in Annex D) or an encoding dictionary that shall specify differences from the font’s built-in encoding or from a specified predefined encoding (see 9.6.6, "Character Encoding").

Does this mean the PdfEncoding object returned from PoDoFo::PdfFont::GetEncoding contains the differences array (if there is one)?

(I asked on the PoDoFo mailing list a little while ago, but I'm posting here to see if someone with knowledge of PoDoFo and PDFs can help.)

Ferruccio
Jesse Good

2 Answers


PoDoFo has several different encoding classes; see the encoding object factory:

if (pObject->IsName ())
{
    const PdfName & rName = pObject->GetName ();
    if (rName == PdfName ("WinAnsiEncoding"))
        return PdfEncodingFactory::GlobalWinAnsiEncodingInstance ();
    else if (rName == PdfName ("MacRomanEncoding"))
        return PdfEncodingFactory::GlobalMacRomanEncodingInstance ();
    else if (rName == PdfName ("StandardEncoding"))      // OC 13.08.2010
        return PdfEncodingFactory::GlobalStandardEncodingInstance ();
    else if (rName == PdfName ("MacExpertEncoding"))     // OC 13.08.2010 TODO solved
        return PdfEncodingFactory::GlobalMacExpertEncodingInstance ();
    else if (rName == PdfName ("SymbolEncoding"))        // OC 13.08.2010
        return PdfEncodingFactory::GlobalSymbolEncodingInstance ();
    else if (rName == PdfName ("ZapfDingbatsEncoding"))  // OC 13.08.2010
        return PdfEncodingFactory::GlobalZapfDingbatsEncodingInstance ();
    else if (rName == PdfName ("Identity-H"))
        return new PdfIdentityEncoding ();
}
else if (pObject->HasStream ())     // Code for /ToUnicode object 
{
    return new PdfCMapEncoding(pObject);
}
else if (pObject->IsDictionary ())
{
    return new PdfDifferenceEncoding (pObject);
}

(PoDoFo/src/doc/PdfEncodingObjectFactory.cpp)

You are interested in the last case. Thus, if the encoding object you have at hand is an instance of PdfDifferenceEncoding, you can use:

/** PdfDifferenceEncoding is an encoding, which is based
 *  on either the fonts encoding or a predefined encoding
 *  and defines differences to this base encoding.
 */
class PODOFO_DOC_API PdfDifferenceEncoding : public PdfEncoding, private PdfElement {
 public:
[...]
    /** 
     * Get read-only access to the object containing the actual
     * differences.
     *
     * \returns the container with the actual differences
     */
    inline const PdfEncodingDifference & GetDifferences() const;
[...]
};

(PoDoFo/src/doc/PdfDifferenceEncoding.h)
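
One way to test whether the encoding object at hand is a difference encoding is a dynamic_cast on the base-class pointer. The sketch below deliberately uses hypothetical stand-in classes (Encoding, NamedEncoding, DifferenceEncoding) instead of the PoDoFo headers so that it compiles on its own. With PoDoFo, the cast target would presumably be const PoDoFo::PdfDifferenceEncoding*, assuming GetEncoding hands back a pointer to the polymorphic PdfEncoding base.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-ins for an encoding class hierarchy (not PoDoFo's classes).
struct Encoding {
    virtual ~Encoding() = default;   // polymorphic base, required for dynamic_cast
};

struct NamedEncoding : Encoding {
    std::string name;
    explicit NamedEncoding(std::string n) : name(std::move(n)) {}
};

struct DifferenceEncoding : Encoding {
    int differenceCount = 0;
};

// Returns the encoding as a DifferenceEncoding, or nullptr if it is another kind.
const DifferenceEncoding* AsDifferenceEncoding(const Encoding* enc)
{
    return dynamic_cast<const DifferenceEncoding*>(enc);
}
```

A null result means the font uses a named or CMap-based encoding and there is no differences array to read.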

PdfEncodingDifference is declared in the same header file and offers some interesting methods:

/** A helper class for PdfDifferenceEncoding that
 *  can be used to create a differences array.
 */
class PODOFO_DOC_API PdfEncodingDifference {
    struct TDifference {
        int         nCode;
        PdfName     name;
        pdf_utf16be unicodeValue;
    };

    typedef std::vector<TDifference>                 TVecDifferences;
    typedef std::vector<TDifference>::iterator       TIVecDifferences;
    typedef std::vector<TDifference>::const_iterator TCIVecDifferences;

 public: 
    /** Create a PdfEncodingDifference object.
     */
    PdfEncodingDifference();

    /** Copy a PdfEncodingDifference object.
     */
    PdfEncodingDifference( const PdfEncodingDifference & rhs );

    /** Copy a PdfEncodingDifference object.
     */
    const PdfEncodingDifference & operator=( const PdfEncodingDifference & rhs );

    /** Add a difference to the object.
     * 
     *  \param nCode unicode code point of the difference (0 to 255 are legal values)
     *
     *  \see AddDifference if you know the name of the code point
     *       use the overload below which is faster
     */
    void AddDifference( int nCode );

    /** Add a difference to the object.
     * 
     *  \param nCode unicode code point of the difference (0 to 255 are legal values)
     *  \param rName name of the different code point or .notdef if none
     */
    void AddDifference( int nCode, const PdfName & rName );

    /** Tests if the specified code is part of the 
     *  differences.
     *
     *  \param nCode test if the given code is part of the differences
     *  \param rName write the associated name into this object if the 
     *               code is part of the difference
     *  \param rValue write the associated unicode value of the name to this value 
     *
     *  \returns true if the code is part of the difference
     */
    bool Contains( int nCode, PdfName & rName, pdf_utf16be & rValue ) const;

    /** Convert the PdfEncodingDifference to an array
     *
     *  \param rArray write to this array
     */
    void ToArray( PdfArray & rArray );

    /** Get the number of differences in this object.
     *  If the user added .notdef as a difference it is 
     *  counted, even it is no real difference in the final encoding.
     *  
     *  \returns the number of differences in this object
     */
    inline size_t GetCount() const;

 private:
    struct DifferenceComparatorPredicate {
        public:
          inline bool operator()( const TDifference & rDif1, 
                                  const TDifference & rDif2 ) const { 
              return rDif1.nCode < rDif2.nCode;
          }
    };

    TVecDifferences m_vecDifferences;
};

(PoDoFo/src/doc/PdfDifferenceEncoding.h)

mkl
  • Thanks for the help. My biggest problem was not knowing how to traverse the objects in the PDF. I found out you can get to the underlying font object with `GetObject` and then from there get to its encoding entry by calling `GetIndirectKey("Encoding");`. Although I feel like PoDoFo could be structured better. – Jesse Good Jun 15 '13 at 21:52

Once you have a pointer to a PoDoFo::PdfFont, you can get to the underlying object by calling GetObject, which PdfFont inherits from PoDoFo::PdfElement. From there, GetIndirectKey("Encoding") returns a pointer to the font's Encoding entry. When that entry is a dictionary, it contains the differences array, and you can pass it to the PoDoFo::PdfDifferenceEncoding constructor.

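// fnt is the PoDoFo::PdfFont* obtained from the page resources.
PoDoFo::PdfObject* fntobj = fnt->GetObject();
if (fntobj)
{
    // /Encoding may be a name (e.g. /WinAnsiEncoding) or a dictionary;
    // only the dictionary form can carry a /Differences array.
    PoDoFo::PdfObject* fntdic = fntobj->GetIndirectKey("Encoding");
    if (fntdic && fntdic->IsDictionary())
    {
        PoDoFo::PdfDifferenceEncoding diff(fntdic);
        // ToArray is non-const, so work on a copy of the differences.
        PoDoFo::PdfEncodingDifference d(diff.GetDifferences());
        PoDoFo::PdfArray diffarray;
        d.ToArray(diffarray);
    }
}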
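
For reading the result, it helps to recall how the PDF specification lays out a /Differences array: an integer sets the current character code, and each glyph name that follows is assigned to the current code, which then advances by one. The following is a minimal sketch of that expansion, independent of PoDoFo. DiffToken and ExpandDifferences are hypothetical names introduced only for illustration, and converting the entries of a PoDoFo::PdfArray into such tokens is left out.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one element of a /Differences array:
// either an integer character code or a glyph name.
struct DiffToken {
    bool isCode;        // true: `code` is meaningful; false: `name` is meaningful
    int code;
    std::string name;
};

// Expand a /Differences token sequence into a code -> glyph-name map.
// An integer sets the current code; every following name is assigned to the
// current code, which is then incremented.
std::map<int, std::string> ExpandDifferences(const std::vector<DiffToken>& tokens)
{
    std::map<int, std::string> result;
    int current = -1;   // no code seen yet; names before the first code are skipped
    for (const DiffToken& t : tokens) {
        if (t.isCode) {
            current = t.code;
            continue;
        }
        if (current >= 0 && current <= 255)   // simple fonts use single-byte codes
            result[current] = t.name;
        if (current >= 0)
            ++current;
    }
    return result;
}
```

For example, the array [39 /quotesingle 128 /Adieresis /Aring] maps code 39 to quotesingle, 128 to Adieresis, and 129 to Aring.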
Jesse Good