I'm generating PDFs in a web application that uses Apache FOP 2.4. When the content contains emojis, they appear in the PDF as ###############, while normal alphanumeric text renders properly. This is the renderers section of my fop.xconf file:

<renderers>
    <renderer mime="application/pdf">
        <filterList>
            <value>flate</value>
        </filterList>
        <fonts>
            <font embed-url="ARIAL.TTF">
                <font-triplet name="Arial" style="normal" weight="normal"/>
            </font>
            <font embed-url="ARIALBD.TTF">
                <font-triplet name="Arial" style="normal" weight="bold"/>
            </font>
            <font embed-url="ARIALI.TTF">
                <font-triplet name="Arial" style="italic" weight="normal"/>
            </font>
            <font embed-url="ARIALBI.TTF">
                <font-triplet name="Arial" style="italic" weight="bold"/>
            </font>
            <font embed-url="ARIALUNI.TTF">
                <font-triplet name="Arial Unicode MS" style="normal" weight="normal"/>
            </font>
            <font embed-url="NotoColorEmoji-Regular.ttf">
                <font-triplet name="Noto Color Emoji" style="normal" weight="normal"/>
            </font>
            <font embed-url="MSGOTHIC.TTC" metrics-url="msgothic.xml">
                <font-triplet name="Gothic" style="normal" weight="normal"/>
                <font-triplet name="Gothic" style="normal" weight="bold"/>
                <font-triplet name="Gothic" style="italic" weight="normal"/>
                <font-triplet name="Gothic" style="italic" weight="bold"/>
            </font>
        </fonts>
    </renderer>
</renderers>

I added NotoColorEmoji-Regular.ttf to support emojis, but the emojis still do not render properly in the PDF. All the font files are at the same level as the fop.xconf file. How can I resolve this emoji issue?

[Update as at 11/04/2023]

Once I added "Noto Color Emoji" to the font-family attribute in the XSL file, the output stopped showing ####### symbols. Instead, it now shows empty white symbols.

In order to test this further, I created a sample application that generates PDF files using Apache FOP. You can find the project here: https://github.com/sachindragh/pdf-testing.

The following are the Java code, the XSL file, input.xml, and the fop.xconf file contents from that project.

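import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.*;
import org.xml.sax.SAXException;

// Class name is illustrative; the original snippet showed only main().
public class PdfTest {
    public static void main(String[] args) throws SAXException, TransformerException, IOException {
        // Load the FOP configuration, including the font definitions
        FopFactory fopFactory = FopFactory.newInstance(new File("./fop.xconf"));
        // try-with-resources closes the stream even if the transformation fails
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(new File("fop2_8output.pdf")))) {
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(new StreamSource(new File("stylesheet.xsl")));
            Source src = new StreamSource(new File("input.xml"));
            // Pipe the XSLT output (XSL-FO) straight into FOP as SAX events
            Result res = new SAXResult(fop.getDefaultHandler());
            transformer.transform(src, res);
            System.out.println("PDF file generated successfully.");
        }
    }
}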

stylesheet.xsl

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <!-- Vladimir Script    VLADIMIR.TTF -->
        <!-- Viner Hand ITC     VINERITC.TTF -->
        <!-- Noto Color Emoji   NotoColorEmoji-Regular.ttf -->
        <fo:root font-family="Noto Color Emoji,Viner Hand ITC">
            <fo:layout-master-set>
                <fo:simple-page-master master-name="A4-portrait"
                                       page-height="29.7cm" page-width="21.0cm" margin="2cm">
                    <fo:region-body/>
                </fo:simple-page-master>
            </fo:layout-master-set>
            <fo:page-sequence master-reference="A4-portrait">
                <fo:flow flow-name="xsl-region-body">
                    <fo:block font-family="Noto Color Emoji,Viner Hand ITC">
                        <xsl:value-of select="value"/>
                    </fo:block>
                </fo:flow>
            </fo:page-sequence>
        </fo:root>
    </xsl:template>
</xsl:stylesheet>

input.xml

<?xml version="1.0" encoding="UTF-8"?>
<value>
    This is a sample text Emojis
    BMP Glyphs  &#x2663; &#x2705;
    Non-BMP Glyphs  &#x1F410; &#x1F600;
</value>

fop.xconf

<?xml version="1.0"?>
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <filterList>
        <value>flate</value>
      </filterList>
      <fonts>
        <font embed-url="NotoColorEmoji-Regular.ttf">
            <font-triplet name="Noto Color Emoji" style="normal" weight="normal"/>
        </font>
        <font embed-url="VINERITC.TTF">
          <font-triplet name="Viner Hand ITC" style="normal" weight="normal"/>
        </font>
        <!--<font embed-url="VLADIMIR.TTF">
          <font-triplet name="Vladimir Script" style="normal" weight="normal"/>
        </font>-->
      </fonts>
      <version>1.7</version>
    </renderer>
  </renderers>
</fop>

I ran the above example with Apache FOP versions 2.4 and 2.8, with the PDF version set to 1.4 and to 1.7 for each library version. In input.xml I added both BMP glyphs (U+2663, U+2705) and non-BMP glyphs (U+1F410, U+1F600). In every case I get the same result: empty white symbols for both the BMP and the non-BMP glyphs. This is what the PDF output looks like:

[Screenshot of the PDF output]

When I do a select all I can see some characters getting selected in those empty spaces.

[Screenshot of the PDF with all text selected]

And when I copy and paste the selected text into notepad some of the characters appear as follows.

[Screenshot of the pasted text in Notepad]
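To see exactly which characters the copied text contains, the pasted string can be dumped code point by code point. This is a minimal sketch and not part of the test project: `CodePointDump` and `describe` are hypothetical names, and the sample string simply rebuilds the four characters from input.xml.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of FOP): lists the non-ASCII code points in a string. */
final class CodePointDump {

    /** Returns one entry per non-ASCII code point, e.g. "U+2663 BMP". */
    static List<String> describe(String text) {
        List<String> lines = new ArrayList<>();
        // codePoints() merges surrogate pairs, so U+1F410 is reported once, not as two chars.
        text.codePoints()
            .filter(cp -> cp > 0x7F)
            .forEach(cp -> lines.add(String.format("U+%04X %s", cp,
                    Character.isBmpCodePoint(cp) ? "BMP" : "supplementary")));
        return lines;
    }

    public static void main(String[] args) {
        // Same four characters as in input.xml, built from their code points.
        String sample = new StringBuilder()
                .appendCodePoint(0x2663).append(' ')
                .appendCodePoint(0x2705).append(' ')
                .appendCodePoint(0x1F410).append(' ')
                .appendCodePoint(0x1F600)
                .toString();
        describe(sample).forEach(System.out::println);
    }
}
```

If the text copied from the PDF yields the same code points as the input, the text layer is probably intact and only the drawn glyphs are missing.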

I also looked into the FOP issue https://issues.apache.org/jira/browse/FOP-1969 mentioned by @Kevin Brown. It seems a fix for that issue was already merged in the 2.3 release.

[Screenshot of the FOP-1969 issue]

Can anyone help me figure out this issue, or suggest any options I can try to solve it?

SachinD
  • You also need to modify `fo-pdf.xsl`, like this: `Noto Color Emoji,Sans-serif,Arial 8pt`. Add your font to a definition such as `footer.content.properties`. – life888888 Apr 04 '23 at 03:12
  • Do you get any error / warning if you run the pdf creation from a command prompt? – lfurini Apr 04 '23 at 09:06
  • Have you checked the FOP support documents? They would need to support the supplementary planes, e.g. Unicode Character “🐐” (U+1F410) = Unicode 1F410 – Kevin Brown Apr 04 '23 at 18:56
  • @life888888 Thanks for the information. I added the "Noto Color Emoji" into the xsl file I use and it stopped outputting #######. But still I'm not able to see the characters in the PDF. What I see is some empty white spaces in the pdf. But when I copy the text including the white spaces in the PDF and paste it into notepad I could see some characters in it. Any idea on what might be causing the issue here? – SachinD Apr 11 '23 at 15:05
  • @lfurini No I couldn't see any errors in the logs. – SachinD Apr 11 '23 at 15:05
  • @SachinD, use this font: https://github.com/asciidoctor/asciidoctor-pdf/blob/main/data/fonts/notoemoji-subset.ttf – life888888 Apr 12 '23 at 05:12

1 Answer

I do not believe that would be supported in FOP. Almost all of the characters in that font are in the Supplementary Multilingual Plane. From the FOP documentation:

"Support for Unicode characters outside of the Base Multilingual Plane (BMP), i.e., characters whose code points are greater than 65535, is not yet implemented. See issue FOP-1969."

For instance:

Unicode Character “🐐” (U+1F410) = Unicode 1F410 = 128016
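The arithmetic can be checked in Java, whose strings are UTF-16 and therefore store such a character as a surrogate pair. This is a minimal illustrative sketch: `SurrogateDemo` and `toSurrogates` are hypothetical names, and the split is done by hand only to show the calculation (the standard `Character.toChars` gives the same result).

```java
/** Illustrative only: shows how U+1F410 is represented in Java's UTF-16 strings. */
final class SurrogateDemo {

    /** Splits a supplementary code point into its UTF-16 high and low surrogates. */
    static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // leaves a 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        int goat = 0x1F410;
        char[] pair = toSurrogates(goat);
        System.out.println(goat);                                        // 128016
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);  // D83D DC10
        System.out.println(Character.isBmpCodePoint(goat));              // false
    }
}
```

Since 128016 is greater than 65535, the character lies outside the BMP, which is the case the quoted documentation describes.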

If you install that font on a Windows machine and use charmap to view it, you will find only about 35 characters. Searching for one shows no character:

[Screenshot of the charmap search result]

If you try to view the entire font, you see only these:

[Screenshot of the characters charmap lists for the font]
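A similar check can be done without charmap. The sketch below is only an assumption about how one might test this in Java: `FontCoverage` and `missing` are hypothetical names, and the font path is taken from the command line. `Font.createFont` and `Font.canDisplay(int)` are standard `java.awt` calls, but they only report whether Java's own font code finds a glyph mapping for a code point. They say nothing about whether FOP can embed or draw that glyph, and Java may fail to load some font files at all.

```java
import java.awt.Font;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper: reports which code points of a text have no glyph mapping. */
final class FontCoverage {

    /** Returns the distinct non-whitespace code points that canDisplay rejects. */
    static List<String> missing(String text, IntPredicate canDisplay) {
        List<String> result = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .distinct()
            .filter(cp -> !canDisplay.test(cp))
            .forEach(cp -> result.add(String.format("U+%04X", cp)));
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The font path is an assumption: pass the TTF referenced in fop.xconf.
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0]));
        String text = new StringBuilder()
                .appendCodePoint(0x2663).appendCodePoint(0x2705)
                .appendCodePoint(0x1F410).appendCodePoint(0x1F600)
                .toString();
        // The lambda resolves to Font.canDisplay(int), which accepts full code points.
        System.out.println(missing(text, cp -> font.canDisplay(cp)));
    }
}
```

The coverage test takes an `IntPredicate` so the logic can be exercised without loading a font; an empty list means every tested code point has a mapping as far as Java can tell.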

Kevin Brown
  • Hi @Kevin Brown, thanks a lot for your detailed explanation of the issue. Based on the information you provided, I looked into https://issues.apache.org/jira/browse/FOP-1969, and it seems it was fixed and merged into the 2.3 release. I have updated my question with the new findings. Any idea why I might be getting these empty white symbols for both BMP and non-BMP glyphs? – SachinD Apr 11 '23 at 16:15