UTF-8 characters are showing as boxes when converting HTML to PDF

Question

I want to convert HTML containing special characters to PDF, but the output does not show the special characters.

from io import BytesIO
from django.http import HttpResponse
from django.template.loader import get_template
from xhtml2pdf import pisa

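def html2pdf(template_source, context_dict=None):
    # Render the Django template to a str (avoid a mutable default argument)
    template = get_template(template_source)
    html = template.render(context_dict or {})
    result = BytesIO()
    # Pass the rendered HTML to xhtml2pdf as UTF-8 bytes; the PDF is written into `result`
    pdf = pisa.CreatePDF(BytesIO(html.encode('utf-8')), result)
    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type="application/pdf")
    return None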

This is my pdf.py, and I have an HTML file, pdf.html:

<!DOCTYPE html>
<html lang="en">
<meta charset="UTF-8">
<head>
    <style>
        body {
            font-family: 'Josefin Slab';
            font-size: large;
            background-color: beige;
        }
    </style>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <h2 class="utf">This is myŐ, Ű, ő or ű✅✅ pdf file with special char</h2>
</body>
</html>

When I convert this to PDF, the output is:

This is my■, ■, ■ or ■■■■■■■■■■■■■ pdf file with special......

What should I do?

Looks like some part of the process is assuming something is UTF-8 when it is not. It could be, for example, UTF-16 for some reason. — tadman, Jan 12 '23 at 05:04
Are you sure that font contains those glyphs? Is that font available to your script when it runs? — Tim Roberts, Jan 12 '23 at 05:10
@TimRoberts I have a requirement to convert HTML containing these characters; that's why I am doing this. ASCII characters render fine, but characters like Ő, Ű and ő do not show. — divyanshu mishra, Jan 12 '23 at 05:56
What Tim is trying to say is that the problem might be fixable by understanding which font the renderer uses and overriding it to choose one which actually contains the characters you want. — tripleee, Jan 12 '23 at 08:06
What tadman is trying to say is that you might have a character encoding error. Maybe see the [Stack Overflow `character-encoding` tag info page](/tags/character-encoding/info) for a brief introduction. I don't particularly believe in this hypothesis; but if it's true, there is probably a bug in the toolchain you are using. — tripleee, Jan 12 '23 at 08:07
The indentation seems to be wrong, the `if` and the `return` should be part of the `def` block. — tripleee, Jan 12 '23 at 12:42

K J · Answer 1 · 2023-01-12T15:13:41.237

As noted in the comments, you are using characters that do not exist in the font, so use a different font! However, also see the notes below.
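A minimal sketch of that advice, assuming you can obtain a TTF file that covers the characters (the path `fonts/DejaVuSans.ttf` and the helper names `missing_glyphs` and `font_face_css` are hypothetical). The first helper lists which characters a font lacks, given the set of code points it maps; with fontTools that set could come from `set(TTFont(path).getBestCmap())`. The second builds an `@font-face` rule so xhtml2pdf can embed the font. In Django the font path may need resolving through the `link_callback` argument of `pisa.CreatePDF`.

```python
from __future__ import annotations

from collections.abc import Container


def missing_glyphs(text: str, codepoints: Container[int]) -> list[str]:
    """Return, in order of first appearance, the characters of `text`
    whose code points are not in `codepoints` (whitespace is ignored)."""
    missing: list[str] = []
    for ch in text:
        if ch.isspace() or ord(ch) in codepoints or ch in missing:
            continue
        missing.append(ch)
    return missing


def font_face_css(family: str, ttf_path: str) -> str:
    """Build a CSS snippet that registers a TTF file under `family`
    and makes it the font for <body>."""
    return (
        "@font-face {\n"
        f"    font-family: '{family}';\n"
        f"    src: url('{ttf_path}');\n"
        "}\n"
        f"body {{ font-family: '{family}'; }}\n"
    )


# Usage with a pretend font that only covers printable ASCII.
sample = "This is my Ő, Ű, ő or ű✅✅ pdf file"
print(missing_glyphs(sample, set(range(0x20, 0x7F))))
# ['Ő', 'Ű', 'ő', 'ű', '✅']
print(font_face_css("DejaVu Sans", "fonts/DejaVuSans.ttf"))
```

The CSS would replace the `font-family: 'Josefin Slab'` rule in the template's `<style>` block; run `missing_glyphs` against the real font's code points first to confirm it actually covers Ő, Ű, ő and ű.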

Even when the characters are correctly embedded, the resulting PDF may display them properly in a browser's built-in PDF viewer but still not be handled well by a conventional PDF viewer.

Not all characters are available even in a comprehensive Unicode font, in particular coloured HTML glyphs such as emoji or your ✅. These are drawn by the browser's colour emoji fonts, so in a PDF they need to be converted to an image with the underlying text kept behind it, and that two-for-one combination is problematic in a PDF. Whether it works with a given font depends on the PDF writer, so a safer workaround is to use the square root symbol (√) instead.
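To apply that workaround before rendering, a sketch like the following could swap the emoji check mark for the square root symbol. The table `SAFE_SUBSTITUTES` and the functions `describe` and `replace_emoji` are hypothetical names; `unicodedata` is only used to show which code point you are dealing with.

```python
from __future__ import annotations

import unicodedata
from collections.abc import Mapping

# Hypothetical substitution table: colour-emoji characters mapped to plain
# symbols that an ordinary text font is more likely to contain.
SAFE_SUBSTITUTES: Mapping[str, str] = {
    "\u2705": "\u221a",  # WHITE HEAVY CHECK MARK -> SQUARE ROOT (√)
}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def replace_emoji(text: str, table: Mapping[str, str] = SAFE_SUBSTITUTES) -> str:
    """Swap each character listed in `table` for its plain replacement."""
    return text.translate(str.maketrans(dict(table)))


print(describe("\u2705"))  # U+2705 WHITE HEAVY CHECK MARK
print(replace_emoji("ű✅✅ pdf file"))  # ű√√ pdf file
```

In the question's `html2pdf`, one place to hook this in would be the rendered string, e.g. `html = replace_emoji(template.render(...))`, before it is encoded to UTF-8.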

Side note: in some Scandinavian countries a tick can mean wrong, not right: https://en.wikipedia.org/wiki/Check_mark

