I am using wkhtmltopdf.exe to convert HTML to PDF with the source code below. The problem is that the PDF shows "?" in place of all non-English characters, such as Chinese, Japanese, Russian, and Arabic. When the output is HTML, the characters are shown correctly. I tried setting different encodings on the HTML (utf-8, utf-16, gb2312), but the PDF still doesn't render non-English text.

I read on the wkhtmltopdf forums about installing Chinese fonts on the server, but that advice doesn't seem to apply to a Windows Server environment. Moreover, aren't the fonts already available on the server, given that the HTML renders correctly?

Any ideas on how to make it work?

Code:

private void WritePDF(string html)
{
    string inFileName,
           outFileName,
           tempPath;
    Process p;
    System.IO.StreamWriter stdin;
    ProcessStartInfo psi = new ProcessStartInfo();

    tempPath = Request.PhysicalApplicationPath
        + ConfigurationManager.AppSettings[Constants.AppSettings.ExportToPdfTempFolder];
    inFileName = Session.SessionID + ".htm";
    outFileName = Session.SessionID + ".pdf";

    // run the conversion utility
    psi.UseShellExecute = false;
    psi.FileName = Server.MapPath(ConfigurationManager.AppSettings[Constants.AppSettings.ExportToPdfExecutablePath]);
    psi.CreateNoWindow = true;
    psi.RedirectStandardInput = true;
    psi.RedirectStandardOutput = true;
    psi.RedirectStandardError = true;
    //psi.StandardOutputEncoding = System.Text.Encoding.gb;

    // note that we tell wkhtmltopdf to be quiet and not run scripts
    // NOTE: I couldn't figure out a way to get both stdin and stdout redirected so we have to write to a file and then clean up afterwards
    psi.Arguments = "-q -n - " + tempPath + outFileName;

    p = Process.Start(psi);

    try
    {
        stdin = p.StandardInput;
        stdin.AutoFlush = true;

        stdin.Write(html);
        stdin.Close();

        if (p.WaitForExit(15000))
        {
            // NOTE: the application hangs when we use WriteFile (due to the Delete below?); this works
            Response.BinaryWrite(System.IO.File.ReadAllBytes(tempPath + outFileName));
        }
    }
    finally
    {
        p.Close();
        p.Dispose();
    }

    // delete the pdf
    System.IO.File.Delete(tempPath + outFileName);
}
itsbalur
  • Did you manage to solve this issue? Any progress? I recently converted my app from disk access to direct streams, and it still works fine. So, is this still an issue? – Joel Peltonen Oct 21 '13 at 07:00

2 Answers

Wkhtmltopdf can definitely render non-English characters such as Chinese, Japanese, Russian, and Arabic. In most cases they are not displayed because the HTML template is missing a meta tag with the appropriate charset definition. By default, .NET writes text as UTF-8, in which case the HTML template should contain the following meta tag:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />
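
If the HTML is assembled as a string in C# before being handed to wkhtmltopdf, a minimal sketch of adding that tag automatically might look like this. The class and method names (`HtmlCharset`, `EnsureUtf8Meta`) are hypothetical, the check is a naive case-insensitive string search rather than a real HTML parser, and the fallback of putting the tag first assumes the renderer's HTML parser moves a leading meta tag into the implied head, as HTML5 parsers generally do:

```csharp
using System;

public static class HtmlCharset
{
    // The same meta tag recommended above.
    private const string Utf8Meta =
        "<meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\" />";

    // Hypothetical helper: adds the UTF-8 meta tag unless the HTML already
    // declares some charset. Naive string search, not a real HTML parser.
    public static string EnsureUtf8Meta(string html)
    {
        if (html.IndexOf("charset=", StringComparison.OrdinalIgnoreCase) >= 0)
            return html; // a charset is already declared; leave it alone

        int head = html.IndexOf("<head>", StringComparison.OrdinalIgnoreCase);
        if (head >= 0)
            return html.Insert(head + "<head>".Length, Utf8Meta); // right after <head>

        return Utf8Meta + html; // no plain <head> tag found: put the tag first
    }
}
```

The helper would be called on the HTML string just before it is written to wkhtmltopdf's stdin or to a temp file. Declaring UTF-8 only helps if the bytes that reach wkhtmltopdf really are UTF-8.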

By the way, instead of calling wkhtmltopdf directly, you may use one of the .NET wrappers, such as NReco PdfGenerator (I'm the author of this library).

Vitaliy Fedorchenko
  • This is also not working for me. It simply shows black filled boxes instead of the characters. I want Hindi to be printed in the PDF. – Mujthaba Ibrahim Jul 25 '17 at 13:19
  • Same issue here; it displays black boxes. – Brijesh Mavani Jan 11 '19 at 11:01
  • @BrijeshMavani Also ensure that the language-specific fonts you use are installed on the server where the PDF is generated. This is a typical situation on Windows Server. – Vitaliy Fedorchenko Jan 11 '19 at 11:15
  • I am having a similar issue: my HTML contains Hindi characters, and when I convert it with wkhtmltopdf, the output file contains a black box for each non-English character. I have set the character set in the HTML page with a meta tag and also passed the --encoding parameter to wkhtmltopdf, but with no success. Any suggestion would be appreciated if you have figured out the solution... – Vaibhav Jain Sep 09 '22 at 09:22
  • @VaibhavJain wkhtmltopdf uses the system-installed Windows fonts, so first of all please make sure you have the appropriate font pack installed. – Vitaliy Fedorchenko Sep 09 '22 at 14:06
  • I switched to https://github.com/Sicos1977/ChromeHtmlToPdf, and it resolved my issue. – Vaibhav Jain Nov 08 '22 at 12:10

Make sure your font supports the characters and your source is UTF-8, and it should work. I have tested wkhtmltopdf with Korean, Chinese, Polish, and various other characters, and it has always worked. See my answer to a similar question: https://stackoverflow.com/a/11862584/694325

My PDF generation is otherwise VERY similar to yours, so I'd check that everything, everywhere, is UTF-8. I write my HTML sources like this:

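// needs: using System.IO; (for TextWriter and StreamWriter)
// Encoding.UTF8 writes the file as UTF-8, including a byte-order mark
using (TextWriter tw = new StreamWriter(path, false, System.Text.Encoding.UTF8))
{
    tw.WriteLine(contents);
}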

PDFs generated from source like this seem to work without problems.
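
One way to act on the "check that everything is UTF-8" advice is to test whether a piece of text survives a given encoding. .NET encoders such as `Encoding.ASCII` replace characters they cannot represent with "?", so a non-UTF-8 step anywhere in the pipeline produces exactly the "?" symptom from the question, whereas black boxes (as in the comments above) usually point to a missing font instead. A minimal sketch; `EncodingCheck` and `SurvivesEncoding` are hypothetical names:

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Hypothetical diagnostic: true if 'text' comes back unchanged after being
    // encoded and decoded with 'encoding'. Characters the encoding cannot
    // represent are replaced by its fallback (e.g. "?" for Encoding.ASCII).
    public static bool SurvivesEncoding(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        string roundTripped = encoding.GetString(bytes);
        return string.Equals(text, roundTripped, StringComparison.Ordinal);
    }
}
```

For example, `SurvivesEncoding("中文", Encoding.ASCII)` returns false because the round trip yields "??", while the same call with `Encoding.UTF8` returns true.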

Joel Peltonen
  • I write the HTML to a temp file instead of feeding it to stdin; I haven't actually tried feeding it in directly. My way causes some I/O overhead, I know :/ – Joel Peltonen Aug 08 '12 at 10:49
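
For the stdin approach in the question, one plausible culprit is the encoding of the redirected stdin writer: `Process.StandardInput` is not guaranteed to be UTF-8 (on .NET Framework it follows the console input encoding, as far as I know; newer .NET versions add `ProcessStartInfo.StandardInputEncoding`). The sketch below sidesteps this by writing UTF-8 bytes directly to the underlying stream, and it also reads the PDF back from stdout, which is what the NOTE in the question's code could not do. `PdfPipe`, `WriteUtf8`, `ConvertViaPipes`, and `exePath` are hypothetical, and it assumes your wkhtmltopdf build accepts `-` for both input and output:

```csharp
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class PdfPipe
{
    // Writes the HTML as raw UTF-8 bytes (no byte-order mark) straight to the
    // stream, bypassing whatever encoding the wrapping StreamWriter was given.
    public static void WriteUtf8(Stream target, string html)
    {
        byte[] bytes = new UTF8Encoding(false).GetBytes(html);
        target.Write(bytes, 0, bytes.Length);
        target.Flush();
    }

    // Hypothetical wrapper around wkhtmltopdf.exe (exePath is an assumption).
    // -q = quiet, -n = disable JavaScript (as in the question),
    // first "-" = read HTML from stdin, second "-" = write the PDF to stdout.
    public static byte[] ConvertViaPipes(string exePath, string html)
    {
        var psi = new ProcessStartInfo(exePath, "-q -n - -")
        {
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardInput = true,
            RedirectStandardOutput = true
        };

        using (Process p = Process.Start(psi))
        using (var pdf = new MemoryStream())
        {
            // Drain stdout in the background so a full pipe cannot block the writer.
            Task copy = p.StandardOutput.BaseStream.CopyToAsync(pdf);

            WriteUtf8(p.StandardInput.BaseStream, html);
            p.StandardInput.Close(); // close stdin so wkhtmltopdf sees end of input

            copy.Wait();             // all PDF bytes have been copied
            p.WaitForExit();
            return pdf.ToArray();
        }
    }
}
```

In the question's page, the result could be passed to `Response.BinaryWrite` in place of the temp file, combined with the charset meta tag from the first answer so that wkhtmltopdf interprets the bytes as UTF-8.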