IronPDF EAP doesn't interpret C# string as UTF-16

Question

I'm attempting to convert a bit of HTML to a PDF document with IronPDF EAP 2021.6.3135. After creating a new ChromePdfRenderer, I call RenderHtmlAsPdfAsync on it, passing the HTML string as the only argument. The HTML is a single <div> with several nested <div>s, one of which contains CJK text. IronPDF appears to interpret that text as either ASCII or UTF-8; either way, it renders it as garbled characters. With the current release of IronPDF (2021.3.1), the same code works properly without the workaround mentioned below.

Inserting a byte-order mark (\uFEFF) at the beginning of the string fixes the problem, but I shouldn't need to do that. Is there a new setting/option in the EAP branch's API that I've overlooked? Or is this a known issue that will get addressed before release?
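The byte-order-mark workaround described above amounts to a one-character prefix on the HTML string. A minimal sketch, assuming a hypothetical helper class `HtmlBom` (not part of IronPDF) that adds U+FEFF only when the string doesn't already start with it:

```csharp
using System;

// Hypothetical helper (not an IronPDF API) for the BOM workaround.
public static class HtmlBom
{
    // U+FEFF, the byte-order mark the question prepends to the HTML.
    private const char ByteOrderMark = '\uFEFF';

    // Returns the HTML with exactly one leading BOM; strings that already
    // start with a BOM are returned unchanged.
    public static string PrependBom(string html)
    {
        if (html is null) throw new ArgumentNullException(nameof(html));
        return html.Length > 0 && html[0] == ByteOrderMark
            ? html
            : ByteOrderMark + html;
    }
}
```

The result would then be passed to `RenderHtmlAsPdfAsync` as in the question, e.g. `await renderer.RenderHtmlAsPdfAsync(HtmlBom.PrependBom(html))`. Guarding against a second BOM keeps the helper safe to call more than once.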

Please can you share the exact code used to render the text? You can also get support directly through http://ironpdf.com or by emailing developers@ironsoftware.com. Iron Software gladly provides support to anyone considering the library. — darren, Aug 03 '21 at 06:16
@darren: I've sent an email to developers@ironsoftware.com with your name in the address; a minimal VS solution that demonstrates the problem is attached. I suspect that the problem only manifests when (enough) "ordinary" (non-CJK) text precedes the CJK text in the HTML that IronPDF processes. — Rich Armstrong, Aug 03 '21 at 17:39
I've placed a working test solution that demonstrates the problem on [GitHub](https://github.com/AgWillo/IronPDF-Testing). — Rich Armstrong, Aug 23 '21 at 17:09
After further investigation, we have found that our Chrome renderer fails once the HTML string length exceeds the maximum value of an unsigned short (65,535). Thank you for bringing this to our attention; this will be fixed in the upcoming release of IronPdf. — darren, Aug 24 '21 at 04:43
1. The HTML example provided does not seem to render properly within the regular Chrome browser as is, since Chrome's encoding auto-detection seems to fail with very long HTML strings. 2. We recommend including `<meta charset="utf-16"/>` at the beginning of any HTML file that contains UTF-16 characters. This is a reasonable request, because ultimately it is difficult to determine the desired decoding. 3. That said, we are reviewing the possibility of automatically defaulting to UTF-16 encoding if no other encoding is specified, to help alleviate these kinds of issues. — darren, Aug 24 '21 at 09:02
Thanks for your attention to this. I agree: adding the `<meta charset="utf-16"/>` tag for the encoding is reasonable. If you'd care to post your response as an answer, I'd be happy to mark it as the accepted answer. It does resolve the issue. — Rich Armstrong, Aug 24 '21 at 11:51
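Taken together, Rich's observation that the failure needs enough ordinary text before the CJK text and darren's 65,535 figure suggest a quick pre-flight check. The sketch below uses hypothetical names (`EncodingRisk`, `MayBeMisdetected`) and treats the threshold as approximate; it only flags HTML whose first non-ASCII character appears at or beyond that offset, and it does not look for a `<meta charset>` declaration.

```csharp
using System;

// Hypothetical diagnostic, not part of IronPDF.
public static class EncodingRisk
{
    // Approximate threshold taken from the thread (ushort.MaxValue = 65,535);
    // the exact cutoff is an assumption.
    public const int Threshold = ushort.MaxValue;

    // Index of the first non-ASCII char, or -1 if the string is pure ASCII.
    // Every char before that index is ASCII (1 byte in UTF-8), so the index
    // is also the UTF-8 byte offset of the first non-ASCII character.
    public static int FirstNonAsciiIndex(string html)
    {
        if (html is null) throw new ArgumentNullException(nameof(html));
        for (int i = 0; i < html.Length; i++)
        {
            if (html[i] > '\u007F') return i;
        }
        return -1;
    }

    // True when non-ASCII text exists but only after an all-ASCII prefix
    // at least as long as the threshold (the pattern reported in the thread).
    public static bool MayBeMisdetected(string html)
    {
        return FirstNonAsciiIndex(html) >= Threshold;
    }
}
```

A leading BOM puts the first non-ASCII character at index 0, so BOM-prefixed HTML is never flagged, which is consistent with the workaround reported in the question.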

Stephanie · Answer 1 · 2021-08-03T06:14:28.707

Looks like a solid bug report. I spoke to Darren and JD from Iron Software by email, and they reported that it will be fixed before release.

I suspect it was an issue with using the old-style HtmlToPdf class.

I tried the ChromePdfRenderer class instead (documented at https://ironpdf.com/object-reference/eap/api/) and had no issues rendering UTF-16 strings:

```csharp
using IronPdf;

// Render a short Persian string directly from a C# (UTF-16) string
var renderer = new ChromePdfRenderer();
var doc = renderer.RenderHtmlAsPdf("سلام دنیا");
doc.SaveAs("test.pdf");
```

EAP software literally means "it's not perfect; please report bugs so we can fix them before release", so thanks, Rich, as a fellow EAP user.

They can be reached at developers@ironsoftware.com and try to help even unpaid users.

score 0 · Accepted Answer · answered Aug 25 '21 at 04:25


Chrome's encoding auto-detection fails with very long HTML strings.

It is recommended to include:

<meta charset="utf-16"/>

at the beginning of any HTML file that contains UTF-16 characters. (This is a reasonable request, because ultimately it is difficult to determine the desired decoding.)
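Following that recommendation, the tag can be prepended only when the HTML doesn't already declare an encoding. A minimal sketch with a hypothetical `HtmlCharset.EnsureCharsetMeta` helper (not an IronPDF API): the `charset=` substring test is a rough heuristic rather than an HTML parser, and prepending at position 0 suits fragments like the question's nested `<div>`s. A full document with `<!DOCTYPE>` and `<head>` would want the tag inside `<head>` instead.

```csharp
using System;

// Hypothetical helper, not part of IronPDF.
public static class HtmlCharset
{
    // The tag recommended in the accepted answer.
    public const string MetaTag = "<meta charset=\"utf-16\"/>";

    // Prepends MetaTag unless the HTML already mentions "charset="
    // (case-insensitive). This is a heuristic: it can be fooled by
    // "charset=" appearing in ordinary text.
    public static string EnsureCharsetMeta(string html)
    {
        if (html is null) throw new ArgumentNullException(nameof(html));
        bool declaresCharset =
            html.IndexOf("charset=", StringComparison.OrdinalIgnoreCase) >= 0;
        return declaresCharset ? html : MetaTag + html;
    }
}
```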

Iron Software is reviewing the possibility of IronPDF automatically defaulting to utf-16 encoding if no other encoding is specified, to help alleviate these kinds of issues.


darren


If no UTF characters are found within the first 65,535 bytes, Chrome assumes ASCII encoding unless or is used as required. – Stephanie Oct 01 '21 at 09:03
In the end, we stuck with inserting a byte-order mark (\uFEFF) at the beginning of the string rather than adding `<meta charset="utf-16"/>`. Aside from the latter's intent being a bit more obvious, is there any downside to using the BOM this way? – Rich Armstrong Oct 20 '21 at 17:10
