PrinceXML: "Input is not proper UTF-8"

Question

I'm generating HTML from a database and then sending it to PrinceXML for conversion to PDF. The code I use to do this is:

string _htmlTemplate = @"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd""><html lang=""en-GB"" xml:lang=""en-GB"" xmlns=""http://www.w3.org/1999/xhtml""><head><meta http-equiv=""Content-type"" content=""text/html;charset=UTF-8"" /><title>Generated PDF Contract</title></head><body>{0}</body></html>";

string _pgeContent = string.Format(_htmlTemplate, sb.ToString());
writer.Write(sb.ToString());
Byte[] arrBytes = UTF8Encoding.Default.GetBytes(_pgeContent);
Stream s = new MemoryStream(arrBytes);

Prince princeConverter = new Prince(ConfigurationManager.AppSettings["PrinceXMLInstallLoc"].ToString());
princeConverter.SetLog(ConfigurationManager.AppSettings["PrinceXMLLogLoc"]);
princeConverter.AddStyleSheet(Server.MapPath(ConfigurationManager.AppSettings["FormsDocGenCssLocl"]));
Response.ClearContent();
Response.ClearHeaders();
Response.ContentType = "application/pdf";
Response.BufferOutput = true;

However, conversion fails with the error:

Input is not proper UTF-8, indicate encoding ! Bytes: 0xA0 0x77 0x65 0x62

I've taken the generated HTML and uploaded it to the W3C validator. It validates the markup as UTF-8 encoded XHTML 1.0 Transitional with no errors or warnings.

I've also gone through the file with a fine-tooth comb looking for invalid characters. So far, nothing.

Can anyone suggest something else I could try?

Yes, convert the stream to UTF-8 as suggested by the error message. — Darin Dimitrov, Nov 17 '10 at 12:06
@DarinDimitrov Doesn't the fact that the W3c validator parses it as valid UTF-8 encoded XHTML mean that it *is* UTF-8 ? Or am I missing something ...? — immutabl, Nov 17 '10 at 12:18

score 2 · Accepted Answer · answered Nov 18 '10 at 09:28

Well, after an afternoon of muttering curses and tearing out what is left of my hair, I figured out a fix for my particular problem.

It would appear that System.Text.UTF8Encoding doesn't emit a UTF-8 identifier (the byte order mark, or BOM) by default. So in my case I needed to use the constructor that takes a boolean parameter to control whether it is output.

UTF8Encoding u8enc = new UTF8Encoding(true); // Ensures a UTF-8 identifier (BOM) is emitted.
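For anyone adapting this, here is a minimal sketch of how that instance could replace the `UTF8Encoding.Default.GetBytes(...)` call from the question. Two things are worth checking on your own framework version. As far as I can tell, `UTF8Encoding.Default` is just the inherited static `Encoding.Default`, which on .NET Framework is the system ANSI code page rather than UTF-8. Also, `GetBytes` never writes the identifier itself: `GetPreamble()` returns it and you prepend it yourself (a `StreamWriter` would do that for you). The names `Utf8StreamSketch` and `EncodeWithBom` are made up for illustration, and the sample string is an assumption chosen to match the bytes in the error message.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8StreamSketch
{
    // Hypothetical helper: encodes the HTML as real UTF-8 and prepends the
    // 3-byte identifier (EF BB BF), which GetBytes does not emit on its own.
    public static byte[] EncodeWithBom(string html)
    {
        UTF8Encoding u8enc = new UTF8Encoding(true);
        byte[] preamble = u8enc.GetPreamble(); // EF BB BF because of "true"
        byte[] body = u8enc.GetBytes(html);    // encoded text only, no BOM

        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }

    public static void Main()
    {
        // Assumed sample: "\u00A0" is a non-breaking space followed by "web".
        byte[] arrBytes = EncodeWithBom("a\u00A0web");
        using (Stream s = new MemoryStream(arrBytes))
        {
            Console.WriteLine(BitConverter.ToString(arrBytes));
            // prints EF-BB-BF-61-C2-A0-77-65-62
            Console.WriteLine(s.Length); // prints 9
        }
    }
}
```

Note that the non-breaking space comes out as `C2 A0` here, not as the lone `A0` that the error message complained about.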

After this it was all good. Hope this helps someone :-)
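If you hit the same error, a quick sanity check is to run the bytes through a strict UTF-8 decoder before handing them to Prince. This is only a sketch: `Utf8Check` and `IsValidUtf8` are names I made up, and `iso-8859-1` is an assumption standing in for whatever single-byte ANSI code page `Encoding.Default` resolves to on your machine.

```csharp
using System;
using System.Text;

public static class Utf8Check
{
    // Returns true when the bytes decode cleanly as UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // Second argument: throw on invalid bytes instead of silently
        // substituting the replacement character.
        UTF8Encoding strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string text = "\u00A0web"; // non-breaking space followed by "web"

        // Assumed stand-in for the machine's ANSI code page.
        byte[] ansi = Encoding.GetEncoding("iso-8859-1").GetBytes(text);
        byte[] utf8 = new UTF8Encoding(false).GetBytes(text);

        Console.WriteLine(BitConverter.ToString(ansi) + " valid: " + IsValidUtf8(ansi));
        // prints A0-77-65-62 valid: False
        Console.WriteLine(BitConverter.ToString(utf8) + " valid: " + IsValidUtf8(utf8));
        // prints C2-A0-77-65-62 valid: True
    }
}
```

The first line of output reproduces the `0xA0 0x77 0x65 0x62` sequence from the error, which suggests the input was encoded with a single-byte code page rather than UTF-8.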

A little irrelevant, but does Prince support the XHTML transitional doctype you used up there? — user961627, Jan 28 '13 at 15:25
