
I am using GroupDocs Viewer to take a PDF and create an HTML page for it. I store the HTML in an nvarchar(MAX) field in my SQL Server 2012 database. I learned today during testing that some special characters (ligatures) in the documents are not rendering correctly.

The question marks are highlighted only to make them easy to find.

These ligatures (fl, fi, ff) are missing for some reason. I checked my database and they seem to be stored there correctly.

I checked the JsonResult server method I'm using to load the page, and I'm getting mixed results when trying to determine whether my pageHtml string contains the character.

public async Task<JsonResult> LoadDocument(int contentID, int page)
{
    try
    {
        var documentPageData = await APIAccess.Get<DocumentPageUserData>(DocumentPageData.GetDocumentPageRoute, contentID, CurrentUser.UserID, page);

        JsonResult result = new JsonResult()
        {
            ContentEncoding = Encoding.Default,
            ContentType = "application/json",
            Data = new
            {
                pageHTML = documentPageData.DocumentPage.PageHtml //.Replace("?", "fl").Replace("?", "fi").Replace("?", "ff")      //Don't like this idea
            },
            JsonRequestBehavior = JsonRequestBehavior.AllowGet,
            MaxJsonLength = int.MaxValue
        };

        return result;
    }
    catch (Exception ex)
    {
        return Json(string.Format("There was an error loading the page.\r\n\r\nDetails:\r\n{0}", ex.Message),
            JsonRequestBehavior.AllowGet);
    }
}

When I mouse over DocumentPage.Html and ask to render it as HTML, it looks great. The Text Render, however, shows <span>?</span>. I'm not sure whether that's just because the Text Render doesn't have the font or whether there is another problem.

On the client side I store the HTML text in session storage until the page is requested, then I render it into a div like so.

 var externalHtml = sessionStorage.getItem(currentPage);
 $('.viewer').text('');
 $('.viewer').append(externalHtml);
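
One way to narrow down where the characters go missing is to inspect the string right before it is appended. The helper below is only a sketch: `describeLigatures` is a hypothetical name, and it assumes the missing characters come from the Unicode Alphabetic Presentation Forms range U+FB00 to U+FB06 (ff, fi, fl, ffi, ffl, and the two st ligatures). In the page it would be called with the value read from sessionStorage; here it takes a plain string so it can run anywhere.

```javascript
// Ligatures in the Alphabetic Presentation Forms block: U+FB00 (ff) through U+FB06 (st).
const LIGATURES = /[\uFB00-\uFB06]/g;

// Hypothetical helper: list the code points of every ligature found in an HTML string.
function describeLigatures(html) {
  return (html.match(LIGATURES) || []).map(
    (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase()
  );
}

// The ligature survived: its code point is reported.
console.log(describeLigatures('<span>\uFB01</span>'));
// The ligature was already replaced by a literal question mark: nothing is reported.
console.log(describeLigatures('<span>?</span>'));
```

If the list is empty on the client but the database value contains the characters, the loss happened somewhere between the server response and session storage.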

I've tried checking the network traffic and the client-side HTML, but both already seem to contain a ?, so I'm not sure where I'm losing my characters. Any ideas?

Jon Dosmann
  • It looks like the preview inside your IDE is using a different font than your output (serif vs. sans-serif, respectively). What font should the output be using? Is it readily available on your PC or included in the CSS? – J. Titus Jun 13 '17 at 19:31
  • Good Catch. I'm not sure Group Docs Viewer is embedding the fonts when it creates the html from the pdf. My PDF is sans-serif, but I can't tell if the font matches exactly to what I'm rendering incorrectly. Problem is I just can't pick a specific font and include it since I need to have the font that the user used in the PDF. – Jon Dosmann Jun 13 '17 at 19:34
  • Checking the network tab for that area, the span looks like \u003cspan \u003e?\u003c/span\u003e. That seems to imply it's already a question mark before the browser tries to render it. Maybe the server needs the font? – Jon Dosmann Jun 13 '17 at 19:54
  • Maybe that, or the network tab doesn't have the correct font to render it either. Can you get the output as a byte stream to confirm; maybe through Wireshark? – J. Titus Jun 13 '17 at 19:59
  • Bytes: ef ac 81. That's what's between the span tags. Wireshark is rendering it as "...". I looked it up, and Wireshark shows a period for any non-printable byte. – Jon Dosmann Jun 14 '17 at 13:46
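
The three bytes reported in the last comment can be checked by decoding them as UTF-8. This is a small sketch, not part of the original thread; it uses only the standard TextDecoder and String.prototype.normalize, and assumes a runtime where TextDecoder is a global (modern browsers, recent Node.js).

```javascript
// Bytes captured in Wireshark between the span tags.
const bytes = new Uint8Array([0xef, 0xac, 0x81]);

// Decode as UTF-8; fatal: true throws on malformed input instead of substituting U+FFFD.
const decoded = new TextDecoder('utf-8', { fatal: true }).decode(bytes);

// The three bytes form a single character, U+FB01 (LATIN SMALL LIGATURE FI).
const codePoint = 'U+' + decoded.codePointAt(0).toString(16).toUpperCase();
console.log(decoded.length, codePoint); // 1 U+FB01

// NFKC normalization folds the ligature back into its plain letters.
console.log(decoded.normalize('NFKC')); // fi
```

So the data at that point is a valid UTF-8 encoding of the fi ligature, which suggests the character is lost later, when the response is encoded.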

1 Answer


The JsonResult was not being encoded properly. I changed ContentEncoding = Encoding.Default to ContentEncoding = Encoding.UTF8, and after that it rendered perfectly. Sigh... I had been working on this for 2.5 days.
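
A likely explanation, stated here as an assumption rather than something confirmed in the thread: on .NET Framework, Encoding.Default is the system's ANSI code page, a single-byte encoding that has no slot for U+FB01, so the encoder substitutes a question mark. The JavaScript sketch below mimics that behavior. `encodeSingleByte` is a hypothetical stand-in that treats the code page as Latin-1, not the real .NET encoder, and is compared against the standard TextEncoder, which always produces UTF-8.

```javascript
// Hypothetical stand-in for a single-byte code page encoder: any character
// above 0xFF cannot be represented and is replaced with '?' (0x3F).
function encodeSingleByte(text) {
  const out = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out.push(cp <= 0xff ? cp : 0x3f);
  }
  return Uint8Array.from(out);
}

const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join(' ');

const html = '<span>\uFB01</span>'; // span containing the fi ligature
const lossy = encodeSingleByte(html);
const utf8 = new TextEncoder().encode(html);

// '<span>' is 6 bytes long, so the ligature starts at index 6 in both arrays.
console.log(toHex(lossy.slice(6, 7))); // 3f
console.log(toHex(utf8.slice(6, 9)));  // ef ac 81

// Only the UTF-8 bytes decode back to the original string.
console.log(new TextDecoder().decode(utf8) === html); // true
```

Once the question mark is written into the response bytes, no font or client-side setting can bring the ligature back, which matches the literal ? seen in the network tab.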

Jon Dosmann