I am converting a document from HTML to PDF in Java: JTidy turns the HTML into XHTML, and `ITextRenderer` renders the PDF. The problem is that the styles are not applied in the generated PDF. When I tried other approaches (Jsoup, HTMLWorker, XMLWorker), the resulting document was also malformed.

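    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.ResponseBody;
    import org.w3c.tidy.Tidy;
    import org.xhtmlrenderer.pdf.ITextRenderer;

    // Cleans up the incoming HTML with JTidy and returns the XHTML as UTF-8 bytes.
    private static ByteArrayOutputStream html2Xhtml(String html) {
        Tidy tidy = new Tidy();
        tidy.setShowWarnings(false);
        tidy.setXmlTags(false);
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setXHTML(true);
        tidy.setMakeClean(true);
        tidy.setForceOutput(true);
        ByteArrayOutputStream fileFit = new ByteArrayOutputStream();
        // Encode explicitly as UTF-8 to match setInputEncoding;
        // getBytes() without arguments uses the platform default charset.
        tidy.parseDOM(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)), fileFit);
        return fileFit;
    }

    public @ResponseBody byte[] SavePdf(
            @RequestParam("documentContent") final Object documentContent)
            throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        ITextRenderer renderer = new ITextRenderer();
        // Decode Tidy's output as UTF-8 to match setOutputEncoding.
        String xhtml = html2Xhtml(documentContent.toString()).toString("UTF-8");
        renderer.setDocumentFromString(xhtml);
        renderer.layout();
        try {
            renderer.createPDF(file);
        } catch (com.lowagie.text.DocumentException e) {
            // Log once with the stack trace; the response is then an empty byte array.
            log.error(e.getMessage(), e);
        }
        return file.toByteArray();
    }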
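
A side note on the byte/string conversions in this code: Tidy is configured for UTF-8, while `String.getBytes()` and `ByteArrayOutputStream.toString()` called without arguments use the platform default charset. The sketch below is not part of the original code (the class and method names are made up for illustration). It only shows what can happen to non-ASCII text when the two sides disagree, under the assumption that the default charset is ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    /** Encodes the text as UTF-8 and decodes it with the same charset. */
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /**
     * Encodes the text as UTF-8 but decodes it as ISO-8859-1,
     * simulating a platform default that differs from Tidy's setting.
     */
    static String roundTripMismatched(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues
        System.out.println(sample.equals(roundTripUtf8(sample)));       // prints "true"
        System.out.println(sample.equals(roundTripMismatched(sample))); // prints "false"
    }
}
```

This is unlikely to explain missing styles on its own, but passing the charset explicitly removes one variable when debugging the conversion.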

This is part of the HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">
<style type="text/css">
/*<![CDATA[*/
@media screen , print {
        html, body, div, span, applet, object, iframe, h1, h2, h3, h4, h5, h6, p,
                blockquote, pre, a, abbr, acronym, address, big, cite, code, del, dfn,
                em, img, ins, kbd, q, s, samp, small, strike, strong, sub, sup, tt,
                var, b, u, i, center, dl, dt, dd, ol, ul, li, fieldset, form, label,
                legend, table, caption, tbody, tfoot, thead, tr, th, td, article,
                aside, canvas, details, embed, figure, figcaption, footer, header,
                hgroup, menu, nav, output, ruby, section, summary, time, mark, audio,
                video {
                margin: 0;
                padding: 0;
                border: 0;
                font-size: 100%;
                font: inherit;
                vertical-align: baseline;
                position: static;
        }
        article, aside, details, figcaption, figure, footer, header, hgroup, img,
                menu, nav, section {
                display: block;
        }
        body {
                line-height: 1;
        }
        ol, ul {
                list-style-type: circle;
                list-style-position: inside;
        }
        a {
                color: #009ddc;
                text-decoration: none;
        }
        table {
                border-collapse: collapse;
                border-spacing: 0px;
                border-width: 0px;
                width: 100%;
        }
        td[colspan="4"], .allergiesSection td:first-child, .problemsSection td:first-child,
                .activeProblemsSection td:first-child, .medicationsSection td:first-child,
                .immunizationsSection td:first-child, .vitalsSection td:first-child,
                .vitalsSection th:first-child, .proceduresSection td:first-child,
                .careTeamSection td:first-child {
                font-weight: bold;
        }
        .divider {
                border-top: 1px solid #ddd;
                color: #444;
        }
} /*]]>*/
</style>
<title></title>
</head>
<body>
</body>
</html>
Ali
  • I see that you use `com.lowagie`, which means iText 2.1.7 or earlier. Please upgrade to iText 5.5.10 and use `XMLWorker` to convert HTML to PDF (simplest upgrade path). Or upgrade to iText 7.0.2 and use the pdfHtml add-on (even better HTML to PDF performance, but requires more work at your end to make your code compatible). – Amedee Van Gasse Mar 15 '17 at 12:28
  • I tried XMLWorker with iText 5.5.10, but when I convert the document to PDF I get a malformed document. – Ali Mar 15 '17 at 13:17
  • Please add your HTML to your question. – Amedee Van Gasse Mar 15 '17 at 13:29
  • Please edit your post. Your HTML is not properly formatted. – Amedee Van Gasse Mar 16 '17 at 06:51
  • My HTML is very long – Ali Mar 16 '17 at 08:16
  • For better readability, I replaced your HTML with a cleaned up version, as generated by https://validator.w3.org/, which uses HTML-Tidy. – Amedee Van Gasse Mar 16 '17 at 08:36
  • I used Tidy to convert the document from HTML to XHTML, but the problem persists when converting to PDF: the styles are still not applied. – Ali Mar 16 '17 at 09:21
  • Edit your question with the code that you currently use, that is, with `xmlworker`. Make it so that anyone can just copy/paste into their IDE and run it. See https://stackoverflow.com/help/mcve – Amedee Van Gasse Mar 16 '17 at 10:08
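
Since the thread mentions malformed output several times, one check that may help narrow things down is whether the markup handed to the converter is well-formed XML at all. XML-based converters generally expect that, and the HTML 4.01 sample in the question contains an unclosed `<meta>` tag. The sketch below uses only the JDK's built-in parser. The class name `WellFormedCheck` and the method `isWellFormed` are hypothetical names chosen for illustration, and the `load-external-dtd` feature is specific to the Xerces-based parser bundled with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: the JDK's bundled Xerces parser; avoids fetching DTDs over the network.
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow parse errors instead of printing them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignore warnings */ }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Not well-formed (for example, an unclosed <meta> tag).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser could not be set up or read input", e);
        }
    }

    public static void main(String[] args) {
        String htmlStyle = "<html><head><meta name=\"generator\" content=\"x\"></head><body></body></html>";
        String xhtmlStyle = "<html><head><meta name=\"generator\" content=\"x\"/></head><body></body></html>";
        System.out.println(isWellFormed(htmlStyle));  // prints "false"
        System.out.println(isWellFormed(xhtmlStyle)); // prints "true"
    }
}
```

Running a check like this on the exact string passed to the renderer would show whether the Tidy step produced XHTML or left HTML-style markup in place.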
