6

I want to generate a PDF document from a "raw" email. This email could contain HTML or just plain text. I don't care about attachments.

The resulting PDF should preserve the formatting (from the CSS and HTML) and also include embedded images.

My first idea was to render the email using an email client like Thunderbird and then print it to PDF. Does Thunderbird offer such an API, or are there Java libraries available to print an email to PDF?

Nick Russler

6 Answers

5

I've found a better solution than the one I posted before: save the email as HTML, use JTidy to clean it up into XHTML, and finally use the Flying Saucer HTML renderer to save it as a PDF.

Here is an example I wrote:

import com.lowagie.text.DocumentException;
import org.w3c.tidy.Tidy;
import org.xhtmlrenderer.pdf.ITextRenderer;
import java.io.*;
import java.util.*;
import javax.mail.*;

public class Email2PDF {

public static void main(String[] args) {

    Properties props = new Properties();
    props.setProperty("mail.store.protocol", "imaps");
    try {
        Session session = Session.getInstance(props, null);
        Store store = session.getStore();
        //read your latest email
        store.connect("imap.gmail.com", "youremail@gmail.com", "password");
        Folder inbox = store.getFolder("INBOX");
        inbox.open(Folder.READ_ONLY);
        Message msg = inbox.getMessage(inbox.getMessageCount());
        String filename = msg.getSubject();
        // writeTo() emits the complete raw message (headers and all MIME parts);
        // close the stream before JTidy reads the file back
        try (FileOutputStream os = new FileOutputStream(filename + ".html")) {
            msg.writeTo(os);
        }
        //use jtidy to clean up the html 
        cleanHtml(filename);
        //save it into pdf
        createPdf(filename);
    } catch (Exception mex) {
        mex.printStackTrace();
    }
}

public static void cleanHtml(String filename) throws IOException {
    // try-with-resources closes both streams, so the XHTML file is
    // completely written before createPdf() reads it
    try (InputStream in = new FileInputStream(filename + ".html");
         OutputStream out = new FileOutputStream(filename + ".xhtml")) {
        final Tidy tidy = new Tidy();
        tidy.setQuiet(false);
        tidy.setShowWarnings(true);
        tidy.setShowErrors(0);
        tidy.setMakeClean(true);
        tidy.setForceOutput(true);
        // ask JTidy for XHTML output, since the renderer expects well-formed XML
        tidy.setXHTML(true);
        tidy.parseDOM(in, out);
    }
}

public static void createPdf(String filename)
        throws IOException, DocumentException {
    // the stream is closed even if rendering throws
    try (OutputStream os = new FileOutputStream(filename + ".pdf")) {
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocument(new File(filename + ".xhtml"));
        renderer.layout();
        renderer.createPDF(os);
    }
}
}

Enjoy!
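
The example above assumes the email has an HTML body. For the plain-text case mentioned in the question, one possible approach (my assumption, not part of the original answer) is to wrap the text in a minimal XHTML document so that the same XHTML-to-PDF step can be reused. The class and method names here are hypothetical, and only the JDK is used:

```java
/** Hypothetical helper: wraps a plain-text email body in minimal XHTML. */
final class PlainTextToXhtml {

    /** Escapes the characters that are significant in XML text and attributes. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a small XHTML page; <pre> keeps the original line breaks. */
    static String wrap(String title, String body) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
             + "<head><title>" + escape(title) + "</title></head>\n"
             + "<body><pre style=\"white-space: pre-wrap\">" + escape(body) + "</pre></body>\n"
             + "</html>\n";
    }
}
```

The result of `wrap(subject, text)` could be written to the `.xhtml` file directly, skipping the JTidy step for text-only messages.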

liorsolomon
  • Awesome! In the meantime I had the same idea of going via HTML. Does your solution account for embedded images? If not, this should be easy to do "by hand" using base64-embedded images in the HTML. I will also need to look out for "multipart/alternative" emails. I will hopefully post my solution in a few days :) – Nick Russler Dec 05 '14 at 16:38
  • Yes, it does support images. Good luck! – liorsolomon Dec 05 '14 at 16:41
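
The "by hand" approach from the comment above could look roughly like the sketch below: inline images are referenced in the HTML as `cid:` URLs, and each reference is replaced with a base64 `data:` URI. This is only an illustration under assumptions: `CidInliner` and its methods are hypothetical names, the map of Content-ID to data URI has to be built from the MIME parts separately, and whether a given renderer honors `data:` URIs depends on the renderer and its version.

```java
import java.util.Base64;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: replaces cid: image references with data: URIs. */
final class CidInliner {

    // Matches src="cid:..." or src='cid:...' (the quote style is captured in group 1).
    private static final Pattern CID_SRC =
            Pattern.compile("src\\s*=\\s*([\"'])cid:([^\"']+)\\1", Pattern.CASE_INSENSITIVE);

    /** Encodes raw image bytes as a data: URI for the given MIME type. */
    static String toDataUri(String mimeType, byte[] data) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    /**
     * Replaces every known cid: reference in the HTML.
     * Map keys are Content-ID values without the surrounding angle brackets.
     */
    static String inlineImages(String html, Map<String, String> dataUriByContentId) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String dataUri = dataUriByContentId.get(m.group(2));
            // Leave the reference untouched when no matching part is known.
            String replacement = (dataUri == null)
                    ? m.group()
                    : "src=" + m.group(1) + dataUri + m.group(1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```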
4

I put together a piece of software that converts EML files to PDFs by parsing (and cleaning) the MIME structure, converting it to HTML, and then using wkhtmltopdf to convert that HTML to a PDF file.

It also handles inline images and corrupt MIME headers, and it can use a proxy.

The code is available on GitHub under the Apache License 2.0.
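
For readers who want to try the last step on their own, here is a minimal sketch of calling wkhtmltopdf from Java. It is not taken from the project above: `WkhtmltopdfRunner` is a hypothetical name, and the sketch assumes the `wkhtmltopdf` binary is on the `PATH`. The `--proxy` flag is wkhtmltopdf's own option for fetching external resources through a proxy.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: converts an HTML file to PDF via the wkhtmltopdf CLI. */
final class WkhtmltopdfRunner {

    /** Builds the command line; pass null or "" as proxy to skip the proxy option. */
    static List<String> buildCommand(Path html, Path pdf, String proxy) {
        List<String> command = new ArrayList<>();
        command.add("wkhtmltopdf"); // assumed to be installed and on the PATH
        if (proxy != null && !proxy.isEmpty()) {
            command.add("--proxy");
            command.add(proxy);
        }
        command.add(html.toString());
        command.add(pdf.toString());
        return command;
    }

    /** Runs the command and returns the exit code of the process. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

Usage would be something like `WkhtmltopdfRunner.run(WkhtmltopdfRunner.buildCommand(htmlPath, pdfPath, null))`, treating a non-zero exit code as a failure.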

Nick Russler
  • 1
    Thanks Nick, your software is very useful. However, I am curious: why does it need an HTTP proxy value? – codemonkey Jul 26 '17 at 16:41
  • @codemonkey Emails can contain external resources, such as images. The use of a proxy to download these resources is an optional feature I added. – Nick Russler Jul 31 '17 at 12:09
1
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.*;
import javax.mail.*;

public class Email2PDF {

    public static void main(String[] args) {

        Properties props = new Properties();
        props.setProperty("mail.store.protocol", "imaps");
        try {
            Session session = Session.getInstance(props, null);
            Store store = session.getStore();
            store.connect("imap.gmail.com", "youremail@gmail.com", "password");
            Folder inbox = store.getFolder("INBOX");
            inbox.open(Folder.READ_ONLY);
            Message msg = inbox.getMessage(inbox.getMessageCount());
            Multipart mp = (Multipart) msg.getContent();
            BodyPart bp = mp.getBodyPart(0);
            createPdf(msg.getSubject(), (String) bp.getContent());
        } catch (Exception mex) {
            mex.printStackTrace();
        }
    }

    public static void createPdf(String filename, String body)
            throws DocumentException, IOException {

        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(filename + ".pdf"));
        document.open();
        document.add(new Paragraph(body));
        document.close();
    }

}

I used iText as the PDF library.

liorsolomon
  • Thanks! I love iText, it is my favorite Java PDF lib. But as far as I can see from the code, this would simply put the body string into the PDF. My main problem is rendering the email (HTML, CSS, embedded images), which is why I wanted to use an existing email client and "misuse" it. – Nick Russler Dec 05 '14 at 14:37
  • Now I get it :) In that case I would consider implementing the email parsing code using this example: http://www.tutorialspoint.com/javamail_api/javamail_api_fetching_emails.htm (check the writePart method). Though I guess it still won't be that easy; you will have to handle the stylesheet etc. – liorsolomon Dec 05 '14 at 14:55
  • Nick, how about converting the HTML to an image (html2image) and then converting that to PDF? I think that would be the easiest solution unless you need to provide search capabilities on the PDF. – liorsolomon Dec 05 '14 at 15:16
  • That sounds like a great solution, but the problem is the same: how do I render the email, e.g. an EML file, to an image? – Nick Russler Dec 05 '14 at 15:20
0

You can read the HTML content using an email client library and then use iText to convert it into a PDF.

kamoor
0

Look into FPDF and FPDI, two free PHP libraries for creating PDF documents.

Since the Internet message format (RFC 5322) has conventions, actually strict rules, you can always count on the first empty line to come right before the content of the message. So you can parse everything after that first empty line to get the entire message body.
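
A minimal sketch of that split, using only the JDK. The class and method names are hypothetical, and the sketch assumes a simple, non-multipart message; for multipart messages the body still contains the individual MIME parts, which need further parsing.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: splits a raw message at the first empty line. */
final class RawMessageSplitter {

    /** Returns {headerSection, body}; the body is empty if there is no empty line. */
    static String[] split(String raw) {
        // Normalize line endings so one search covers both CRLF and bare LF.
        String normalized = raw.replace("\r\n", "\n");
        int separator = normalized.indexOf("\n\n");
        if (separator < 0) {
            return new String[] { normalized, "" };
        }
        return new String[] {
            normalized.substring(0, separator),
            normalized.substring(separator + 2)
        };
    }

    /**
     * Parses "Name: value" lines into a map with lower-cased names,
     * unfolding continuation lines. Repeated headers keep only the last value.
     */
    static Map<String, String> parseHeaders(String headerSection) {
        Map<String, String> headers = new LinkedHashMap<>();
        String current = null;
        for (String line : headerSection.split("\n")) {
            boolean continuation = line.startsWith(" ") || line.startsWith("\t");
            if (continuation && current != null) {
                headers.put(current, headers.get(current) + " " + line.trim());
            } else {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    current = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                    headers.put(current, line.substring(colon + 1).trim());
                }
            }
        }
        return headers;
    }
}
```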

For embedded images, you'll need a base64 decoder (usually), or some other decoder matching the part's Content-Transfer-Encoding, to turn the data back into a viewable image.
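
As a rough illustration of the decoding step in Java (JDK only), assuming the part's body text and its Content-Transfer-Encoding header value have already been extracted. `PartBodyDecoder` is a hypothetical name, and quoted-printable is deliberately left out of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

/** Hypothetical helper: turns an encoded MIME part body back into raw bytes. */
final class PartBodyDecoder {

    static byte[] decode(String transferEncoding, String body) {
        // A missing header is treated as 7bit here.
        String encoding = transferEncoding == null
                ? "7bit"
                : transferEncoding.trim().toLowerCase(Locale.ROOT);
        switch (encoding) {
            case "base64":
                // The MIME decoder ignores the line breaks inside the encoded body.
                return Base64.getMimeDecoder().decode(body);
            case "7bit":
            case "8bit":
            case "binary":
                // Identity encodings; assumes the body was read as ISO-8859-1.
                return body.getBytes(StandardCharsets.ISO_8859_1);
            default:
                throw new UnsupportedOperationException(
                        "Unsupported Content-Transfer-Encoding: " + encoding);
        }
    }
}
```

The returned bytes can then be written to an image file or embedded into the HTML.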

MiiinimalLogic
0

You could try the Apache PDFbox library.

It seems to have a nice API, and it also supports printing via its PrintPDF tool.

You would have to run the print command from the CLI with your file as a parameter.

Edit: It is Java and open-source.

Hope it helps!

Tilman Hausherr
thomas77
  • Thanks for the suggestion, I have used PDFBox before. The main problem is not generating the PDF, but parsing the email and rendering it properly. – Nick Russler Dec 05 '14 at 14:25
  • You would probably need some sort of HTML parser like [Cobra](http://lobobrowser.org/cobra/java-html-parser.jsp). I have not used it myself, so I don't know if it's a good match for your problem. – thomas77 Dec 05 '14 at 14:48