I have an ASP.NET application that uses wkhtmltopdf to generate a PDF file from HTML input. It works beautifully and everything, including HTML tables, displays perfectly in the PDF file.

The issue is that the recipients of the file need to copy the table from the PDF and paste it into Word. When they try to do that, the text is carried over but without any sort of table formatting, which is obviously not good.

I've searched and searched for more info on this. Some people hint that there's a way to mark up tables in PDFs so that the formatting is copied along with the text, but I can't find more detail on it. Has anyone had experience of doing this?

If there's no way to generate the PDF with enough info, does anyone know of a way to generate an uneditable Word document in ASP.NET from an HTML source?

Thanks!

  • Why uneditable, yet copyable? Couldn't they just copy and paste into an editable document? – mbeckish Oct 11 '13 at 16:37
  • @mbeckish - that's a fair point. I've started looking at using HtmlToOpenXml and then using that output as the Word document. It's not perfect by any means and requires some work to handle inline CSS, but it's a start. I'd still rather figure out how to properly tag the table in the PDF, but that's looking like a remote possibility. – John Griffiths Oct 11 '13 at 19:49
  • So why don't the users copy the table from an HTML page? Is it because these users don't visit your ASP.NET site? – mbeckish Oct 11 '13 at 19:55
  • @mbeckish - yes exactly, the final recipients don't have access to our site so we need to provide them with a static file that they can work with (they will receive it via email) – John Griffiths Oct 11 '13 at 20:05
  • PDF as a source is very unreliable for this sort of thing, as far as I have seen. How about an alternative format for the file? Or providing two copies of the file, one PDF and one Word template with the table? – Joel Peltonen Oct 12 '13 at 09:31
  • @Nenotlep Yes, I feel that that is probably the route I'm going to have to take. Cheers. – John Griffiths Oct 13 '13 at 04:37
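
For the Word-file route discussed in the comments above, here is a minimal sketch of the underlying idea: a .docx is a zip of XML parts, and a table stored as real `w:tbl` markup should stay a table when it is copied. The sketch is in Python with only the standard library, purely for illustration. It is not the HtmlToOpenXml API, and the names `TableRows` and `html_table_to_docx` are made up for this example. It assumes a single flat table (no nested tables, no colspan/rowspan) and drops all styling. The `w:documentProtection` setting asks Word to treat the file as read-only, but without a password hash this is only a soft restriction.

```python
import io
import zipfile
from html.parser import HTMLParser
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
PKG_NS = "http://schemas.openxmlformats.org/package/2006"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
WML_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml"
XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th>, grouped by <tr> (flat tables only)."""

    def __init__(self):
        super().__init__()
        self.rows, self._in_cell = [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_xml(rows):
    """Render rows as a WordprocessingML table with plain single-line borders."""
    borders = "".join(
        f'<w:{side} w:val="single" w:sz="4" w:space="0" w:color="000000"/>'
        for side in ("top", "left", "bottom", "right", "insideH", "insideV")
    )
    grid = "<w:gridCol/>" * max(len(row) for row in rows)
    body = "".join(
        "<w:tr>" + "".join(
            f'<w:tc><w:p><w:r><w:t xml:space="preserve">{escape(cell.strip())}'
            "</w:t></w:r></w:p></w:tc>"
            for cell in row
        ) + "</w:tr>"
        for row in rows
    )
    return (f"<w:tbl><w:tblPr><w:tblBorders>{borders}</w:tblBorders></w:tblPr>"
            f"<w:tblGrid>{grid}</w:tblGrid>{body}</w:tbl>")


def html_table_to_docx(html):
    """Return the bytes of a minimal .docx that holds the HTML table as a Word table."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        raise ValueError("no table rows found in the HTML input")
    parts = {
        "[Content_Types].xml": (
            f'<Types xmlns="{PKG_NS}/content-types">'
            '<Default Extension="rels" '
            'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
            '<Default Extension="xml" ContentType="application/xml"/>'
            f'<Override PartName="/word/document.xml" ContentType="{WML_TYPE}.document.main+xml"/>'
            f'<Override PartName="/word/settings.xml" ContentType="{WML_TYPE}.settings+xml"/>'
            "</Types>"
        ),
        "_rels/.rels": (
            f'<Relationships xmlns="{PKG_NS}/relationships">'
            f'<Relationship Id="rId1" Type="{DOC_REL}/officeDocument" Target="word/document.xml"/>'
            "</Relationships>"
        ),
        "word/_rels/document.xml.rels": (
            f'<Relationships xmlns="{PKG_NS}/relationships">'
            f'<Relationship Id="rId1" Type="{DOC_REL}/settings" Target="settings.xml"/>'
            "</Relationships>"
        ),
        # The trailing empty paragraph keeps the table from being the last body element.
        "word/document.xml": (
            f'<w:document xmlns:w="{W_NS}"><w:body>{table_xml(rows)}<w:p/></w:body></w:document>'
        ),
        # Read-only protection without a password: a soft restriction only.
        "word/settings.xml": (
            f'<w:settings xmlns:w="{W_NS}">'
            '<w:documentProtection w:edit="readOnly" w:enforcement="1"/>'
            "</w:settings>"
        ),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, xml in parts.items():
            package.writestr(name, XML_DECL + xml)
    return buffer.getvalue()
```

Writing the returned bytes to a file such as `report.docx` (a hypothetical name) gives something that can be emailed alongside the PDF. In an ASP.NET application the same parts would more likely be produced with the Open XML SDK or HtmlToOpenXml than written by hand.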

1 Answer

I don't think wkhtmltopdf is the right tool for this job. For the most part it works as a black-box renderer, and you're not going to be able to control the output at a very granular level. I'd recommend using a different component like Aspose.Words/Aspose.PDF, which give you much more control over how the elements are rendered. These components also take HTML as input.

Also have a look at this article, which describes an end-user method for manually copying and pasting PDF tables into Word. Hopefully this helps.

James
  • Thanks James, unfortunately the Aspose option is most likely out of our price range - and I also still need to determine if there's even anything that can be done at the PDF rendering level to help Word recognise the table when pasted. That article is handy but our users are far too lazy to do all that themselves :) – John Griffiths Oct 11 '13 at 17:57