I need to convert an HTML file to a Word document (.doc). I am already using html2pdf for PDF conversion.

Is there a similar library for HTML-to-DOC conversion?

(PS: it must be free/open source.)

EDIT

After Mark Eirich's comment:

Here are two screenshots. The Word document is not aligned properly; check the vertical scroll in the Word document.

Word document (check the vertical scroll):

HTML file in the browser:

The body tag is:

<body style="margin-left:350px; margin-right:350px;">

I tried to adjust it, but it had no effect.

EDIT 2

After Mark Eirich's second comment, I learned that Word takes sizes in pixels, not percentages. My last remaining issue is the background. Please check the two screenshots below. The difference is the outer box, and that is why the HTML-generated doc looks odd. Any help is appreciated.

Original word file

html generated doc file

Mohit Jain
  • Microsoft Word can read HTML without any conversion. Just end the filename with .doc and done. – Mark Eirich Feb 12 '11 at 00:43
  • @Mark Eirich It's not working properly. Check the edit. – Mohit Jain Feb 12 '11 at 13:33
  • @piemesons: Sadness. I don't have much experience with Word. However, try this (1) set a width on the body, (2) set your margins in inches "in" or centimeters "cm", or (3) add an additional wrapper inside of body, and set margin/width on it; perhaps Word ignores the body tag. You could also try generating HTML in Word, and then viewing it to see what Word is doing to set margins. – Mark Eirich Feb 12 '11 at 13:41
  • @Mark Eirich Cool, your idea worked. Word actually takes sizes in pixels instead of percentages, and fixing that solved it. I have just one last issue. Could you please check it? Any ideas? I would be thankful. Please check Edit 2. – Mohit Jain Feb 12 '11 at 20:55
  • My solution would be to create my own, but to be honest, Microsoft always overcomplicates things: http://msdn.microsoft.com/en-us/library/cc313153(v=office.12).aspx and, by our very own Joel, http://www.joelonsoftware.com/items/2008/02/19.html – RobertPitt Feb 12 '11 at 21:06
  • @piemesons: Unfortunately, imgur.com is blocked by my porn filter. Someone else will have to help you, or you can try posting the images somewhere else, like http://tinygrab.com/ Also, you may want to post your HTML/CSS somewhere so I can try it myself. Also, you have not told us what version of Word you are working with. – Mark Eirich Feb 13 '11 at 04:22

2 Answers

In my opinion the answer is no, for the following reasons:

Microsoft Office documents are extremely complex in their design. They are not just formatted files with references to objects such as images; each file contains a kind of file system of its own that manages the binary data of those objects.

Let me bring in a quote from our very own Joel:

If you started reading these documents with the hope of spending a weekend writing some spiffy code that imports Word documents into your blog system, or creates Excel-formatted spreadsheets with your personal finance data, the complexity and length of the spec probably cured you of that desire pretty darn quickly. A normal programmer would conclude that Office’s binary file formats:

  • are deliberately obfuscated
  • are the product of a demented Borg mind
  • were created by insanely bad programmers
  • and are impossible to read or create correctly.

You’d be wrong on all four counts....

Read further down for a possible solution:

If you really want to generate fancy formatted Word documents, your best bet is to create an RTF document. Everything that Word can do can be expressed in RTF, but it’s a text format, not binary, so you can change things in the RTF document and it’ll still work. You can create a nicely formatted document with placeholders in Word, save as RTF, and then using simple text substitution, replace the placeholders on the fly. Now you have an RTF document that every version of Word will open happily.

@source: http://www.joelonsoftware.com/items/2008/02/19.html
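
The placeholder substitution Joel describes could look roughly like the sketch below. It is only an illustration under assumptions of my own: the `%%NAME%%` placeholder syntax and the helper names `rtf_escape` and `fill_rtf_template` are made up for this example, and the escaping only covers plain ASCII text (non-ASCII characters would need RTF `\u` escapes, which are not handled here).

```php
<?php
// Hypothetical helper: make plain ASCII text safe to embed in RTF.
function rtf_escape(string $text): string
{
    // Backslash and braces are RTF control characters. str_replace works
    // through the search array in order, so the backslash is escaped first.
    $text = str_replace(['\\', '{', '}'], ['\\\\', '\\{', '\\}'], $text);
    // Turn newlines into RTF line breaks.
    return str_replace("\n", "\\line ", $text);
}

// Hypothetical helper: replace %%KEY%% placeholders in an RTF template.
function fill_rtf_template(string $template, array $values): string
{
    $map = [];
    foreach ($values as $key => $value) {
        $map['%%' . $key . '%%'] = rtf_escape((string) $value);
    }
    // strtr does not rescan text it has already replaced, so a value that
    // happens to contain "%%...%%" is not substituted a second time.
    return strtr($template, $map);
}

// A tiny hand-written template; in practice this would be loaded from a
// file saved as RTF from Word.
$template = '{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs24 '
          . 'Dear %%NAME%%,\par Your total is %%TOTAL%%.\par}';
```

Usage would be something like `file_put_contents('letter.rtf', fill_rtf_template($template, ['NAME' => 'Mohit', 'TOTAL' => '42']));`. One caveat: Word may split typed text across formatting runs when it saves RTF, so it is worth opening the saved template in a text editor and checking that each placeholder survived as one unbroken string.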

Some links that may interest you along your journey:

That said, try opening a Word file with WinRAR ;). Maybe creating an archive with the right headers and then changing the extension would suffice. I have never tried it.

RobertPitt
  • my thinking is because someone gave a link to a working solution and you said it's not possible. Not worth a downvote though since you gave a pretty good discussion and I liked reading Joel's input on the matter. – ajacian81 Mar 17 '12 at 23:05
In order to convert to Microsoft Word directly, you need a COM-enabled server (one running Windows with Office installed). If you have such a server,

$word = new COM("word.application") or die ("couldnt create an instance of word"); 

should work. Read http://php.net/manual/en/book.com.php for details.

Otherwise, your best shot at html2doc is html2rtf, which can be done with a library such as http://paggard.com/projects/rtf.generator/ or http://sourceforge.net/projects/phprtf/.

Then, after you create the RTF, you serve it to the browser with a Word content-type header:

header('Content-Type: application/msword');
header('Content-Disposition: attachment; filename="document_name.doc"');

If the user has Word installed, it will be offered as the application to open the file.

Saving an RTF file with a .doc extension is also fine: Word will open it in layout view without any complaints. You can serve HTML with the above header too, but the problem is that Word will then open it in web view, and that is bad :)
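
As a rough sketch, the header step can be wrapped in a small helper. The function names `word_download_headers` and `send_rtf_as_doc` are hypothetical, and so is the choice to replace unusual filename characters with underscores. The sketch uses `application/msword`, the registered MIME type for .doc files.

```php
<?php
// Hypothetical helper: build the response headers for a Word download.
function word_download_headers(string $filename, int $length): array
{
    // Keep only characters that are safe inside a quoted filename parameter.
    $safe = preg_replace('/[^A-Za-z0-9._-]/', '_', $filename);
    return [
        'Content-Type: application/msword',
        'Content-Disposition: attachment; filename="' . $safe . '"',
        'Content-Length: ' . $length,
    ];
}

// Hypothetical helper: send an RTF string to the browser as a .doc download.
// It must be called before any other output has been sent.
function send_rtf_as_doc(string $rtf, string $filename): void
{
    foreach (word_download_headers($filename, strlen($rtf)) as $line) {
        header($line);
    }
    echo $rtf;
}
```

Calling `send_rtf_as_doc($rtf, 'report.doc');` at the end of a script would then trigger the download. Building the header list in a separate function makes it possible to check it without a web server.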

Andrei Draganescu
  • He asked for a solution that is free/open source! – Mark Eirich Feb 12 '11 at 00:45
  • What is "web view" and why is it bad? (I'm not very familiar with Word.) – Mark Eirich Feb 12 '11 at 00:46
  • phprtf is open source, both GPL and LGPL, and it's actually put together well, it's heaven compared to the other php html2rtf libraries, well, those that are html2rtf technically in name only. – asnyder Feb 16 '11 at 06:31
  • **JUST DON'T DO THIS.** Office applications are not designed or licensed to be used like this on web servers. You will end up in a world of pain with orphaned word/excel processes and much much more. – Kev Sep 23 '14 at 11:48