-1

Is it possible to convert a .doc file to a .pdf file programmatically, without using the Word application or third-party tools? Preferably in Delphi XE4. If so, how?

Kara
fr21
  • 1
    Well, if you rule out Word and 3rd party tools, what do you suppose would have the functionality to do this? – MartynA Jan 06 '14 at 13:13
  • Actually I think he means "stand-alone 3rd-party tools" like OpenOffice instead of MSO. If he wants his code 100% written by him, then what is the point of asking on Stack Overflow? Any code of ours that we could show him would be just as "3rd party" as any ready-made library/component – Arioch 'The Jan 06 '14 at 13:57
  • Yes, it is perfectly possible. But it would take a very good programmer a very long time to do it properly, I think. You might be a very, very good programmer with a lot of spare time, but if not, you should consider using a third-party library for this. – Andreas Rejbrand Jan 06 '14 at 18:09

1 Answer

6

Yes, you can convert .doc/.docx files to .pdf without Word and without third-party controls. The specifications are publicly available - [simply] read and parse the .doc/.docx file according to the specification and generate the content according to the .pdf specification.

Here is the specification for the MS-DOC (.doc) file format:

MS-DOC Specification (622 pages) -- Word97 through 2007

MS-DOCX Extensions Specification (105 pages) -- Word2010 through 2013

See also - Open Document and OpenXML Format

And the specification for the .pdf format:

PDF Reference (1310 pages)
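To give a feel for what "read one spec, write the other" involves, here is a minimal sketch in Python (standard library only; the same steps could be ported to Delphi). It is not a converter. It assumes a .docx input (a ZIP container whose main text lives in `word/document.xml`), pulls out plain paragraph text, and writes it into a one-page PDF using the built-in Helvetica font. The function names `docx_paragraphs`, `pdf_escape`, and `text_to_pdf` are made up for this illustration. The binary .doc format, fonts, styles, tables, images, line wrapping, and pagination are all ignored, and that is where the real work would be.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_bytes):
    """Return the plain text of every <w:p> paragraph in a .docx file."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter("{%s}p" % W_NS):
        # A paragraph's text is split across runs; join all <w:t> pieces.
        paragraphs.append("".join(t.text or "" for t in p.iter("{%s}t" % W_NS)))
    return paragraphs


def pdf_escape(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_to_pdf(lines):
    """Build a one-page PDF (612x792 pt, Helvetica 12 pt), one text line per item.

    No wrapping or page breaks: long lines and long documents run off the page.
    Non-ASCII characters are replaced with '?' to keep the sketch simple.
    """
    ops = ["BT", "/F1 12 Tf", "14 TL", "72 720 Td"]
    for line in lines:
        ops.append("(%s) Tj T*" % pdf_escape(line))
    ops.append("ET")
    stream = "\n".join(ops).encode("ascii", "replace")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Calling `text_to_pdf(docx_paragraphs(data))` on the bytes of a .docx file and writing the result to disk should give a PDF that shows the raw text. Everything that makes the output look like the original document is still missing.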

Really, I think you'll find you probably want to use a third-party component...

J...
  • 2
    Dunno for Word, but Excel 2010 is not equal to XLSX specifications. In some cases it just contradicts it. Also DOCX/XLSX do not specify print settings like paper size, margins, etc. So it won't be exported to PDF as well. Which defeats one of the design goals of PDF - uniformity of rendering anywhere. – Arioch 'The Jan 06 '14 at 13:55
  • 2
    This project will probably take you over 6 months full-time coding (and reverse-engineering) – Jan Doggen Jan 06 '14 at 13:57
  • @Arioch'The I think the core features are common to all office documents (Open Document Format .***x format) - the bits for each specific application (PPT, WORD, EXCEL) are just extensions to ODF. You may still be correct, however... MS doesn't exactly have the best record for sticking to its own standards. – J... Jan 06 '14 at 13:58
  • 2
    @J... No. OASIS OpenDocument and Microsoft OfficeOpenXML are very different. Even the splitting of information in the ZIP containers by files and folders is different. XML structures are different. Etc. So yes, finally in both ODS and XLSX you'd find the "ABCDE" value of the cell. But in ODS it would be part of the sheet, while in XLSX it would be part of a unified Strings Container and the sheet would reference its entry there. Also, working on custom XLSX generation, I met cases where MS Excel refuses to load a file treated as valid by the MS XLSX Validator, and vice versa. – Arioch 'The Jan 06 '14 at 14:14