19

I understand iTextSharp can be used for converting a document to pdf.

But that requires creating a document from scratch with iTextSharp.text.Document and then adding elements to it.

What if I already have an existing doc file? Is it possible to convert that document to pdf using iTextSharp?

Also, I want to use iTextSharp or a similar tool that can do the following with a doc file:

  1. manipulate doc/docx/text files (for example, replace placeholders with DB values), and
  2. convert them to .pdf

If anyone has an idea about this, please share.

Thank you!

Charles Stewart
inutan
  • For maximum flexibility, you might consider separate "best-of-breed" solutions for each of the manipulation and conversion steps. That's the beauty of standard file formats (doc, docx). – JasonPlutext Feb 16 '15 at 23:53

6 Answers

13

The Aspose.Words component can do this reliably (I'm not affiliated or anything).

iTextSharp does not have the required feature set to load and process MS Word file formats.

Lucero
  • 1
    Thank you all for your help. For my current scenario, I will be using the Aspose library to do the doc/docx manipulation and then convert the resulting document to pdf after the mail merge. I have downloaded the free 30-day trial version and it seems to solve all my issues. I would suggest that anyone considering Aspose try the trial version first and then decide. – inutan Oct 23 '09 at 09:12
  • 1
    Aspose does nice OpenXml-to-PDF conversions in most cases, but be aware that it currently has poor or non-existent support for some Open XML features, such as content controls and AltChunk nodes. – Collin K Oct 25 '11 at 21:16
  • A recent alternative is my/Plutext's commercial docx to PDF converter; try it at http://converter-eval.plutext.com/ – JasonPlutext May 11 '16 at 22:48
  • 2
    Aspose is way too costly for a startup project. – Imran Faruqi Aug 21 '20 at 08:45
3

You can use the existing Word automation API in Microsoft.Office.Interop.Word:

    // Requires: using System; using System.Windows.Forms;
    // and a reference to the Microsoft.Office.Interop.Word assembly.
    private Microsoft.Office.Interop.Word.ApplicationClass MSdoc;

    // Passed for optional parameters we do not want to specify.
    object Unknown = Type.Missing;

    private void word2PDF(object Source, object Target)
    {
        // Create the Word application instance on first use.
        if (MSdoc == null) MSdoc = new Microsoft.Office.Interop.Word.ApplicationClass();

        try
        {
            MSdoc.Visible = false;
            MSdoc.Documents.Open(ref Source, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown);
            MSdoc.Application.Visible = false;
            MSdoc.WindowState = Microsoft.Office.Interop.Word.WdWindowState.wdWindowStateMinimize;

            object format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatPDF;

            // Saving with wdFormatPDF writes the active document out as a PDF.
            MSdoc.ActiveDocument.SaveAs(ref Target, ref format,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown);
        }
        catch (Exception e)
        {
            MessageBox.Show(e.Message);
        }
        finally
        {
            if (MSdoc != null)
            {
                MSdoc.Documents.Close(ref Unknown, ref Unknown, ref Unknown);
                // Quit Word and drop the reference so the next call starts a fresh instance.
                MSdoc.Quit(ref Unknown, ref Unknown, ref Unknown);
                MSdoc = null;
            }
        }
    }
Shyam sundar shah
  • 1
    Sure, just be aware that this is not recommended in a server environment. See http://support.microsoft.com/kb/257757 – Daniel Sep 25 '13 at 10:44
  • It looks good as it is an MS Office jar, but I didn't get how to use this in my program. I couldn't get the Microsoft Office jar. Can you please help me out? – Nikhitha Reddy Jan 23 '15 at 11:35
  • you would have to use some kind of COM interoperability library like Jacob to be able to handle those MS-Office objects directly from Java code (at least this was the case for JDK 1.5) - even then the code would look a bit uglier. judging from the "ref" and Interop library, the above code is written in C# – hello_earth Dec 08 '17 at 09:14
2

Aspose.Words is indeed a good solution, but it doesn't offer perfect fidelity. At the time of writing it has problems with non-Roman languages, with complex formatting such as floating elements, and in a number of other areas.

You may want to have a look at this PDF Conversion Web Service that can be used from any Web Services capable environment including Java and .NET.

Note that I worked on this project so the usual disclaimers apply.

Jeroen Ritmeijer
  • PDF Conversion Web Service just invokes Microsoft Word to convert documents to PDF. That is just Word Automation; anyone can do that. – romeok Aug 17 '10 at 23:32
  • 4
    It does quite a bit more actually, but knowing who you are you are as biased as I am :-) Nice work on Aspose.Words, great product, I recommend it all the time. – Jeroen Ritmeijer Aug 18 '10 at 08:16
  • The price for your conversion service is $1500 for 1 server. @iniki might be better off with DynamicPDF Converter or Aspose for the same price or less, and it all runs in managed code and doesn't require Word to be installed or the management of a web service. To your point though, nothing will offer the conversion fidelity that interop word automation can. – MikeTeeVee Oct 22 '12 at 23:29
  • non-Latin or non-Roman, see http://simple.wikipedia.org/wiki/Roman_alphabet (Had to look it up as you made me doubt myself :-) – Jeroen Ritmeijer Jun 27 '13 at 09:06
1

If you do not care whether the formatting stays faithful to what Word would display, there is the impressive docx2tex, which converts Word 2007 docx files to LaTeX documents. Once in LaTeX, you have a lot of power to programmatically reformat the document and generate a PDF from it.

I say more about the utility in an answer at tex.stackexchange.  

Community
Charles Stewart
1

I have the same issue.
After several days of looking for a solution, it seems that Docx4J, a Java-based tool, or PDF printers like PDFCreator are among the free options.
Realistically, though, only a commercial tool seems to do the requested task well.
On the Microsoft side, you could use the server-side SharePoint Word Automation Services (checked on 7 June 2016), or interop on your local computer.
The suggested two-step conversion (DOC or DOCX to some intermediate format and then to PDF) does not seem workable, judging by what users have said on Stack Overflow and other forums: the result is not what they expected.

CodeCaster
0

For docx manipulation, you should use the native Open XML approach. Download the Open XML SDK 2 from Microsoft.
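
To make the placeholder idea concrete, here is a minimal sketch of what such a replacement might look like. Note the assumptions: it does not use the Open XML SDK's typed classes but works directly on the package with System.IO.Compression and LINQ to XML (a .docx is a zip archive whose main text lives in word/document.xml); the class name `DocxPlaceholders`, the method `Fill`, and the `{{Name}}` placeholder syntax are made up for illustration. On .NET Framework, `ZipFile` also needs a reference to System.IO.Compression.FileSystem.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxPlaceholders
{
    // WordprocessingML main namespace; w:t elements hold the text of a run.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces every key of 'values' with its value in the main document part.
    public static void Fill(string docxPath, IDictionary<string, string> values)
    {
        using (ZipArchive zip = ZipFile.Open(docxPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");

            using (Stream part = entry.Open())
            {
                // Preserve whitespace so runs containing only spaces survive.
                XDocument xml = XDocument.Load(part, LoadOptions.PreserveWhitespace);

                // Materialize the list first: we mutate elements while looping.
                foreach (XElement text in xml.Descendants(W + "t").ToList())
                {
                    string updated = text.Value;
                    foreach (KeyValuePair<string, string> pair in values)
                        updated = updated.Replace(pair.Key, pair.Value);
                    if (updated != text.Value)
                        text.Value = updated;
                }

                // Overwrite the part in place with the edited XML.
                part.Position = 0;
                part.SetLength(0);
                xml.Save(part, SaveOptions.DisableFormatting);
            }
        }
    }
}
```

A call such as `DocxPlaceholders.Fill("letter.docx", new Dictionary<string, string> { { "{{Name}}", "Ada" } });` would then rewrite the file before it is handed to a converter. One known limitation of this sketch: Word sometimes splits a placeholder across several runs (for example after spell-check or formatting changes), and in that case no single w:t element contains the whole placeholder, so it is not replaced.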

And then you can convert docx files to pdf with this paid library: http://www.subsystems.com/dpw.htm . It's really great.

mucit
  • 1
    It is not great. I tried it; it changes the fonts everywhere and removes page formatting. – BuZz Apr 22 '13 at 12:49
  • It is *not* great, @Franklin. I agree. But once you get over the learning curve you can control the formatting, fonts, etc. That learning curve sucks, though – Rap Jan 17 '14 at 14:07