3

I have been trying to find a free (preferably open-source) component or library that can convert an RTF file with embedded images into an HTML file plus image files or, better, into HTML and image streams.

The perfect solution, whether a DLL or a Delphi component, would stream data to IStream/TStream through callbacks, so that I could convert and save each image in a format of my choice and return the image's relative file name for the RTF parser to include in the generated HTML file. Saving images as-is would also be fine, especially if the code is open source.
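
To make the callback idea concrete, here is a minimal sketch in Python of the picture-handling step only. It is not a real RTF parser: the names `replace_pictures` and `ImageCallback` are made up for illustration, it assumes hex-encoded `{\pict ...}` groups with no nested groups inside them, and it leaves the rest of the RTF untouched.

```python
import html
import re
from typing import Callable

# Hypothetical callback type: receives the raw image bytes and a format hint,
# stores the image wherever it likes, and returns the relative file name
# that should appear in the generated HTML.
ImageCallback = Callable[[bytes, str], str]

# Assumption: a picture group is "{\pict", some control words, then hex data.
_PICT = re.compile(r"\{\\pict((?:\\[a-zA-Z]+-?\d* ?)*)([0-9a-fA-F\s]+)\}")

# RTF control words that identify the picture type, mapped to a format hint.
_KINDS = (
    ("\\pngblip", "png"),
    ("\\jpegblip", "jpg"),
    ("\\emfblip", "emf"),
    ("\\wmetafile", "wmf"),
)


def replace_pictures(rtf: str, on_image: ImageCallback) -> str:
    """Replace each hex-encoded picture group with an <img> tag.

    The image bytes are handed to on_image; whatever name it returns is
    used as the src attribute.
    """

    def _sub(match: "re.Match[str]") -> str:
        controls, hex_data = match.group(1), match.group(2)
        kind = next((ext for word, ext in _KINDS if word in controls), "bin")
        data = bytes.fromhex("".join(hex_data.split()))
        name = on_image(data, kind)
        return '<img src="%s">' % html.escape(name, quote=True)

    return _PICT.sub(_sub, rtf)


if __name__ == "__main__":
    saved = {}

    def keep_in_memory(data: bytes, kind: str) -> str:
        # A real callback could convert the image and write it to a stream.
        name = "images/img%d.%s" % (len(saved) + 1, kind)
        saved[name] = data
        return name

    sample = r"{\rtf1 Hello {\pict\pngblip\picw1\pich1 89504e47 0d0a} world}"
    print(replace_pictures(sample, keep_in_memory))
```

The point of the design is that the converter never decides where images go or in what format: the caller's callback does, and only the returned name ends up in the HTML.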

I have come across commercial solutions, but I struggle to consider them: the prices for a (relatively) simple conversion of one document type into another are quite high, and both formats are 20 years old, which suggests there must be an existing library (native, not managed) for such a conversion.

If I don't find a solution, I will probably convert this code into a Delphi DLL and make it available, but maybe someone has already done it?

EDIT:

We've decided to use the aforementioned .NET RtfConverter compiled as a DLL, generate a Delphi TLB unit from it, and require customers to install the .NET Framework (bundled with the installer). The conversion now works like a charm, another sign it's time to move on from Delphi to .NET...

too
  • The RTF spec isn't simple and the conversion isn't as straightforward as it may seem. Actual RTF documents are more complicated than brief text-snippets with some bolding and italics. Consider Unicode and localization and multiple fonts and CSS and headers and footers and paragraph formatting and tables and nested tables etc etc. Not all but some of the commercial software tools that convert RTF into HTML are worth what they charge. I haven't written one but have used one that costs $499 and it is now available in a managed-code version (100% C#). – Tim Jan 12 '11 at 10:39
  • Maybe "simple" is not the right description for such a conversion, yet the RTF tag list is quite limited, and both Windows and Delphi have facilities for working with international characters. Saving the HTML file as Unicode with the < and > characters escaped is also a possibility I am considering. The library I mentioned in the question is a working solution in managed C# code, which I am trying to avoid in order to keep application dependencies as low as possible. – too Jan 12 '11 at 10:50
  • Prices for commercial RTF to HTML converters start at around $130 (no royalty fees) - including full Delphi source code – mjn Jan 12 '11 at 11:57
  • mjn: would you like to post a reference to this commercial Delphi converter? If a free open-source Delphi/C++ converter does not exist and converting http://www.codeproject.com/KB/recipes/RtfConverter.aspx into Delphi proves problematic, I would definitely consider it as a last resort. – too Jan 12 '11 at 14:05

3 Answers

3

If you can use Microsoft Office to open the RTF and save it as HTML in the background, I believe that is your best solution: start a Microsoft Word instance in the background using OLE, load the RTF, and export it as HTML...

  • Unfortunately this requires having a commercial product worth a few hundred USD/GBP/... installed on the machine. – too Jan 12 '11 at 10:30
  • I agree 110% on this. However, most users tend to have Office installed, so if your target is a specific client who has the Office suite, I suggest going with this; otherwise you will have to search more and resort to implementing it yourself... –  Jan 12 '11 at 10:44
  • @PA depending on too's deadline and other stuff, you can call it "two problems" or a (permanent/temporary) "win" –  Jan 12 '11 at 16:26
  • I might assume customers have the Office suite installed, yet depending on Office for this conversion, regardless of guaranteed compatibility and conversion quality (if you can consider HTML generated by Word compatible and of good quality), might be seen as an unnecessary complication, and presumably an alternative native-code converter should exist after two decades of both formats. It might also be an RTF -> DOC -> HTML path, as DOC is more popular. – too Jan 12 '11 at 17:00
  • @too you might want to look at the OpenOffice suite as well; it might have some DLLs that can do the conversions –  Jan 12 '11 at 17:19
3

A commercial converter for RTF to HTML 4.01 / HTML5 and RTF to various flavors of XHTML is ScroogeXHTML for Delphi. Version 5.0 included improved picture support, with example code for WMF to PNG conversion. (I am the developer of this component and its counterpart for the Java platform).

mjn
  • Thank you for the link. I shall wait a little longer for a possible free solution, as asked in the question, with ScroogeXHTML as a fallback, as it looks promising. Do you have any plans to include table/list support? – too Jan 13 '11 at 09:15
  • Simple numbered and unnumbered lists are supported, tables however do not fit well in the internal intermediate document representation and will require a major redesign (but it is under consideration) – mjn Jan 13 '11 at 09:52
-2

P.S: I'm a developer of this product.

This is a commercial .NET library that converts RTF to HTML 3.2, 4.01, XHTML 1.01 and HTML 5. It supports tables and nested tables, ordered and bulleted lists, images embedded in the HTML, Unicode, special HTML symbols, etc.

Here is sample code in C#:

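        SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
        // Produce HTML 5 output.
        r.OutputFormat = SautinSoft.RtfToHtml.eOutputFormat.HTML_5;
        // Going by the property name: keep images inside the HTML itself.
        r.ImageStyle.IncludeImageInHtml = true;
        // Convert the input RTF file and write the resulting HTML file.
        r.ConvertFile(@"d:\document.rtf", @"d:\html5.htm");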