
I am using VC++ 2010 Express and am trying to use HTML Tidy to clean up HTML code strings. I want to pass the HTML in as a string (NOT from a file) and get the cleaned HTML back as a string (NOT written to a file). The project is a C++ Windows Forms project compiled with /clr.

I have attempted, more times than I care to admit, to attach Tidy to my project in various ways. I have failed in every attempt and I'm just not sure where to go from here. The most promising was a .NET wrapper called TidyManaged, but I could not find any documentation to explain how to use it with C++ (it appears to have been meant for C#). The various C++ wrappers are not working for me at all. It seems the documentation is extremely lacking on how to make them work.

Also I am prepared to accept a solution that does not use tidy at all, but some other equivalent HTML cleanup tool. I am concerned about the age of Tidy (August, 2000) and whether it is still effective for today's newer XHTML standards.

Also, if possible, I am willing to incorporate a C library into my code directly without relying on a DLL, but I don't know how to make this work, or even whether it can work.

Any suggestions on how to go about this would be greatly appreciated, keeping in mind that this is HTML we are talking about here (oftentimes malformed HTML and XHTML) and NOT XML.

Thanks in advance!

PS - I am new to C++ :/

Jason
  • why don't you just include it as a file in the resources? – Daniel A. White Sep 01 '11 at 17:25
  • How would I go about doing that? Also, include what? The DLL or the c code files? – Jason Sep 01 '11 at 17:31
  • HTML Tidy was last updated [2009-03-25](http://tidy.cvs.sourceforge.net/tidy/tidy/src/version.h?view=markup), not in 2000... – ildjarn Sep 01 '11 at 17:36
  • @ildjarn - I hadn't found that. Thank you, I just downloaded the tarball. Any chance you can tell me how to use it? – Jason Sep 01 '11 at 17:45
  • @Jason : That's a pretty broad question. I can certainly tell you that it _can_ be used painlessly from C++/CLI, as I've had to do so on a handful of occasions, but the 'how' is mostly C++ 101 stuff. If you're new to C++, go through a decent book or two before trying to mix C, C++, and C++/CLI all at once like this. – ildjarn Sep 01 '11 at 17:59
  • @ildjarn - How did you accomplish it? Did you build tidy into a dll or static library and then link to it, or did you incorporate it in some other way? – Jason Sep 01 '11 at 18:09
  • @Jason : In one case it was built into a .dll, in every other case it was built into a static library. I.e., it will work either way as long as the CRT is linked in dynamically. – ildjarn Sep 01 '11 at 18:31
  • @ildjarn - I don't suppose you could provide some more detail on this? Source code, a link to something, anything? – Jason Sep 01 '11 at 18:43
  • @Jason : Not to existing code, no, as it was done for billed contracts; however, the process was merely porting the makefile into a VC++ project. Again, this is probably biting off more than you can chew if you're new to C++ -- read a decent C++ book or two first and you should come to understand the C++ compilation/linking model well enough to take this on. – ildjarn Sep 01 '11 at 18:48
  • @ildjarn - What about building a static library? Is that a reasonable option? – Jason Sep 01 '11 at 19:03
  • @Jason : Indeed, very reasonable, but the same process is required (porting the makefile into a VC++ project). – ildjarn Sep 01 '11 at 19:05
  • @ildjarn - I found this http://www.codeproject.com/KB/cs/ZetaHtmlTidy.aspx. I managed to build the DLL in VC++ 2010 and it appears to be good. The example code provided is C#. Any idea how to call this with C++ instead? The wrapper is written in C++. – Jason Sep 01 '11 at 19:41
  • @Jason : I'd be wary of that -- last updated in 2007, and most likely abandoned now. – ildjarn Sep 01 '11 at 19:57
  • @ildjarn - Unfortunately for me, beggars can't be choosers! :/ See my solution below... – Jason Sep 01 '11 at 20:04
  • Ok, I guess I'll have to wait some time to post my solution, but at least I'm off to the races! :) – Jason Sep 01 '11 at 20:10

1 Answer


After almost 48 hours of struggling with this problem, I found a solution. Here it is...

I used the very simple .NET wrapper from http://www.codeproject.com/KB/cs/ZetaHtmlTidy.aspx. The VC project converted to VC++ 2010 without problems and compiled as a DLL. Below is the code I used to call it:

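// Runs the input HTML through the ZetaHtmlTidy wrapper and returns XHTML.
System::String^ TidyMyHTML(System::String^ MyHTMLString)
{
    using namespace ZetaHtmlTidy;
    // Stack semantics: the wrapper object is cleaned up when it goes out of scope.
    HtmlTidy tidy;
    System::String^ s = tidy.CleanHtml( MyHTMLString, HtmlTidyOptions::ConvertToXhtml );
    return s;
}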

Hopefully this post will spare someone else from going through the same thing.

EDIT:

Taking this a step further, I was able to upgrade the VC++ 2008 project files from the tidy source attached to the wrapper to VC++ 2010 project files. I then compiled the tidy project (separate from the wrapper class project) into libtidy.lib static libraries (both release and debug). Finally, I added the wrapper class to my application and pointed the project at the include and lib files. The end result was exactly what I wanted: a solution that incorporates tidy into my application without a DLL dependency. This whole experience has accelerated my learning about attaching C libraries to my C++ applications.

Thanks for the suggestions, and I hope someone finds this post useful.

Jason