I have an MHTML file and I am trying to convert it to HTML.

I have installed the HtmlAgilityPack and tried the following code:

var doc = new HtmlAgilityPack.MixedCodeDocument();
doc.Load("C:\\Users\\DickTracey\\Downloads\\Club Membership Report.mhtml");

var ms = new MemoryStream();
var sw = new StreamWriter(ms);

doc.Save(sw);
sw.Flush(); // make sure buffered text reaches the MemoryStream before reading it back
ms.Position = 0;

var sr = new StreamReader(ms);
return sr.ReadToEnd();

But it always returns null.

Can anyone explain the correct procedure to convert MHTML to HTML please?

Trevor Daniel
  • What makes you think that HtmlAgilityPack can read MHTML files? I can't see anything in the documentation that says it supports this. – Liam Apr 10 '14 at 10:17
  • Answers to questions like http://stackoverflow.com/questions/19086674/how-to-download-a-thousand-web-pages-to-mhtml-files or http://social.msdn.microsoft.com/Forums/en-US/44307106-c281-4805-a18c-eeddb43fa561/save-mhtml-file-as-html-file?forum=csharpgeneral imply that HtmlAgilityPack is able to do this. – Adrian Wragg Apr 10 '14 at 10:21
  • No, they don't. They say you **could try** the HTML agility pack. – Liam Apr 10 '14 at 10:22
  • Also, that SO question is reading HTML and saving MHTML; you're trying to read MHTML. – Liam Apr 10 '14 at 10:31
  • The SO answer says "can", not "could". If you know it's wrong, then it's worth saying so there, to help other people avoid hitting the dead ends that OP here has hit. – Adrian Wragg Apr 10 '14 at 10:36

2 Answers

MHTML to HTML Decoding in C#!

string mhtml = "This is your MHTML string"; // Make sure the string is in UTF-8 encoding
MHTMLParser parser = new MHTMLParser(mhtml);
string html = parser.getHTMLText(); // This is the converted HTML

Git link: https://github.com/DavidBenko/MHTML-to-HTML-Decoding-in-C-Sharp.git

I had a quick look at an MHTML file with HxD. Although, as noted above, HtmlAgilityPack has little or no support for MHTML, the format itself looks simple enough. It appears to consist of the usual suspects (unencoded HTML, CSS, JS, graphics encoded in Base64, etc.) concatenated as MIME parts, each with its own headers, in a way that could be worked out with a little effort. Having said that, the format is documented: MHTML is specified in RFC 2557 ("MIME Encapsulation of Aggregate Documents, such as HTML"). So dust off your browser, write some C# to parse it, and spoon-feed HtmlAgilityPack with the results.
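The parse-it-yourself approach described above can be sketched with nothing but the .NET base class library. This is a minimal sketch under stated assumptions, not a complete MHTML reader: it assumes the file is a single multipart/related MIME message (the layout RFC 2557 describes), that the HTML part is UTF-8, and that its Content-Transfer-Encoding is base64, quoted-printable or none. `MhtmlExtractor` and `ExtractHtml` are hypothetical names made up for this illustration, not an existing API.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of HtmlAgilityPack or any library): pulls the
// first text/html part out of an MHTML document using only the .NET BCL.
// Assumes one multipart/related envelope and a UTF-8 HTML part; nested
// multiparts, other charsets and embedded resources are not handled.
public static class MhtmlExtractor
{
    public static string ExtractHtml(string mhtml)
    {
        // Normalise CRLF to LF so the rest only has to look for "\n".
        string text = mhtml.Replace("\r\n", "\n");

        // The top-level Content-Type header names the boundary between parts.
        Match b = Regex.Match(text, "boundary=\"?([^\";\\n]+)\"?", RegexOptions.IgnoreCase);
        if (!b.Success)
            throw new FormatException("No MIME boundary found.");
        string delimiter = "--" + b.Groups[1].Value.Trim();

        foreach (string part in text.Split(new[] { delimiter }, StringSplitOptions.None))
        {
            // Each part is a header block, a blank line, then the body.
            int blank = part.IndexOf("\n\n", StringComparison.Ordinal);
            if (blank < 0)
                continue;
            string headers = part.Substring(0, blank);
            string body = part.Substring(blank + 2);

            if (!Regex.IsMatch(headers, @"Content-Type:\s*text/html", RegexOptions.IgnoreCase))
                continue;

            Match enc = Regex.Match(headers, @"Content-Transfer-Encoding:\s*(\S+)", RegexOptions.IgnoreCase);
            string transferEncoding = enc.Success ? enc.Groups[1].Value.ToLowerInvariant() : "7bit";

            switch (transferEncoding)
            {
                case "base64":
                    // Strip line breaks and other whitespace before decoding.
                    byte[] raw = Convert.FromBase64String(Regex.Replace(body, @"\s", ""));
                    return Encoding.UTF8.GetString(raw);
                case "quoted-printable":
                    return DecodeQuotedPrintable(body).TrimEnd('\n');
                default:
                    // 7bit / 8bit / binary: the body is already plain text.
                    return body.TrimEnd('\n');
            }
        }
        throw new FormatException("No text/html part found.");
    }

    // Quoted-printable (RFC 2045): "=XX" is one hex-encoded byte and a trailing
    // "=" is a soft line break that joins the line with the next one.
    private static string DecodeQuotedPrintable(string input)
    {
        var bytes = new MemoryStream();
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '=' && i + 1 < input.Length && input[i + 1] == '\n')
            {
                i++; // soft line break: drop "=" and the newline
            }
            else if (c == '=' && i + 2 < input.Length
                     && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                bytes.WriteByte(Convert.ToByte(input.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                // Literal characters in valid quoted-printable are ASCII (one byte each).
                bytes.WriteByte((byte)c);
            }
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```

Usage would be along the lines of `string html = MhtmlExtractor.ExtractHtml(File.ReadAllText(path));`, after which the string can be loaded into an `HtmlAgilityPack.HtmlDocument` with its `LoadHtml` method. Images and stylesheets stored in the other MIME parts are left alone, so references to them in the extracted HTML will not resolve without further work.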