
A browser can transform an XML file using XSL. One way to do this is to reference the stylesheet from within the XML file by adding the following line:

<?xml-stylesheet type='text/xsl' href='sample.xsl'?>

Opening this XML file in Internet Explorer now displays the transformed data correctly.

The XML itself references many other files, e.g. pictures, which are located in some folder.

I want to save the displayed data (together with all the referenced data) in one single MHTML file (*.mht).

Is this possible? And if so, how do I proceed?

Note: The files are all local (not on a server), and the initial XML is the result of test data. I just want my XML file to be displayed correctly, as before, but from a single file, without accessing any data outside the MHTML file.

Edit in response to answer 1:

I have included my XML in an iframe of an HTML page:

<body><iframe src="name.xml" width = "100%" height="1000"> </iframe></body>

I saved this page with IE as an *.mht file:

From: <Saved by Windows Internet Explorer 7>
Subject: XML-Test
Date: Wed, 22 Feb 2012 14:47:34 +0100
MIME-Version: 1.0
Content-Type: multipart/related;
    boundary="----=_NextPart_000_0000_01CCF170.E99B1DF0"
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157

This is a multi-part message in MIME format.

------=_NextPart_000_0000_01CCF170.E99B1DF0
Content-Type: text/html;
    charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
Content-Location: file://C:\Documents and Settings\STEFFAN\Desktop\Test\XML-Test.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>XML-Test</TITLE>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dwindows-1252">
<META content=3D"MSHTML 6.00.6000.17107" name=3DGENERATOR></HEAD>
<BODY><IFRAME=20
src=3D"http://www.m.de/name.xml"=20
width=3D"100%" height=3D1000>
</IFRAME></BODY></HTML>

------=_NextPart_000_0000_01CCF170.E99B1DF0
Content-Type: text/xml;
    charset="unicode"
Content-Transfer-Encoding: base64
Content-Location: http://www.m.de/name.xml

//48ACEARABPAEMAVABZAFAARQAgAEgAVABNAEwAIABQAFUAQgBMAEkAQwAgACIALQAvAC8AVwAz
AEMALwAvAEQAVABEACAASABUAE0ATAAgADQALgAwACAAVAByAGEAbgBzAGkAdABpAG8AbgBhAGwA...

Since my original files are local, IE actually wrote "file://C:\Documents and Settings\STEFFAN\Desktop\Test\SUPL_TCLog.xml" where "http://www.m.de/name.xml" appears above.

This local reference does not seem to work in MHTML, so I replaced it with an arbitrary substitute (http://www.m.de/name.xml). The same substitution works fine for image files. With that change, however, opening the .mht file starts a download of the XML file. That is not what I want: I want it to be displayed.

What is missing?

Vladimir S.

2 Answers


This is possible, but you will have to build some things yourself.

MHTML is essentially a multipart email message. Its format is fully described by RFC 2557, so it can be produced by email message generators and serializers.
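To illustrate the "multipart email message" point, here is a minimal sketch that uses Python's standard `email` package to serialize an HTML page plus images as a `multipart/related` message. The function name `build_mhtml`, its parameters, and the `example.invalid` URLs are made up for this illustration, and I have not checked the output against IE or any other browser.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(html, html_url, images):
    """Serialize one HTML page and its images as a multipart/related message.

    html     -- the (already transformed) HTML as a str
    html_url -- URL recorded as the page's Content-Location
    images   -- dict mapping image URL -> (raw bytes, image subtype such as "png")
    All three parameters are hypothetical names chosen for this sketch.
    """
    archive = MIMEMultipart("related", type="text/html")
    archive["Subject"] = "XML-Test"

    # The root document goes first; Content-Location tells the reader
    # which URL this part stands in for.
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = html_url
    archive.attach(root)

    # Each resource is attached under exactly the URL the HTML uses for it.
    for url, (data, subtype) in images.items():
        part = MIMEImage(data, _subtype=subtype)
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()
```

Writing the returned bytes to a file with an .mht extension gives a message with the same overall shape as the IE output in the question: one part per resource, each with its own Content-Location.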

However, I am unaware of any tool that will generate MHTML programmatically. Be warned, too, that there is no single standard web archive format (there are at least four), and only IE, Opera, and Chrome can read MHTML.

The simplest thing that can work is to script IE to open your page and save it as MHTML.

If you want to generate MHTML without IE, then you need to create an MHTML archiver.

With an archiver, the simplest thing to do is to:

  1. include all possible external resources
  2. make sure all those resources are always referenced by the same url
  3. then use a matching content-location for that resource.

This way you do not need to rewrite href and src attributes or parse your xsl or html to discover what resources to include.

If there are too many possible external resources, or you can't use paths consistently, you will need to parse the documents for resource discovery and/or rewrite URLs.
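If you do end up needing resource discovery, a rough sketch of it for HTML output could look like the following. It uses only Python's standard library; the class and function names and the `example.invalid` base URL are hypothetical. It resolves every reference against one base URL, so that different spellings of the same path end up as a single URL that can be used as the Content-Location. It only looks at `src` attributes and `<link href>` in HTML; references made from inside XSL or JavaScript are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect the absolute URLs of resources referenced by an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            # src on any element (img, script, iframe, ...) and
            # href on <link> (stylesheets) count as resources here.
            if name == "src" or (name == "href" and tag == "link"):
                # Resolving against the base URL maps equivalent
                # relative spellings onto one absolute URL.
                self.resources.add(urljoin(self.base_url, value))


def discover_resources(html, base_url):
    """Return the sorted, de-duplicated resource URLs found in html."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return sorted(collector.resources)
```

Each URL returned would then need a matching part in the archive, and the references in the HTML would have to be rewritten to that same absolute form (or left alone, if they already use it).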

In either scenario you can save the XML plus XSL as they are, or you can generate the HTML output first and save that.

Francis Avila
  • Thanks for your answer. Some questions on it: I have tried opening my page with Internet Explorer, but saving it only produces an XML file (mht cannot be selected). I therefore tried embedding it in an iframe and saving that, also without success: the MHTML can no longer be opened as soon as I delete the resource (see code above). Concerning referencing resources "by the same url": do you mean that each resource should be referenced by a different URL? Also, since I work with local files, file:/// URLs do not seem to work. Is that right? – Vladimir S. Feb 23 '12 at 08:35
  • It looks like IE will only offer that option if the open file is an HTML file, so do the transform to HTML first and then open that in IE. "Same url" means that, e.g., for the same file "a.png", you should not reference it as both "a.png" and "./a.png", because the `content-location` cannot match both. – Francis Avila Feb 23 '12 at 10:32

I made several attempts, but none were successful. I could not get Francis Avila's proposal to work either.

In particular, code referenced by JavaScript contained further references of its own. I did not know how to resolve these and put them into the MHTML.

Maybe using Altova StyleVision would be a solution.

Since I am no longer working on this, I am closing this thread.

Vladimir S.