As per the title. The application involves user-written configuration files, parts of which occasionally have to be updated but which should otherwise remain unchanged. A starting point is simply to be able to pass input through to output unchanged.

I accept that the inputs <tag></tag> and <tag/> are pretty much equivalent and probably won't be distinguished on output but other than that, I'd like to preserve the XML as much as possible.

The first attempt was Text.XML.HaXml.SAX.saxParse but that suppresses whitespace after a comment so that for example:

<!-- next section: -->
<section>
    ...
</section>

is parsed as:

<!-- next section: --><section>
    ...
</section>

which is an unacceptable change. The next attempt was via HXT at http://pastebin.com/qNyVuBK7 and this works quite well except that entities in attribute data are munged; e.g.,

<view UID="&Label;" ifNotNull="&Term;">

becomes

<view UID="&amp;Label;" ifNotNull="&amp;Term;">

even though entities in normal textual data are passed correctly. Can anyone suggest how to fix that last problem, or another way to achieve the objective?

It seems that https://hackage.haskell.org/package/roundtrip-xml-0.2.0.0 might help but I can't find any documentation on how to use it.

  • You don't want to fix the last problem: the `&` symbol is reserved in XML and gets encoded as `&amp;`. Any and every XML decoder will recognize that correctly. HXT is actually cleaning up your input for you, as your input wasn't valid XML to begin with. I would leave those the same. If you're worried about it being "human readable", then don't use XML as a configuration format; use YAML or JSON or INI or anything but XML, which is designed to be computer readable, not human readable. – bheklilr May 29 '15 at 13:13
  • I _do_ want to fix that last problem. The original file contained an entity within an attribute value which would be expanded to the value of the entity by the processor. The re-written file would have as the attribute value an ampersand followed by some letters followed by a semi-colon. – Michael restore Monica Cellio May 29 '15 at 20:58
  • That would make your XML file non-compliant, just saying. And again, if you were to load that back in with HXT or HaXml the text of that attribute would be exactly `"&Label;"`, so you could still do your processing on it. However, it's being stored in the file as `"&amp;Label;"` because `&` is a [reserved character for XML](http://www.w3resource.com/xml/reserved-markup-characters.php). It's like if you wanted an element like `<`: this wouldn't parse well because of the extra `<`, so instead you would have `&lt;` in the XML file. – bheklilr May 29 '15 at 21:17
  • What happens when someone tries to have a variable name (presuming substitution here) of `lt`, `gt`, `amp`, `apos`, `quot`? – bheklilr May 29 '15 at 21:17
  • Regrettably I still don't understand your point. If the input file contained for example `UID="&Label;"` then the output would be `UID="&amp;Label;"` and I just don't understand how (1) the input is ill-formed/non-compliant and (2) the output is equivalent to the input or a corrected version of the input. – Michael restore Monica Cellio May 29 '15 at 23:22
  • The XML standard uses several characters in its syntax that can lead to ambiguity when parsing. The standard accounts for this by allowing them to be written as special escape sequences, much as you might have `\n` in a Haskell string to indicate a literal newline character. When one of these escape sequences is found, a compliant parser will instead return the actual character that sequence represents. Your desired substitution syntax overlaps with XML escape sequences, in that it uses the same syntax. HXT tries to help out by correctly encoding those sequences. – bheklilr May 29 '15 at 23:52
  • If you tried to parse an attribute containing `&apos;`, it would be read in as the character `'` (a single apostrophe) and NOT as the string `"&apos;"`. – bheklilr May 29 '15 at 23:54
  • If I had an attribute containing `&apos;` I would expect it to be read as `'`. That is exactly what I would want. Similarly, with `attr="&Field;"` I would expect `attr` to have the value `thing` taken from an `<!ENTITY Field "thing">` declaration elsewhere in a DTD. That would be during 'normal' processing. But I am asking about what apparently is called round-trip processing, where the output should be equal to the input, except for explicit transformations performed by my code; i.e., entity substitutions should not be made. – Michael restore Monica Cellio May 30 '15 at 01:20
  • OK, I do now see what you're saying and I'll have to think further on how to achieve what I want. – Michael restore Monica Cellio May 30 '15 at 02:28
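
One possible workaround that follows from the discussion above, offered only as a minimal sketch: hide the custom entity references from the parser before parsing and restore them after serialization, so that the XML library never sees (and never re-escapes) them. The function names `maskEntities` and `unmaskEntities` and the placeholder characters are hypothetical and not part of HXT or HaXml. The sketch assumes that the private-use code points U+E000 and U+E001 never occur in the real files and that the chosen parser and serializer pass them through unchanged.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical placeholder delimiters (private-use code points).
-- Assumption: they never occur in the real configuration files.
open, close :: Char
open  = '\xE000'
close = '\xE001'

-- The five entities predefined by XML are left for the parser to handle.
predefined :: [String]
predefined = ["lt", "gt", "amp", "apos", "quot"]

-- Simplified notion of an XML name character; enough for this sketch.
isNameChar :: Char -> Bool
isNameChar c = isAlphaNum c || c `elem` "_-.:"

-- Replace each custom reference &Name; by open ++ Name ++ [close].
-- Character references such as &#10; are left untouched.
maskEntities :: String -> String
maskEntities [] = []
maskEntities ('&' : rest)
  | (name, ';' : after) <- span isNameChar rest
  , not (null name)
  , name `notElem` predefined
  = open : name ++ close : maskEntities after
maskEntities (c : cs) = c : maskEntities cs

-- Inverse of maskEntities: turn open ++ Name ++ [close] back into &Name;.
unmaskEntities :: String -> String
unmaskEntities [] = []
unmaskEntities (c : cs)
  | c == open
  , (name, _ : after) <- break (== close) cs
  = '&' : name ++ ';' : unmaskEntities after
  | otherwise = c : unmaskEntities cs

main :: IO ()
main = do
  let input  = "<view UID=\"&Label;\" ifNotNull=\"&Term;\">a &amp; b</view>"
      masked = maskEntities input
  -- In real use, `masked` would go through the XML parser and serializer
  -- before being unmasked; here only the string round trip is checked.
  putStrLn (unmaskEntities masked)
  print (unmaskEntities masked == input)
```

The trade-off is that the parser then treats `&Label;` as opaque text rather than as an entity reference, which matches the round-trip goal but means entity values are not available during processing.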

0 Answers