
I think I'm missing something trivial, but I'm losing a lot of time on this, so the solution may be useful to others too:

I'm working with libxml2 2.9.8 (pure C, not the C++ bindings) under Linux. I have an external (non-libxml) tree structure representing an XML file, and I'm trying to write it to a string using libxml2. Traversing the tree and writing it with the xmlTextWriter API is straightforward and works fine. Each node is a struct with simple fields, like:

 typedef struct _simplifiedNode {
     char *tag;
     char *content;
     struct _simplifiedNode *parent;
     struct _simplifiedNodeList *children;   /* child list, defined elsewhere */
 } simplifiedNode;

The exception is that at a certain point I encounter a string node that may contain the string representation of an XML document. I can parse it with the xmlReadMemory API, but then I need to nest the parsed document (not its escaped string representation) into the ongoing writer, including namespaces and attributes.

Is there a simple way I'm missing to nest the parsed document (or its root element) recursively, without walking through every sub-element myself?

For example, I'm producing the following document with the xmlTextWriter API:

<Title>
    TitleValue
</Title>
<Date>
    2018-11-26
</Date>
<Content>

The Content node in the non-libxml tree is a leaf node with the tag Content, containing a string like:

char *content = "<SomeXmlComplexDocument ss:someattr=\"attrval\">Somecontent</SomeXmlComplexDocument>";

What I want to achieve is, instead of getting something like

<Content>&lt;SomeXmlComplexDocument&gt; ... </Content>

to parse and validate the content with xmlReadMemory and then re-inject the document, obtaining:

<Content>
    <SomeXmlComplexDocument ss:someattr="attrval">Somecontent</SomeXmlComplexDocument>
</Content>

Namespaces and attributes should be preserved.

nwellnhof
Zoten
  • Please give some sample xml - including the node which contains the string representation of an xml document. – iVoid Nov 26 '18 at 11:52
  • Thank you, hope it is more clear now! – Zoten Nov 26 '18 at 13:26
  • You can output the inner XML unescaped with [xmlTextWriterWriteRaw](http://xmlsoft.org/html/libxml-xmlwriter.html#xmlTextWriterWriteRaw), but if you want validation, you'll have to parse it somehow. – nwellnhof Nov 26 '18 at 20:39
  • @nwellnhof since I feel like yours is the only way to go with libxml2 API (and that's how I implemented it right now), I'd like to accept yours as an answer, if you'd like to write it as one! – Zoten Dec 03 '18 at 08:33

1 Answer


To serialize the inner XML fragments unescaped, you can simply use xmlTextWriterWriteRaw. This won't check whether the XML is well-formed, though. If you need validation, you'll have to parse the XML fragments at some point. Depending on the content model, you might have to use xmlParseBalancedChunkMemory instead of xmlReadMemory: xmlReadMemory expects a complete document with a single root element, whereas xmlParseBalancedChunkMemory also accepts well-balanced content such as mixed text or several sibling elements. It should also be possible to parse the result document in one go after it was written, but you'll lose information such as the original line numbers of the fragments.

nwellnhof