1

I parse a document and want to retrieve part of the XML tree as a string. The document (example):

<?xml version="1.0"?>
<MyConfig>
    <MyData>
        <Foo bar="baz>42</Foo>
    </MyData>
    <OtherData>Something</OtherData>
</MyConfig>

The code:

  pugi::xml_document doc;
  doc.load_file(documentFileName);
  pugi::xml_node root = doc.child("MyConfig");

  // parse custom data
  _customData = root.child("MyData"). <-- HOW TO GET INNER XML?

Expected contents of custom data (if formatting is lost, I don't mind):

"<Foo bar="baz>42</Foo>"

How can I do this?

Tomáš Zato

2 Answers

4

I found a solution directly in the docs; Google just doesn't index them well, so I had to look it up manually. My solution was to use pugi::xml_writer and node.print. The docs even show an implementation for std::string:

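#include <string>
#include "pugixml.hpp"

// Collects everything pugixml writes into a std::string.
struct xml_string_writer: pugi::xml_writer
{
    std::string result;

    // Called by pugixml with successive chunks of the serialized output.
    void write(const void* data, size_t size) override
    {
        result.append(static_cast<const char*>(data), size);
    }
};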

With that available, I just made a convenience function to merge XML of all child nodes:

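std::string InnerXML(pugi::xml_node target)
{
  xml_string_writer writer;
  // Print each child in turn; the writer accumulates their XML in order.
  // The empty indent string turns off indentation (line breaks are still emitted).
  for (pugi::xml_node child = target.first_child(); child; child = child.next_sibling())
    child.print(writer, "");
  return writer.result;
}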
Tomáš Zato
3

I think pugi::xml_node::print() is one way to do it.

// needs <sstream> and <string>
pugi::xml_node node = root.child("MyData");
pugi::xml_node child = node.first_child();

// xml_node::print has an overload that writes to a std::ostream
std::stringstream ss;
child.print(ss);
std::string s = ss.str();

The trouble is that s will now have the value

<Foo bar="baz&gt;42&lt;/Foo&gt;     &lt;/MyData&gt;     &lt;OtherData&gt;Something&lt;/OtherData&gt; &gt; &#10;&lt;/MyConfig&gt;" />
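  1. It's the textual tree from the node onwards, and
  2. It's cluttered with XML entity escapes (&lt;, &gt;) rather than < and >.

This looks like a side effect of the sample document as posted: the bar attribute is missing its closing quote, so the rest of the file appears to be read as the attribute value and escaped on output. Not ideal, but for this input it can be handled with some string manipulation.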

// replace &lt; with <
size_t off = 0;
while ((off = s.find("&lt;", off)) != s.npos)
  s.replace(off, 4, "<");

// replace &gt; with >
off = 0;
while ((off = s.find("&gt;", off)) != s.npos)
  s.replace(off, 4, ">");

// truncate after the second '>', which in this sample ends the closing </Foo> tag
size_t end_open = s.find(">", 0);
size_t end_close = s.find(">", end_open + 1);
s = s.substr(0, end_close + 1);

This leaves s with the value

<Foo bar="baz>42</Foo>
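
The two replacement loops above can be folded into one small helper. The following is only a sketch using the standard library; replace_all and unescape_xml are hypothetical names, not part of pugixml, and the sketch covers just the five predefined XML entities (numeric references such as &#10; are left untouched).

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not a pugixml function): replace every occurrence
// of `from` in `text` with `to`, scanning left to right.
std::string replace_all(std::string text, const std::string& from, const std::string& to)
{
    if (from.empty())
        return text; // nothing to search for; also avoids an endless loop

    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos)
    {
        text.replace(pos, from.size(), to);
        pos += to.size(); // continue after the replacement so it is not rescanned
    }
    return text;
}

// Hypothetical helper: undo the five predefined XML entities.
// "&amp;" is handled last so that "&amp;lt;" becomes "&lt;" and not "<".
std::string unescape_xml(std::string text)
{
    text = replace_all(std::move(text), "&lt;", "<");
    text = replace_all(std::move(text), "&gt;", ">");
    text = replace_all(std::move(text), "&quot;", "\"");
    text = replace_all(std::move(text), "&apos;", "'");
    text = replace_all(std::move(text), "&amp;", "&");
    return text;
}
```

With these in place, the two loops would reduce to s = unescape_xml(s); the truncation step stays as it is.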
acraig5075