Find a line (and column) of xml_node in rapidxml

Question

From what I could understand in the docs, I deduced that every xml_node knows its position in the source text. What I'd like to do is retrieve the LINE and COLUMN for a given xml_node<>*:

    rapidxml::file<> xmlFile("generators.xml"); // Open file, default template is char

    xml_document<> doc;               // character type defaults to char
    doc.parse<0>(xmlFile.data());     // 0 means default parse flags
    xml_node<> *main = doc.first_node();  //Get the main node that contains everything
    cout << "My first node is: <" << main->name() << ">\n";
    cout << "   located at line " << main->?????() << ", column " << main->?????() << "\n";

How should I retrieve those offsets? Could I somehow crawl from the main->name() pointer back to the beginning of the document? But how can I access the document string from xml_document<> doc to compare offsets?

score 0 · Answer 1 · answered Dec 11 '14 at 10:50

Let's say you parse a simple XML document held in a char array.

    char xml[] = "<hello/><world/>";
    doc.parse<0>(xml);

RapidXML will insert null terminators (and maybe make other modifications to the "document"), so it might look like this now:

    char xml[] = "<hello\000\000<world\000\000";

If you then ask for the name() of the 'hello' node, it returns a pointer to the 'h' in your xml array. You can subtract the base of the array to get an offset.

    int offset = node->name() - &xml[0];

Obviously this isn't a line and column. To get those, you'd need to count the number of newlines between the array start and the offset. (But maybe do this on a 'clean' copy of the XML data, as RapidXML might well mangle newline sequences in the processed version.)
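The newline counting could be sketched roughly like this. `SourcePos` and `locate` are hypothetical names made up for the illustration, not part of RapidXML, and the sketch assumes you kept an unmodified copy of the text from before parsing and that lines end in '\n'. Columns are counted in chars, so multi-byte characters count more than once.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not part of RapidXML).
struct SourcePos {
    std::size_t line;    // 1-based line number
    std::size_t column;  // 1-based column, counted in chars
};

// Hypothetical helper: converts an offset into the text to a line/column pair
// by scanning from the start. clean_text should be a copy of the XML made
// before parsing, since the parser may overwrite characters in its own buffer.
inline SourcePos locate(const char* clean_text, std::size_t offset) {
    SourcePos pos{1, 1};
    for (std::size_t i = 0; i < offset; ++i) {
        // The offset must lie inside the null-terminated text.
        assert(clean_text[i] != '\0');
        if (clean_text[i] == '\n') {
            ++pos.line;      // a newline starts the next line
            pos.column = 1;
        } else {
            ++pos.column;
        }
    }
    return pos;
}
```

With the question's code, the offset would be something like `main->name() - xmlFile.data()`, and the clean copy a `std::string` filled from `xmlFile.data()` before calling `parse`. Note that `name()` points at the first character of the element name, so the reported column is one past the opening '<'.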

answered Dec 11 '14 at 10:50

Roddy

In your example above, it would be `xmlFile.data()`. Does that make sense? I guess you're using `rapidxml_utils.hpp` for `rapidxml::file` – Roddy Dec 11 '14 at 14:19
Yep. But I really wanted to get the data out of the *node*. I want this so that I can easily return errors containing line and column where XML has incorrect data. If it's not possible to get this out of the node, I'd need to pass the original string all along which is a little bit annoying. – Tomáš Zato Dec 11 '14 at 14:28
@TomášZato You could locate the top of the document by iterating up the `node->parent()`s. Then use the `name()` method to get the address of the start of its name. (Beware of whitespace being trimmed or compacted when counting newlines; the "non-destructive" mode might help you.) – Roddy Dec 11 '14 at 15:40