I need to extract the HTML between two elements. For example, given:
<html>
<head> </head>
<body>
<div>
<span id="start">
<span> Some text </span>
<span> some other text</span>
</span>
<span id="parent">
<span id="target"> target node </span>
<span> some other text</span>
</span>
</div>
</body>
</html>
I want to extract the HTML content starting at the span with id "start" and ending at the span with id "target".
Expected result:
<span id="start">
<span> Some text </span>
<span> some other text</span>
</span>
<span>
<span id="target"> target node </span>
I am using the libxml2 tree parsing API. I parse the file like this:
htmlDocPtr xhtmlDoc = htmlReadFile(fileName.c_str(), "UTF-8", HTML_PARSE_RECOVER|HTML_PARSE_NOERROR|HTML_PARSE_NOWARNING);
htmlNodePtr rootNodePtr = xmlDocGetRootElement(xhtmlDoc);
Then I traverse the tree to the required node and dump it with:
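xmlBufferPtr nodeBuffer = xmlBufferCreate();
// Serializes cur_node together with its whole subtree into the buffer.
xmlNodeDump(nodeBuffer, xhtmlDoc, cur_node, 0, 1);
printf("%s\n", (const char *)xmlBufferContent(nodeBuffer));
xmlBufferFree(nodeBuffer);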
Note: cur_node is of type xmlNode *
The problem is that when I reach the span with id "parent" and dump it, I get its entire content, so the overall output is:
<span id="start">
<span> Some text </span>
<span> some other text</span>
</span>
<span id="parent">
<span id="target"> target node </span>
<span> some other text</span>
</span>
That is, the output contains extra content: everything inside "parent" that comes after the "target" span. How can I achieve the intended result?