Basically, I am trying to parse an HTML string and extract some information using Cheerio.js.
My HTML is as follows (of course, I reduced and simplified it):
<html>
<head></head>
<body>
<div>
<table>
<tr>
<td>
<a href="/link_1.php">Link 1</a>
</td>
<td>
<a href="/link_2.php">Link 2</a>
<a href="/link_3.php">Link 3</a>
</td>
<td>
<a href="/link_4.php">Link 4</a>
<a href="/link_5.php">Link 5</a>
</td>
</tr>
</table>
</div>
</body>
</html>
My code is this one:
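var cheerio = require("cheerio");
// html is a string containing the markup shown above
var $ = cheerio.load(html);
var page = $.root();
var tr = page.find("tr");
// Count the links inside the second direct child of the tr
console.log(tr.find("> :nth-child(2) a").length);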
You can try it here.
What I would expect is for the code to return 2, because there are two links in the second direct child of the tr element. However, it returns 5: all the links in the tr are returned.
I tried the same thing with jQuery and the result is as it should be, see.
I also noticed that removing the <html> tag makes it work correctly, but I do not know why.
Am I doing something wrong, or should I report this to the developers as a bug?
Edit: I just opened an issue on GitHub.