
Basically I am trying to parse an HTML string and extract some information using Cheerio.js.

My HTML is as follows (reduced and simplified, of course):

<html>
    <head></head>
    <body>
        <div>
            <table>
                <tr>
                    <td>
                        <a href="/link_1.php">Link 1</a>
                    </td>
                    <td>
                        <a href="/link_2.php">Link 2</a>
                        <a href="/link_3.php">Link 3</a>
                    </td>
                    <td>
                        <a href="/link_4.php">Link 4</a>
                        <a href="/link_5.php">Link 5</a>
                    </td>
                </tr>
            </table>
        </div>
    </body>
</html>

Here is my code:

var cheerio = require("cheerio");
var $ = cheerio.load(html); // html is the string shown above
var page = $.root();

var tr = page.find("tr");

// Count the links inside the second direct child of the tr
console.log(tr.find("> :nth-child(2) a").length);

You can try it here.

I would expect the code to print 2, because there are two links in the second direct child of the tr element. Instead it prints 5: every link inside the tr is returned.

I tried the same thing with jQuery and got the expected result.

I also noticed that removing the <html> tag makes it work correctly, but I do not know why.

Am I doing something wrong, or should I report this to the developers as a bug?

Edit: I just opened an issue on GitHub.

Delgan
  • You might wanna include this in your bug report: https://stackoverflow.com/questions/6481612/queryselector-search-immediate-children – Andreas Louv Nov 15 '15 at 00:01
  • Update from the future: this behavior appears fixed in `1.0.0-rc.12`, if not many versions before. – ggorlen Sep 01 '23 at 18:58

1 Answer


The following fixes your issue. It helps to select the elements with children() rather than with a single general find() call:

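var cheerio = require("cheerio");
var $ = cheerio.load(html); // html is the string from the question
var page = $.root();

var tr = page.find("tr");

// Step through direct children explicitly:
console.log(tr.children("td:nth-child(2)").children("a").length);
// or resolve the second child first, then search inside it:
console.log(tr.find("> :nth-child(2)").find("a").length);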
RichG
  • Thank you, but I really want to know why my code is not working rather than use a workaround. – Delgan Nov 15 '15 at 00:22
  • I think it is just an error in the way you are selecting, because this works as well: console.log(tr.find("> :nth-child(2)").find('a').length) – RichG Nov 15 '15 at 00:26
  • Oh, this is interesting, thank you for pointing it out. But what is wrong with the way I select elements? Why does it work if I remove the `<html>` tag? – Delgan Nov 15 '15 at 00:38
  • I just figured out that `tr.find(" > :nth-child(2) > a")` actually works too. But my HTML here is simplified; in the real page the `<a>` elements are not direct children, so this does not help. – Delgan Nov 15 '15 at 00:40
  • Well, what you're querying now is: find the second child of the table row, then find all the descendants of that child that are "a"s. You're walking down the DOM step by step. My belief is that your earlier code was walking down the DOM and then just selecting all elements that were "a"s, not only those under the child it had found previously :) – RichG Nov 15 '15 at 00:46
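
To make the two readings discussed in these comments concrete, here is a minimal sketch that uses plain objects as a hypothetical stand-in for the DOM. It does not use Cheerio at all, and the helper names (`el`, `descendants`, `scopedCount`, `unscopedCount`) are invented for illustration. The "unscoped" reading is only an assumption about what might have gone wrong; nothing in this thread confirms that Cheerio worked this way.

```javascript
// Hypothetical stand-in for the DOM: plain objects, no Cheerio involved.
function el(tag, ...children) {
  const node = { tag, children, parent: null };
  for (const child of children) child.parent = node;
  return node;
}

// Same shape as the HTML in the question (text nodes are ignored).
const tr = el("tr",
  el("td", el("a")),
  el("td", el("a"), el("a")),
  el("td", el("a"), el("a"))
);
const html = el("html",
  el("head"),
  el("body", el("div", el("table", tr)))
);

// All descendants of a node that have the given tag.
function descendants(node, tag) {
  const found = [];
  for (const child of node.children) {
    if (child.tag === tag) found.push(child);
    found.push(...descendants(child, tag));
  }
  return found;
}

// True when the node is the second child of its parent.
function isSecondChild(node) {
  return node.parent !== null && node.parent.children[1] === node;
}

// Scoped reading: "> :nth-child(2)" must be a direct child of the context.
function scopedCount(context) {
  const second = context.children[1];
  return second ? descendants(second, "a").length : 0;
}

// Unscoped reading (assumption): an "a" matches if ANY of its ancestors
// is a second child, even an ancestor above the context element.
function unscopedCount(context) {
  return descendants(context, "a").filter((a) => {
    for (let n = a.parent; n !== null; n = n.parent) {
      if (isSecondChild(n)) return true;
    }
    return false;
  }).length;
}

console.log(scopedCount(tr));   // 2
console.log(unscopedCount(tr)); // 5
```

In this model the unscoped count is 5 because `body` is the second child of `html`, so every link has a matching ancestor. If the `html` wrapper is left out of the model, the same function gives 2, which would be consistent with the observation in the question, though that is still only a guess about the cause.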