Why is the output repeated when I parse a string using PyQuery in Spyder?

Here is my code:

from pyquery import PyQuery as pq
html = """

    <ul>
        <li>first-item</li>
        <li><a href="link2.html">second item</a></li>
        <li><a href="link3.html">third item</a></li>
        <li><a href="link4.html">fourth item</a></li>
        <li><a href="link5.html">fifth item</a></li>        
    </ul>

"""
doc = pq(html)
print(type(doc))
print(doc('li'))

Here is the output:

<class 'pyquery.pyquery.PyQuery'>
<a href="link2.html">second item</a></li>
        <li class="item=-0 active"><a href="link3.html"><span class="" bold="">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html><a href="link3.html"><span class="" bold="">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html>

However, according to my textbook, the output should be:

<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>

I have searched hard for an answer online, but I could not find a similar problem on any forum or on GitHub. I would be very grateful for any help.

nekomatic
Young

1 Answer

You are not selecting the right tag. You want all the <li> elements, so you should query for li, not for a.

With that selector, the code looks like this:

from pyquery import PyQuery as pq
html = """
    <ul>
        <li>first-item</li>
        <li><a href="link2.html">second item</a></li>
        <li><a href="link3.html">third item</a></li>
        <li><a href="link4.html">fourth item</a></li>
        <li><a href="link5.html">fifth item</a></li>        
    </ul>
"""
doc = pq(html)
print(type(doc))
print(doc('li'))

This gives me:

<class 'pyquery.pyquery.PyQuery'>
<li>first-item</li>
<li><a href="link2.html">second item</a></li>
<li><a href="link3.html">third item</a></li>
<li><a href="link4.html">fourth item</a></li>
<li><a href="link5.html">fifth item</a></li> 

I tested this independently of any context, using only the snippet you gave. If something still goes wrong when you apply it, the error must come from elsewhere in your code.

imperosol
  • Don't be sorry. You made an effort to properly format your code and give precise output. That's really good. – imperosol Apr 02 '22 at 13:47