
I have been looking at the source code of howdoi: https://github.com/gleitz/howdoi

In it, the _extract_links_from_bing and _extract_links_from_google functions use this kind of syntax.

I searched online for everything related to XML and ElementTree, but I could not find this constructor-like (function-call) syntax anywhere.

Here are the functions:

def _extract_links_from_bing(html):
    html.remove_namespaces()
    return [a.attrib['href'] for a in html('.b_algo')('h2')('a')]


def _extract_links_from_google(html):
    return [a.attrib['href'] for a in html('.l')] or \
    [a.attrib['href'] for a in html('.r')('a')]

My question is: how does html('.b_algo')('h2')('a') work, and how does the list comprehension iterate over it? Any links related to similar syntax would be appreciated.

Thanks for reading.

DYZ
RS156

1 Answer


That project uses PyQuery (which is built on lxml), not xml.etree.

Note that html comes from _get_links():

def _get_links(query):
    search_engine = os.getenv('HOWDOI_SEARCH_ENGINE', 'google')
    search_url = _get_search_url(search_engine)

    result = _get_result(search_url.format(URL, url_quote(query)))
    html = pq(result)
    return _extract_links(html, search_engine)

and pq is imported here:

from pyquery import PyQuery as pq

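A PyQuery object can be used like $ in jQuery: calling it with a CSS selector returns a new PyQuery object containing the matching elements inside the current selection, so the calls can be chained. This is the function-call syntax you're referring to.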
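Nothing about this syntax is specific to XML: any Python object whose class defines __call__ can be called like a function. To show the mechanism, here is a hypothetical toy class, MiniQuery, built on the standard library's xml.etree.ElementTree. It is NOT PyQuery's implementation and only understands ".classname" and bare tag-name selectors. Each call returns a new list-like object of matching elements, and because the class subclasses list, iterating the final selection yields elements whose .attrib dict holds the href.

```python
import xml.etree.ElementTree as ET


class MiniQuery(list):
    """Hypothetical toy, NOT PyQuery's code: a list of ElementTree elements
    that can be called with a selector to narrow the selection.

    Supported selectors (only these two): ".classname" and a bare tag name.
    """

    def __call__(self, selector):
        matches = []
        for root in self:
            # root.iter() walks root and all of its descendants, roughly like
            # PyQuery's "descendant-or-self" search.
            for el in root.iter():
                if selector.startswith("."):
                    hit = selector[1:] in el.get("class", "").split()
                else:
                    hit = el.tag == selector
                if hit:
                    matches.append(el)
        # Return a NEW MiniQuery so another call can be chained onto it.
        return MiniQuery(matches)


# Made-up markup loosely shaped like search results (hypothetical).
doc = ET.fromstring(
    "<html><body>"
    "<li class='b_algo'><h2><a href='https://a.example/'>A</a></h2></li>"
    "<li class='b_algo'><h2><a href='https://b.example/'>B</a></h2></li>"
    "<li class='ad'><h2><a href='https://ad.example/'>Ad</a></h2></li>"
    "</body></html>"
)
html = MiniQuery([doc])

# Same shape as howdoi's Bing extractor: each call narrows the selection,
# and iterating the final MiniQuery (a list) yields ET.Element objects.
links = [a.attrib["href"] for a in html(".b_algo")("h2")("a")]
print(links)  # ['https://a.example/', 'https://b.example/']
```

The chain is evaluated left to right, one call at a time; the list comprehension only iterates the last selection, the <a> elements.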

From the PyQuery quickstart:

>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> print(p.html())
Hello world !
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> print(p.html())
you know <a href="http://python.org/">Python</a> rocks
>>> print(p.text())
you know Python rocks
Jonathon Reinhart