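# Assumption: br is a mechanize.Browser and this runs on Python 2 (StringIO module)
import mechanize
from StringIO import StringIO
from lxml import etree

br = mechanize.Browser()
br.open('http://www.google.com/advanced_search')
br.select_form(name='f')
br.form['as_q'] = "lxml"
data = br.submit()
html_string = data.read()   # this is my input
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html_string), parser)
follow_urls = tree.xpath('//*[@id="nav"]/tbody/tr/td/a')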

I am using the above code to get the follow-up links from the Google search results, but it returns an empty list.

But when I do the same in the console, I get the links.


What am I doing wrong?

user

1 Answer


Do you really have table/tbody/tr in the HTML string?

tbody is usually inserted by your browser when it builds the DOM, so you see it in your inspect window even when the HTML sent by the server does not contain it.
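To see the effect without hitting Google, here is a minimal sketch on a made-up HTML string (the markup and the /page2 link are hypothetical, not Google's real output). It assumes lxml's HTMLParser keeps the table as written and does not add a tbody, which is what I observe:

```python
from lxml import etree

# Hypothetical markup: a table written without <tbody>, as a server might send it.
html_string = '<table id="nav"><tr><td><a href="/page2">2</a></td></tr></table>'

tree = etree.fromstring(html_string, etree.HTMLParser())

# The parsed tree has no tbody element, so the tbody step matches nothing.
with_tbody = tree.xpath('//*[@id="nav"]/tbody/tr/td/a')

# Dropping the tbody step matches the link.
without_tbody = tree.xpath('//*[@id="nav"]/tr/td/a')

print(len(with_tbody), len(without_tbody))
```

The first expression comes back empty, which would explain the empty result in the question if Google's raw HTML has no tbody.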

You can try:

tree.xpath('//*[@id="nav"]/tr/td/a')

or to cover both cases:

tree.xpath('(//*[@id="nav"]/tbody/tr | //*[@id="nav"]/tr)/td/a')
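As a quick offline check of the union expression, here is a small sketch that runs it against two made-up strings, one with tbody and one without. The helper name nav_hrefs and both HTML snippets are hypothetical, just for illustration:

```python
from lxml import etree

# Same union expression as above, extended with /@href to return the link targets.
XPATH = '(//*[@id="nav"]/tbody/tr | //*[@id="nav"]/tr)/td/a/@href'

def nav_hrefs(html_string):
    """Hypothetical helper: parse an HTML string and return the nav link hrefs."""
    root = etree.fromstring(html_string, etree.HTMLParser())
    return root.xpath(XPATH)

# Made-up markup: the same table without and with an explicit <tbody>.
raw = '<table id="nav"><tr><td><a href="/p2">2</a></td></tr></table>'
dom = '<table id="nav"><tbody><tr><td><a href="/p2">2</a></td></tr></tbody></table>'

hrefs_raw = nav_hrefs(raw)
hrefs_dom = nav_hrefs(dom)
print(hrefs_raw, hrefs_dom)
```

Both calls should find the same link, so the expression does not depend on whether the source contains tbody.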

Example python shell session:

>>> import pprint
>>> import lxml.html
>>> root = lxml.html.parse('http://www.google.fr/search?q=lxml').getroot()
>>> pprint.pprint(root.xpath('(//*[@id="nav"]/tbody/tr | //*[@id="nav"]/tr)/td/a/@href'))
['/search?q=lxml&ie=UTF-8&prmd=ivns&ei=UHsxU-6KK8nxhQfinYG4Ag&start=10&sa=N',
 '/search?q=lxml&ie=UTF-8&prmd=ivns&ei=UHsxU-6KK8nxhQfinYG4Ag&start=20&sa=N',
 '/search?q=lxml&ie=UTF-8&prmd=ivns&ei=UHsxU-6KK8nxhQfinYG4Ag&start=30&sa=N',
 '/search?q=lxml&ie=UTF-8&prmd=ivns&ei=UHsxU-6KK8nxhQfinYG4Ag&start=40&sa=N',
 '/search?q=lxml&ie=UTF-8&prmd=ivns&ei=UHsxU-6KK8nxhQfinYG4Ag&start=50&sa=N',
 '/search?q=lxml&ie=UTF-8&prmd=ivns&ei=UHsxU-6KK8nxhQfinYG4Ag&start=60&sa=N',
 '/search?q=lxml&ie=UTF-8&prmd=ivns&ei=UHsxU-6KK8nxhQfinYG4Ag&start=70&sa=N',
 '/search?q=lxml&ie=UTF-8&prmd=ivns&ei=UHsxU-6KK8nxhQfinYG4Ag&start=80&sa=N',
 '/search?q=lxml&ie=UTF-8&prmd=ivns&ei=UHsxU-6KK8nxhQfinYG4Ag&start=90&sa=N',
 '/search?q=lxml&ie=UTF-8&prmd=ivns&ei=UHsxU-6KK8nxhQfinYG4Ag&start=10&sa=N']
>>> 
paul trmbrth