1

I'm creating a scraper through ScraperWiki using Python, but I'm having an issue with the results I get. I'm basing my code off the basic example on ScraperWiki's docs and everything seems very similar, so I'm not sure where my issue is. For my results, I get the first documents title/URL that is on the page, but there seems to be a problem with the loop, as it does not return the remaining documents after that one. Any advice is appreciated!

import scraperwiki
import requests
import lxml.html

html = requests.get("http://www.store.com/us/a/productDetail/a/910271.htm").content
dom = lxml.html.fromstring(html)

for entry in dom.cssselect('.downloads'):
    document = {
        'title': entry.cssselect('a')[0].text_content(),
        'url': entry.cssselect('a')[0].get('href')
    }
    print document
user994585
  • 661
  • 3
  • 13
  • 28

1 Answers1

1

You need to iterate over the a tags inside the div with class downloads:

for entry in dom.cssselect('.downloads a'):
    document = {
        'title': entry.text_content(),
        'url': entry.get('href')
    }
    print document

Prints:

{'url': '/webassets/kpna/catalog/pdf/en/1012741_4.pdf', 'title': 'Rough In/Spec Sheet'}
{'url': '/webassets/kpna/catalog/pdf/en/1012741_2.pdf', 'title': 'Installation and Care Guide with Service Parts'}
{'url': '/webassets/kpna/catalog/pdf/en/1204921_2.pdf', 'title': 'Installation and Care Guide without Service Parts'}
{'url': '/webassets/kpna/catalog/pdf/en/1011610_2.pdf', 'title': 'Installation Guide without Service Parts'}
alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195