I'm new to Scrapy and I'm having trouble extracting data from a table. I'm trying to save to a file the table with id = grdTableView_DXMainTable from: view-source:http://databank.worldbank.org/data/reports.aspx?source=2&series=SE.PRM.NENR&country=

I'm using the following code:

import scrapy

class mySpider(scrapy.Spider):
    name = "education"

    def start_requests(self):
        urls = [
          'http://databank.worldbank.org/data/reports.aspx?source=2&series=SE.PRM.NENR&country=',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'education-%s.html' % page
        with open(filename, 'wb') as f:    
            f.write(hxs.select('//table[@class="grdTableView_DXMainTable"]/td.//text()').extract())      
            self.log('Saved file %s' % filename)

The resulting HTML file is empty. Can anyone help me?

stranac
  • Possible duplicate of [Scrapy - Extract items from table](https://stackoverflow.com/questions/42947417/scrapy-extract-items-from-table) – parik May 03 '18 at 13:42

1 Answer

There are several problems in your code:

1) You call hxs.select, but hxs is never defined anywhere in your code. Use the response object passed to parse instead.

2) The value grdTableView_DXMainTable is not the class name; it is the ID. You can extract all the table text by using: response.xpath('//table[@id="grdTableView_DXMainTable"]//td//text()').extract()

3) If you want to keep all of the HTML code, it would be better to do this instead: response.xpath('//table[@id="grdTableView_DXMainTable"]').extract_first()

VMRuiz