I'm working on a project to scrape statistics from Fantasy Football leagues across various services, and Yahoo is the one I'm currently stuck on. I want my spider to crawl the Draft Results page of a public Yahoo league. When I run the spider, it returns no results and no error message. It simply says:
```text
2012-09-14 17:29:08-0700 [draft] DEBUG: Crawled (200) <GET http://football.fantasysports.yahoo.com/f1/753697/draftresults?drafttab=round> (referer: None)
2012-09-14 17:29:08-0700 [draft] INFO: Closing spider (finished)
2012-09-14 17:29:08-0700 [draft] INFO: Dumping spider stats:
    {'downloader/request_bytes': 250,
     'downloader/request_count': 1,
     'downloader/request_method_count/GET': 1,
     'downloader/response_bytes': 48785,
     'downloader/response_count': 1,
     'downloader/response_status_count/200': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2012, 9, 15, 0, 29, 8, 734000),
     'scheduler/memory_enqueued': 1,
     'start_time': datetime.datetime(2012, 9, 15, 0, 29, 7, 718000)}
2012-09-14 17:29:08-0700 [draft] INFO: Spider closed (finished)
2012-09-14 17:29:08-0700 [scrapy] INFO: Dumping global stats:
    {}
```
It's not a login issue, because the page in question is accessible without being signed in. I can see from other questions posted here that people have gotten scrapers working on other parts of Yahoo. Is it possible that Yahoo Fantasy is blocking spiders? I've already written a working spider for ESPN, so I don't think the issue is with my code. Here it is anyway:
```python
from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import HtmlXPathSelector

# DraftItem lives in the project's items.py; this module path is a placeholder.
from myproject.items import DraftItem


class DraftSpider(CrawlSpider):
    name = "draft"
    #psycopg stuff here
    rows = ["753697"]  # Yahoo league IDs to crawl
    allowed_domains = ["football.fantasysports.yahoo.com"]
    # One Draft Results URL per league ID
    start_urls = ["http://football.fantasysports.yahoo.com/f1/%s/draftresults?drafttab=round" % row
                  for row in rows]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Absolute path to the rows of the draft results table
        sites = hxs.select("/html/body/div/div/div/div/div/div/div/table/tr")
        items = []
        for site in sites:
            # Each table row is one draft pick
            item = DraftItem()
            item['pick_number'] = site.select("td[@class='first']/text()").extract()
            item['pick_player'] = site.select("td[@class='player']/a/text()").extract()
            item['pick_nflteam'] = site.select("td[@class='player']/span/text()").extract()
            item['pick_ffteam'] = site.select("td[@class='last']/@title").extract()
            items.append(item)
        return items
```
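The log shows a 200 response of about 48 KB, so the page is downloaded; an empty result then means the row selector matched nothing. A minimal, self-contained sketch of how a long absolute XPath can silently match zero rows: the markup below is invented (it is not Yahoo's real HTML, which isn't shown here) and assumes the served table contains a `<tbody>` that the spider's path skips. It uses the standard library's `xml.etree.ElementTree` only so it runs without Scrapy; that parser needs well-formed markup, unlike the HTML parser behind Scrapy's selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a guess at the *shape* of a draft-results table,
# not Yahoo's actual HTML. The <tbody> wrapper is the detail that matters.
RAW_HTML = """
<html>
  <body>
    <div id="doc">
      <div class="content">
        <table class="draft">
          <tbody>
            <tr>
              <td class="first">1.</td>
              <td class="player"><a href="#">Player A</a><span>(NE - QB)</span></td>
              <td class="last" title="Team One">Team One</td>
            </tr>
            <tr>
              <td class="first">2.</td>
              <td class="player"><a href="#">Player B</a><span>(GB - WR)</span></td>
              <td class="last" title="Team Two">Team Two</td>
            </tr>
          </tbody>
        </table>
      </div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(RAW_HTML.strip())  # root is the <html> element

# Absolute-style path that goes straight from <table> to <tr>, like the spider's.
# ElementTree paths are relative to root, so "body/..." stands for "/html/body/...".
absolute_rows = root.findall("body/div/div/table/tr")

# Anchoring on a class attribute and using // (any descendant) tolerates
# extra wrappers such as <tbody> or a different number of <div> levels.
relative_rows = root.findall(".//table[@class='draft']//tr")


def row_to_item(row):
    """Extract the same four fields as the spider, using its per-row paths."""
    return {
        "pick_number": row.findtext("td[@class='first']"),
        "pick_player": row.findtext("td[@class='player']/a"),
        "pick_nflteam": row.findtext("td[@class='player']/span"),
        "pick_ffteam": row.find("td[@class='last']").get("title"),
    }


items = [row_to_item(r) for r in relative_rows]

print(len(absolute_rows))  # 0
print(len(relative_rows))  # 2
print(items[0]["pick_player"], items[0]["pick_ffteam"])  # Player A Team One
```

If the same pattern holds on the real page, printing `len(sites)` inside `parse` would show 0, and a shorter path anchored on the table's actual class (read from the raw response body, not the browser's DOM inspector) would be the thing to try.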
Would really appreciate any insight on this.