
I'm trying to create a spider to crawl a football site, futbin, but I can only fetch about 10 pages; after that, every request comes back with an HTTP 429 error. P.S.: I'm new to Scrapy.

import scrapy


class FutbinSpider(scrapy.Spider):
    name = 'futbin'

    start_urls = [f'https://www.futbin.com/21/players?page={i}&sort=TotalStats&order=desc' for i in range(1, 21)]
    # runs once at class-definition time: truncates items.json before the crawl
    with open('items.json', 'w') as file:
        pass

    def parse(self, response):
        for indice_player in range(1, 31):
            yield {
                'name': response.xpath(f'//*[@id="repTb"]/tbody/tr[{indice_player}]/td[1]/div[2]/div[1]/a/text()').get(),
                'team': response.xpath(f'//*[@id="repTb"]/tbody/tr[{indice_player}]/td[1]/div[2]/div[2]/span/a[1]/@data-original-title').get(),
                'country': response.xpath(f'//*[@id="repTb"]/tbody/tr[{indice_player}]/td[1]/div[2]/div[2]/span/a[2]/@data-original-title').get(),
                'position': response.xpath(f'//*[@id="repTb"]/tbody/tr[{indice_player}]/td[3]/text()').get(),
                'base stats': response.xpath(f'//*[@id="repTb"]/tbody/tr[{indice_player}]/td[17]/text()').get()
            }

1 Answer


The HTTP 429 Too Many Requests response status code indicates that the user has sent too many requests in a given amount of time ("rate limiting"). The website is probably trying to protect itself from web scrapers :-D
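The usual way to cope with rate limiting is to wait and retry, growing the pause between attempts. Here is a minimal, framework-free sketch of that idea; the function names, URL handling, and delay values are illustrative, not part of the question's code:

```python
import time
import urllib.request
import urllib.error


def backoff_delay(attempt, retry_after=None, base_delay=2.0):
    """Pick a wait time for a retry: honor the server's Retry-After
    header when present, otherwise double the delay each attempt."""
    if retry_after is not None:
        return float(retry_after)
    return base_delay * 2 ** attempt


def fetch_with_backoff(url, max_retries=5):
    """Fetch a URL, sleeping and retrying whenever the server answers 429."""
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only rate-limit responses are worth retrying here
            time.sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise RuntimeError(f"still rate-limited after {max_retries} attempts")
```

With a base delay of 2 seconds, the fallback waits grow 2, 4, 8, 16, … seconds, which is usually enough for the server's rate-limit window to reset.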

If you can afford to wait, slow your requests down and retry after a delay. See for example: How to handle a 429 Too Many Requests response in Scrapy?
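In Scrapy itself, you can get most of the way there with built-in settings: add a download delay, enable AutoThrottle, and tell the retry middleware to treat 429 as retryable. A sketch for `settings.py`; the specific values are illustrative, tune them for the site:

```python
# settings.py -- throttling sketch; numbers are illustrative, not prescriptive.

# Slow the crawl so the rate limit is less likely to trigger.
DOWNLOAD_DELAY = 2                  # seconds between requests
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # one request at a time per domain

# Let Scrapy adapt the delay to the server's observed latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2
AUTOTHROTTLE_MAX_DELAY = 60

# Retry 429 responses instead of giving up on them.
# (Note: assigning RETRY_HTTP_CODES replaces Scrapy's default list.)
RETRY_ENABLED = True
RETRY_HTTP_CODES = [429]
RETRY_TIMES = 5
```

Note that Scrapy's stock RetryMiddleware retries immediately rather than sleeping, so for a strict rate limiter you may still want a custom middleware like the one in the linked answer.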

David Miró