
I want to pass the link extracted with response.css to my item and then use that value to download a file. But every time I yield the item I receive the following error: ValueError: Missing scheme in request url: h


My code is:


import scrapy
from scrapy.http import FormRequest
from diarioOficial.items import DiariooficialItem

class Crawl_Diario(scrapy.Spider):
    TERRITORY_ID = '12'
    name = 'do_acre'
    allowed_domains = ['www.diario.ac.gov.br/']

    start_urls = [
        'http://www.diario.ac.gov.br/'
    ]
    def parse(self, response):
        yield DiariooficialItem(
            file_urls = response.css("div.edhoje a").attrib['href']
        )

How can I fix that error?

My items.py is:



import scrapy

class DiariooficialItem(scrapy.Item):
    file_urls = scrapy.Field()
  • Does this answer your question? [Missing scheme in request URL](https://stackoverflow.com/questions/21103533/missing-scheme-in-request-url) – Nour Feb 22 '21 at 18:19
  • Unfortunately not. In that case, the asker had just forgotten to make start_urls a list. In my case, I believe the middleware is iterating over the whole dict value to make the request, which produces that error – Lucas Silluzio Feb 22 '21 at 18:22
  • On a side note - not sure how it could produce the error you're seeing, but I'd 1) drop the trailing slash from your allowed domains and 2) change your selector to be `response.css("div.edhoje a::attr(href)").getall()`. – Jon Clements Feb 22 '21 at 18:33
  • @JonClements adding `.getall()` fixed the problem. I believe it expects a list – Lucas Silluzio Feb 22 '21 at 18:59
  • Use `file_urls = [response.urljoin(response.css("div.edhoje a").attrib['href'])]` – Gallaecio Feb 23 '21 at 05:23
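
The comments above point at the cause: `file_urls` is expected to be a list, and a bare string gets walked one character at a time, so the first "URL" requested is just `h`. The sketch below illustrates this with the standard library only. `urls_requested` is a hypothetical stand-in for the step that builds one request per element of `file_urls`, not Scrapy's own code, and the `href` value is made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def urls_requested(file_urls):
    """Hypothetical stand-in: one request URL per element of file_urls."""
    return [url for url in file_urls]


def has_scheme(url):
    """True when the URL starts with a scheme such as http or https."""
    return bool(urlparse(url).scheme)


# Assumed values for illustration, not taken from the real page.
page_url = "http://www.diario.ac.gov.br/"
href = "download.php?arquivo=example.pdf"

# Resolve the (possibly relative) href against the page URL.
absolute = urljoin(page_url, href)

# A bare string is iterated character by character.
from_string = urls_requested(absolute)
# A one-element list yields the whole URL exactly once.
from_list = urls_requested([absolute])

print(from_string[0], has_scheme(from_string[0]))
print(from_list[0], has_scheme(from_list[0]))
```

In the spider this corresponds to wrapping the value in a list, as in `file_urls = [response.urljoin(response.css("div.edhoje a").attrib['href'])]`, or collecting every match with `.getall()`.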

0 Answers