Questions tagged [scrapy-shell]

The Scrapy shell is an interactive shell where you can try and debug your scraping code very quickly, without having to run the spider.

It’s meant to be used for testing data extraction code, but you can actually use it for testing any kind of code as it is also a regular Python shell.
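For example, a minimal session (the site and selectors below are only illustrative, borrowed from the Scrapy tutorial) looks like this:

    $ scrapy shell "https://quotes.toscrape.com/"
    >>> response.status                                              # HTTP status of the fetched page
    200
    >>> response.css("span.text::text").get()                        # extract the first quote with a CSS selector
    >>> response.xpath("//small[@class='author']/text()").getall()   # extract all author names with XPath
    >>> view(response)                                               # open the downloaded page in a browser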

177 questions
0 votes • 1 answer

scrapy xpath selector issue

I managed to find the attribute I want to isolate using the debugging spider, but I'm not sure if I incorporated it into my spider correctly. I don't get an explicit error message when the spider runs, so I'm thinking I just entered the selector…
0 votes • 1 answer

Scrapy redirects to homepage for some urls

I am new to the Scrapy framework and am currently using it to extract articles from multiple 'Health & Wellness' websites. For some of the requests, Scrapy redirects to the homepage (this behavior is not observed in a browser). Below is an example: Command: …
Aditya • 13 • 2
0 votes • 1 answer

Why does my Scrapy spider not use all the URLs in the start_urls list?

I have almost 300 URLs in my start_urls list, but Scrapy only crawls about 200 of them, not all of the listed URLs. I do not know why, or how I can deal with that. I need to crawl more items from the website. Another question I do not…
mootvain • 39 • 4
-1 votes • 1 answer

Scrapy parse function not getting called - also bulk saving

I found that similar questions have been asked already, but none of the answers explained my situation. And I also need help implementing the second part of my code. Here is the code: import scrapy import json class MySpider(scrapy.Spider): name =…
Ktrel • 115 • 1 • 9
-1 votes • 2 answers

Getting error when sending request to a website using Scrapy shell

I was learning the Scrapy framework and tried to use the Scrapy shell. There I was trying to fetch the response from "https://quotes.toscrape.com/". The commands are below: python -m scrapy shell Inside the shell: >> from scrapy import Request >> req =…
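For reference, a minimal sketch of fetching a Request inside the Scrapy shell (fetch() accepts either a URL or a Request object; the URL is the one quoted above, and the custom header is only an illustrative assumption):

    >>> from scrapy import Request
    >>> req = Request("https://quotes.toscrape.com/", headers={"User-Agent": "Mozilla/5.0"})  # header is optional
    >>> fetch(req)         # downloads the page and rebinds `response` in the shell
    >>> response.status    # should be 200 on success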
-1 votes • 1 answer

Links aren't in URL format, so I can't scrape them with Scrapy

This is my code: import scrapy from scrapy import Spider from scrapy.http import FormRequest class ProvinciaSpider(Spider): name = 'provincia' allowed_domains = ['aduanet.gob.pe'] start_urls =…
-1 votes • 1 answer

Scrapy shell view(response) not working properly

When I open the Scrapy shell through the command scrapy shell "http://quotes.toscrape.com/" (this example comes from the Scrapy tutorial), I enter the command view(response), which opens my browser (Firefox, to be precise) with a path looking like…
Takamura • 347 • 5 • 12
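As background on that question, a minimal sketch of what view(response) does in the shell (the temporary file name is chosen by Scrapy and will vary):

    >>> view(response)   # writes the response body to a temporary .html file and opens it in the default browser
    # The address bar therefore shows a local file:// path rather than the original URL.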
-1 votes • 1 answer

I can't extract the links with scrapy

I need help extracting the links on the page: https://www.remax.pt/comprar-empreendimentos?searchQueryState={%22page%22:1,%22sort%22:{%22fieldToSort%22:%22PublishDate%22,%22order%22:1}}
-1 votes • 1 answer

Scrapy can't identify "tbody" and "ul" elements as listed by Firebug

I am trying to extract every title of this mailing list while registering how many replies each thread has. According to Firebug, the XPath to the element that contains all the titles…
-1 votes • 1 answer

Scrapy 1.1 crawled 0 pages, but I can get data with the scrapy shell command

I've been trying to study the Scrapy tutorial, and after running the command at the project top level, I get the following output: 2016-07-05 21:06:01 [scrapy] INFO: Scrapy 1.1.0 started (bot: tutorial) 2016-07-05 21:06:01 [scrapy] INFO:…
-2 votes • 1 answer

Scrapy: Parsing data for one variable directly from the start URL, and data for other variables after following all the hrefs from the start URL?

How can I go about parsing data for one variable directly from the start URL, and data for other variables after following all the hrefs from the start URL? The web page I want to scrape has a list of articles with the "category", "title", "content",…
hareko • 5 • 2
-3 votes • 1 answer

How to properly scrape 1 URL at a time with attached attribute?

I am looking to scrape multiple website domains for various hrefs within their careers pages. I only want the links to the jobs and nothing else, and the easiest way I have found to do that is to parse the Scrapy response and pull the hrefs from a…
Tommy543 • 1 • 1
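As a closing illustration of the pattern that last question describes, a minimal sketch of pulling hrefs out of a response in the Scrapy shell (the "job" filter is a hypothetical example, not taken from the question):

    >>> links = response.css("a::attr(href)").getall()                             # every href on the page
    >>> job_links = response.xpath("//a[contains(@href, 'job')]/@href").getall()   # keep only job-looking links (hypothetical filter)
    >>> absolute = [response.urljoin(href) for href in job_links]                  # resolve relative links against the page URL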