Questions tagged [scrapy-shell]

The Scrapy shell is an interactive shell where you can try and debug your scraping code very quickly, without having to run the spider.

It’s meant to be used for testing data extraction code, but you can actually use it for testing any kind of code as it is also a regular Python shell.

177 questions
1 vote, 1 answer

Scrapy is not able to download the images from a URL

I am using Scrapy to download the images, but it is not working. I get the URLs in the desired folder but not the images. Here is my items.py: class Brand(scrapy.Item): name = scrapy.Field() url = scrapy.Field() brand_image = scrapy.Field() …
shahrukh ijaz
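If the built-in `ImagesPipeline` is what the question relies on (an assumption, since the excerpt is truncated), a frequent cause of "URLs but no images" is that the pipeline is not enabled or is not reading the right field. By default it takes URLs from an item field named `image_urls` and stores results in `images`; the pipeline stays disabled unless `IMAGES_STORE` is set, and it needs Pillow installed. A minimal `settings.py` sketch, where the store path is hypothetical and pointing `IMAGES_URLS_FIELD` at `brand_image` is an illustrative choice:

```python
# settings.py (fragment): enable Scrapy's built-in images pipeline.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Directory where downloaded images are written (hypothetical path).
# Without IMAGES_STORE the pipeline is not configured and does nothing.
IMAGES_STORE = "downloaded_images"

# Read image URLs from the question's field instead of the default
# "image_urls". That field must hold a *list* of absolute URLs.
IMAGES_URLS_FIELD = "brand_image"

# Download results go here; the Brand item would also need
# `images = scrapy.Field()` declared for this to work.
IMAGES_RESULT_FIELD = "images"
```

Keeping the default field names (`image_urls`, `images`) on the item avoids the two `IMAGES_*_FIELD` overrides entirely.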
1 vote, 0 answers

Error while trying to scrape JS pages with Scrapy and Splash

However, I keep getting this issue in the shell. 2018-09-13 14:50:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 2018-09-13 14:50:36 [scrapy.extensions.telnet] DEBUG: Telnet console…
yajant b
1 vote, 2 answers

Scrapy Xpath: Extracting @title from img node

I want to extract the @title from the Main Notes According to Your Votes section of this page: https://www.fragrantica.com/perfume/Remy-Latour/Cigar-9351.html. I have fetched the HTML, then tried this line of code in the scrapy shell, but the output was…
1 vote, 1 answer

Data missing while scraping website

I am trying to scrape a website (please refer to the URLs in the code). From the website, I am trying to scrape all the information and transfer the data to a JSON file. scrapy shell http://www.narakkalkuries.com/intimation.html To extract the information…
Amith
1 vote, 1 answer

How to scrape data from a site if there is some sort of loop of links opening down the page?

Here is the link. When you click on the first link ("Accessories and Fluids"), a new table opens on the same page containing other links, and by clicking on those links, you'll interact with a table. The problem is that the first link has the same XPath…
1 vote, 1 answer

Invoke Scrapy's custom exporter from the command line

While trying to resolve my problem (output a JSON array ordered by a specific item field), I received an answer suggesting that I create a custom exporter for the job. I'm creating one, but... all the examples that I've found suggest calling…
Lore
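Scrapy looks up feed formats in the `FEED_EXPORTERS` setting (merged with the built-in `FEED_EXPORTERS_BASE`), so registering a custom exporter under a format name is one way to make it selectable from the command line. A minimal sketch; the format name `sortedjson` and the dotted path `myproject.exporters.SortedJsonItemExporter` are hypothetical placeholders for the exporter described in the question.

```python
# settings.py (fragment): map a format name to a custom exporter class.
# Both the name "sortedjson" and the dotted class path are hypothetical.
FEED_EXPORTERS = {
    "sortedjson": "myproject.exporters.SortedJsonItemExporter",
}
```

With that in place, the format can be requested with something like `scrapy crawl myspider -o items.json -t sortedjson` (`myspider` is a placeholder); newer Scrapy releases also accept the `-o items.json:sortedjson` form.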
1 vote, 1 answer

How to fetch data using scrapy?

I am working on a Django project and I want to provide some news feeds on the home page. I recently started working with Scrapy; when I run the given code with "scrapy shell", it fetches the data successfully. But when I put this code into…
jax
1 vote, 2 answers

Scraping Value after Euro Symbol (Scrapy-Python)

I need a selector to scrape the value after the euro symbol (\u20ac). I tried dozens of variations that I found here on Stack Overflow and…
Michael
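One way to pull out the number after the euro sign is a regular expression with a capture group. The sketch below uses plain `re` on a hypothetical price string; in Scrapy the same pattern can be passed to a selector's `.re_first()`, e.g. `response.css("span.price::text").re_first(pattern)`, where the CSS selector is an assumed example.

```python
import re

# Hypothetical text as a selector might return it; \u20ac is the euro sign.
text = "Price: \u20ac 24,99 (incl. VAT)"

# Capture the run of digits, dots and commas that follows the euro sign,
# allowing optional whitespace in between.
pattern = r"\u20ac\s*([\d.,]+)"

match = re.search(pattern, text)
value = match.group(1) if match else None
print(value)  # 24,99
```

The captured value is still a string; converting it to a number depends on whether the site uses a comma or a dot as the decimal separator.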
1 vote, 3 answers

Import Error: DLL failed when using scrapy in the command prompt

I am getting the issue below when trying to create a folder using the scrapy command. I tried searching for this issue and found a solution at https://groups.google.com/forum/#!topic/scrapy-users/8N6V_OGUqtI. I tried the steps provided there and…
Sandeep
1 vote, 1 answer

Scrapy returning an empty JSON file

I am trying to get data from a website; everything seems to be correct, and the XPath was tested in the shell. # -*- coding: utf-8 -*- from scrapy.contrib.spiders import CrawlSpider class KabumspiderSpider(CrawlSpider): name = "kabumspider" …
1 vote, 1 answer

Why does my basic scrapy request get no response?

I am new to scrapy and trying to submit a form and scrape the response from https://www.fbo.gov/index?s=opportunity&tab=search&mode=list. When I use the scrapy shell: scrapy shell "https://www.fbo.gov/index?s=opportunity&tab=search&mode=list" it…
Ryan Gedwill
1 vote, 0 answers

Scrapy's "pause/resume" became "pause/restart"

Here's the thing: I have a large word list, and I want to crawl some data based on these words. It's time-consuming, so I'd like to split it into pieces. First, I load the words into a list in __init__ of my spider. def __init__(self,…
Pacific_73
1 vote, 1 answer

Robots.txt and Allow?

So I'm new to web crawling, and I'm having trouble understanding a particular robots.txt file. In this case, this is what the website has:

User-agent: *
Allow: /
Sitemap: sitemapURLHere

So I looked up the / here and found it was for any path. So…
ocean800
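Python's standard library can evaluate rules like the ones in the question, which is a quick way to check what `Allow: /` permits. Robots.txt rules are path prefixes, and every path starts with `/`, so a wildcard user agent may fetch any URL on the site. A minimal sketch, with `https://example.com` and the user-agent string `my-crawler` as placeholders; in a Scrapy project, whether robots.txt is honored at all is controlled by the `ROBOTSTXT_OBEY` setting.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the question; the sitemap URL is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "Allow: /" is a prefix rule: every path starts with "/", so every URL is allowed.
allowed_root = parser.can_fetch("my-crawler", "https://example.com/")
allowed_deep = parser.can_fetch("my-crawler", "https://example.com/a/b/c?page=2")
sitemaps = parser.site_maps()  # available since Python 3.8

print(allowed_root, allowed_deep)  # True True
print(sitemaps)                    # ['https://example.com/sitemap.xml']
```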
1 vote, 1 answer

How to get the data for each ad on this page?

I am scraping this page to get the data for each ad: http://www.cars2buy.co.uk/business-car-leasing/Abarth/695C/? Here is my code in scrapy shell: scrapy shell "http://www.cars2buy.co.uk/business-car-leasing/Abarth/695C/" for content in…
Hat hout
1 vote, 1 answer

web-crawling - get item-title from bandcamp.com

I'm trying to get the item-title of new releases on bandcamp.com from the 'Discover' part of the page (rock -> all rock -> new arrivals): scrapy shell 'https://bandcamp.com/?g=rock&s=new&p=0&gn=0&f=all&w=0' Part of the relevant source code of the page…
fuser60596