Questions tagged [scrapinghub]

Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.

179 questions
0
votes
1 answer

Getting spider on Scrapy Cloud to store files on Google Cloud Storage using GCSFilesStore and getting ImportError

Deploying a spider to Scrapy Cloud. It gathers download links for files and should save those files in a Google Cloud bucket. It works when running locally, but when deployed to Scrapinghub it returns the following error: Traceback (most recent…
markkazanski • 439 • 7 • 20
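This ImportError usually means the google-cloud-storage client library, which Scrapy's GCSFilesStore imports, is missing from the Scrapy Cloud image; it has to be listed in the project's requirements.txt (see the scrapinghub.yml sketch under the mysql.connector question below). A minimal settings sketch, with placeholder bucket and project IDs:

```python
# settings.py -- a minimal sketch; bucket and project IDs are placeholders
ITEM_PIPELINES = {
    "scrapy.pipelines.files.FilesPipeline": 1,
}
FILES_STORE = "gs://my-bucket/files/"  # the gs:// scheme selects GCSFilesStore
GCS_PROJECT_ID = "my-gcp-project"
```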
0
votes
1 answer

How to scrape data on a website that uses JavaScript with pagination

I have a website I need to scrape, "https://www.forever21.com/us/shop/catalog/category/f21/sale#pageno=1&pageSize=120&filter=price:0,250&sort=5", but I cannot retrieve all the data; the site has pagination and uses JavaScript as well.…
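Note that the #pageno=… part is a URL fragment, which is never sent to the server, so plain Scrapy requests all fetch the same page; a JavaScript-rendering service such as Splash (or the site's underlying XHR endpoint) is needed. A sketch assuming scrapy-splash is installed and a Splash instance is running:

```python
import scrapy
from scrapy_splash import SplashRequest

class SaleSpider(scrapy.Spider):
    name = "f21_sale"

    def start_requests(self):
        base = ("https://www.forever21.com/us/shop/catalog/category/f21/sale"
                "#pageno={}&pageSize=120&filter=price:0,250&sort=5")
        for page in range(1, 6):  # arbitrary page limit for the sketch
            # "wait" gives the page's JavaScript time to render the grid
            yield SplashRequest(base.format(page), self.parse, args={"wait": 2.0})

    def parse(self, response):
        # the CSS selectors are hypothetical; inspect the rendered HTML
        for product in response.css("div.product_tile"):
            yield {"name": product.css("p.item_brand::text").get()}
```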
0
votes
2 answers

ScrapingHub: No module named mysql.connector

On my local machine everything works fine, but when I deployed it on ScrapingHub I got an error saying "ImportError: No module named mysql.connector". All I need is that, whenever I run my spider or run it through the job schedule, it will…
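As with the GCS question above, the module is simply missing from the Scrapy Cloud image; dependencies have to be declared in a requirements.txt referenced from scrapinghub.yml. A sketch (project ID is a placeholder):

```yaml
# scrapinghub.yml
project: 12345
requirements:
  file: requirements.txt
# requirements.txt then lists, e.g.:
#   mysql-connector-python
```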
0
votes
1 answer

Scrapinghub exporting multiple items

In Scrapinghub, how can we export multiple item types? I have MainItem() and SubItem() item classes and I would like to get two separate items on Scrapinghub's items page. I can do this by implementing different item pipelines for both …
Jithin • 1,692 • 17 • 25
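One common pattern is a single pipeline that dispatches on the item class rather than one pipeline per type. A sketch, where MainItem and SubItem stand in for the asker's classes:

```python
import scrapy

class MainItem(scrapy.Item):
    title = scrapy.Field()

class SubItem(scrapy.Item):
    detail = scrapy.Field()

class RoutingPipeline(object):
    """Dispatch on item class instead of registering two pipelines."""
    def process_item(self, item, spider):
        if isinstance(item, MainItem):
            pass  # main-item handling/export goes here
        elif isinstance(item, SubItem):
            pass  # sub-item handling/export goes here
        return item
```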
0
votes
2 answers

Distributed communication between Scrapy spiders

I want to run two spiders in a coordinated fashion. The first spider will scrape some website and produce URLs, and the second one will consume these addresses. I can't wait for the first spider to finish and then launch the second one, since the website…
Bociek • 1,195 • 2 • 13 • 28
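A common approach is a shared queue between the two processes. scrapy-redis (assumed installed, with a Redis server reachable via the REDIS_URL setting) gives the consumer side almost for free, while the producer pushes URLs with plain redis-py:

```python
# Consumer spider: idles on a Redis list and starts crawling URLs as soon
# as the producer pushes them, so neither spider waits for the other.
from scrapy_redis.spiders import RedisSpider

class ConsumerSpider(RedisSpider):
    name = "consumer"
    redis_key = "consumer:start_urls"  # the list the producer pushes to

    def parse(self, response):
        yield {"url": response.url}

# Producer side, e.g. inside the first spider's parse callback:
#     import redis
#     redis.Redis().lpush("consumer:start_urls", discovered_url)
```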
0
votes
0 answers

How to "add" geckodriver to PATH on ScrapingHub?

I am using Python 2 for web scraping. I have written a spider that uses headless Firefox (no GUI) to go to a website, log in with my account, and interact with the site by pressing buttons, filling forms, calendars, etc. It works as…
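Selenium does not actually need geckodriver on PATH if the binary's location is passed explicitly. A sketch for the Selenium 3.x API of that era; the path is a placeholder for wherever the binary is shipped inside the deployed project:

```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.headless = True  # no GUI, as in the question

driver = webdriver.Firefox(
    executable_path="/app/geckodriver",  # placeholder location
    options=options,
)
```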
0
votes
0 answers

Retrieve all items from Scrapinghub as hash

I retrieved all items from a job in Scrapinghub: url = "https://storage.scrapinghub.com/items/#{job_id}?apikey=#{API_KEY}" response = HTTParty.get(url) items = response.parsed_response The problem is that items is a String instead of a Hash. Is…
abc03 • 13 • 3
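The items endpoint serves JSON Lines by default, which generic HTTP clients hand back as one big string. Requesting format=json (an assumption about the Scrapinghub items API) returns a single JSON array instead; in Ruby, appending &format=json to the URL should likewise make parsed_response an Array of Hashes. A sketch of the same call in Python:

```python
import requests

JOB_ID = "123/4/5"        # placeholder project/spider/job id
API_KEY = "your-api-key"  # placeholder

resp = requests.get(
    "https://storage.scrapinghub.com/items/" + JOB_ID,
    params={"apikey": API_KEY, "format": "json"},
)
items = resp.json()  # a list of dicts rather than a raw string
```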
0
votes
1 answer

Adding meta deltafetch_key for every request in SitemapSpider and CrawlSpider

I'm using scrapinghub's deltafetch feature in order to get new pages from a website without requesting the URLs I have already scraped. I've noticed that on some websites, scrapy would still scrape pages with an already visited URL. I had to replace…
romain-lavoix • 403 • 2 • 6 • 20
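deltafetch keys requests by their fingerprint unless meta["deltafetch_key"] overrides it, so for a CrawlSpider the rules' process_request hook can attach a URL-based key to every generated request. A sketch:

```python
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = "deltafetch_example"
    start_urls = ["https://example.com"]  # placeholder

    rules = (
        Rule(LinkExtractor(), callback="parse_item",
             process_request="add_deltafetch_key", follow=True),
    )

    # response=None keeps this compatible with Scrapy >= 1.7, where the
    # hook is called with (request, response)
    def add_deltafetch_key(self, request, response=None):
        request.meta["deltafetch_key"] = request.url
        return request

    def parse_item(self, response):
        yield {"url": response.url}
```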
0
votes
1 answer

How can I transform a value after it was extracted?

I am using Portia to extract info from a page. However, one of the values extracted is not in a format that I can use. More specifically, I want to extract a numeric value which uses a dot instead of a comma to denote thousands e.g. "1.000" instead…
George Eracleous • 4,278 • 6 • 41 • 50
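Portia projects are regular Scrapy projects underneath, so a small item pipeline can normalize the value after extraction. A sketch, where "price" is a placeholder for the asker's field name:

```python
class NormalizeNumberPipeline(object):
    def process_item(self, item, spider):
        raw = item.get("price")
        if raw:
            # "1.000" (dot as thousands separator) -> 1000
            item["price"] = int(raw.replace(".", ""))
        return item
```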
0
votes
2 answers

From local Scrapy to Scrapy Cloud (Scrapinghub) - Unexpected results

The scraper I deployed on Scrapy Cloud is producing an unexpected result compared to the local version. My local version can easily extract every field of a product item (from an online retailer), but on Scrapy Cloud the field "ingredients"…
BoobaGump • 525 • 1 • 6 • 17
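Discrepancies like this often come down to the cloud stack running a different Scrapy/Python version than the local machine, or to geo-dependent page content. Pinning the stack in scrapinghub.yml rules out the first cause; a sketch where the stack name is illustrative and should match the local setup:

```yaml
# scrapinghub.yml -- project ID and stack name are placeholders
project: 12345
stacks:
  default: scrapy:1.6-py3
```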
0
votes
0 answers

Scrapy 0 pages crawled but no visible issue?

I used Portia to create a spider and then downloaded it as a Scrapy project. The spider runs fine, but the log says: Crawled 0 pages (at 0 pages/min), and nothing gets saved. However, it also shows all the pages crawled with 200…
0
votes
1 answer

What does Scrapy Job Setting mean?

I was reading https://doc.scrapinghub.com/scrapy-cloud.html#scrapycloud and am confused about what it means to override a Scrapy setting for a job. Does it mean that I can change the start_url? Or which settings can I really override?…
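Per-job settings are Scrapy settings (e.g. DOWNLOAD_DELAY), not spider attributes, so start_urls is not one of them; spider arguments cover that case instead. A sketch with the python-scrapinghub client, where the API key, project ID and spider name are placeholders:

```python
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient("your-api-key")
project = client.get_project(12345)

project.jobs.run(
    "myspider",
    job_settings={"DOWNLOAD_DELAY": 2},  # per-job Scrapy setting override
    job_args={"start_url": "https://example.com"},  # spider argument, read via -a
)
```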
0
votes
0 answers

"Scrapy crawl " not working from the project folder running from spiders folder

I am a newbie in Python. I have tried finding the solution everywhere but couldn't get through. I have made a Scrapy project, and because of the project structure the spiders are stored by default in the /spiders directory. Problem: we generally run the…
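The scrapy crawl command locates the project through scrapy.cfg. An alternative that does not depend on the shell's working directory is to launch the spider programmatically; a sketch where the module and spider names are placeholders:

```python
# run.py -- Scrapy still needs to find the settings, via scrapy.cfg in a
# parent directory or the environment variable set below.
import os

os.environ.setdefault("SCRAPY_SETTINGS_MODULE", "myproject.settings")

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl("myspider")
process.start()  # blocks until the crawl finishes
```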
0
votes
1 answer

How can deltafetch & splash be used together in Scrapy (python)

I am trying to build a scraper using Scrapy, and I plan to use deltafetch to enable incremental refresh, but I need to parse JavaScript-based pages, which is why I need to use Splash as well. In the settings.py file, we need to add SPIDER_MIDDLEWARES =…
Aayush Agrawal • 184 • 1 • 6
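scrapy-deltafetch is a spider middleware while scrapy-splash hooks into the downloader middleware chain, so the two can be enabled side by side. A settings.py sketch using each project's documented values; the Splash endpoint is a placeholder:

```python
SPIDER_MIDDLEWARES = {
    "scrapy_deltafetch.DeltaFetch": 100,
}
DELTAFETCH_ENABLED = True

SPLASH_URL = "http://localhost:8050"  # placeholder Splash endpoint
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```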
0
votes
2 answers

Automatically Parse a Website

I have an idea and want to see whether it is possible to implement. I want to parse a website (copart.com) that shows, daily, a different and large list of cars with the corresponding description for each car. Daily, I am tasked with going over each…