Questions tagged [scrapinghub]

Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.

179 questions
1
vote
0 answers

scrapinghub requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://storage.scrapinghub.com

I am trying to run scrapy_price_monitor in a local environment, but when I issue the command "scrapy crawl spidername", it returns "unauthorized" when trying to send the item to storage.scrapinghub. I have already successfully run "shub login" (added my…
pedrovgp
  • 767
  • 9
  • 23
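A 401 from storage.scrapinghub.com usually means the process could not find an API key. A minimal sketch of one common fix, assuming the price monitor authenticates through the python-scrapinghub client, which reads the SH_APIKEY environment variable (`shub login` only writes `~/.scrapinghub.yml`, which that client does not consult — whether this applies to scrapy_price_monitor specifically is an assumption):

```shell
# Assumed fix: expose the Scrapy Cloud API key where python-scrapinghub
# looks for it.  The key value below is a placeholder.
export SH_APIKEY="<your Scrapy Cloud API key>"
# scrapy crawl spidername   # re-run the spider once the variable is set
```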
1
vote
0 answers

Scrapy: settings, multiple concurrent spiders, and middlewares

I'm used to running spiders one at a time, because we mostly work with scrapy crawl and on scrapinghub, but I know that one can run multiple spiders concurrently, and I have seen that middlewares often have a spider parameter in their…
kenshin
  • 197
  • 11
1
vote
0 answers

Why is the Splash headless browser not able to fetch LinkedIn pages?

I have tried to get the page source of LinkedIn, but I am not able to fetch even one URL; I get the response "Failed loading page". A few samples: https://www.linkedin.com/company/amazon https://www.linkedin.com/company/apple Splash version:…
1
vote
1 answer

How to scrape multiple websites with different data in urls

I'm scraping some data from a webpage where the end of the URL holds the ID of the product. It appears to rewrite the data at every single row, as if it's not appending the data from the next line. I don't know exactly what's going on, if my first…
1
vote
1 answer

mysql.connector.errors.InterfaceError: 2003: Can't connect to MySQL server on '127.0.0.1:3306' on Scrapinghub

I am trying to run my spider on Scrapinghub, but when I run it I get an error: Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks result = g.send(result) File…
Biddaris
  • 33
  • 1
  • 4
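The refusal on 127.0.0.1:3306 is expected on Scrapy Cloud: the container has no local MySQL server, so the spider must connect to a publicly reachable database host instead. A hedged sketch that keeps the address out of the code (the environment-variable names and defaults here are assumptions, not part of the original question):

```python
import os

def mysql_params():
    """Build connection parameters for mysql.connector.connect().
    On Scrapy Cloud there is no local MySQL server, so 127.0.0.1 is
    always refused; the host must come from configuration instead."""
    return {
        "host": os.environ.get("MYSQL_HOST", "127.0.0.1"),
        "port": int(os.environ.get("MYSQL_PORT", "3306")),
        "user": os.environ.get("MYSQL_USER", "root"),
        "password": os.environ.get("MYSQL_PASSWORD", ""),
        "database": os.environ.get("MYSQL_DB", "scraping"),
    }

# In the pipeline, something like:
#   import mysql.connector
#   conn = mysql.connector.connect(**mysql_params())
```

On Scrapy Cloud the variables would be set through the project's settings UI rather than a local shell.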
1
vote
1 answer

"'str' object has no attribute 'get'" when using Google Cloud Storage with ScrapingHub

I'm trying to get Google Cloud Storage working with a Scrapy Cloud + Crawlera project so that I can save text files I'm trying to download. I'm encountering an error when I run my script that seems to have to do with my Google permissions not…
1
vote
1 answer

Scrapy throws exception "raise _DefGen_Return(val) twisted.internet.defer._DefGen_Return: "

When I run the code locally (Windows 10), everything works fine. I have checked other answers here and other resources, but failed to figure out any solution. After deploying to ScrapingHub I'm getting this error message: [scrapy.core.scraper] Spider…
Billy Jhon
  • 1,035
  • 15
  • 30
1
vote
2 answers

How to dynamically upload data from Scrapinghub to Wordpress?

I am running periodic spiders in Scrapy Cloud and exporting the results to an AWS S3 bucket. I need to dynamically update my WordPress tables with these results; I am currently using the TablePress plugin, which has an "Import tables" option, but it…
Jorge Garcia
  • 117
  • 9
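One hedged way to bridge the gap above: publish the scraped rows as an HTML table at a stable URL (the S3 bucket works for this), since TablePress's "Import tables" dialog accepts HTML and can import from a URL. A minimal renderer (the field names in the test data are invented for illustration):

```python
def items_to_html_table(items):
    """Render scraped items (a list of dicts sharing the same keys) as a
    minimal HTML table string.  Uploading this alongside the normal feed
    export gives TablePress something it can re-import after each run."""
    headers = list(items[0])
    head = "".join(f"<th>{h}</th>" for h in headers)
    rows = "".join(
        "<tr>" + "".join(f"<td>{item[h]}</td>" for h in headers) + "</tr>"
        for item in items
    )
    return f"<table><tr>{head}</tr>{rows}</table>"
```

Pointing TablePress at the uploaded file keeps the WordPress table in sync after each periodic job; automating the re-import itself would still need WordPress-side scheduling.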
1
vote
1 answer

Connection was refused by other side: 111: Connection refused

I have a spider for LinkedIn. It works fine on my local machine, but when I deploy it on Scrapinghub I get the error: Error downloading : Connection was refused by other side: 111: Connection refused. The complete log of…
Alpha Romeo
  • 84
  • 10
1
vote
1 answer

Scrapy request durations gradually higher when scraping lots of different domains on Scrapinghub

I'm using Scrapy, on Scrapinghub, to scrape a few thousand websites. When scraping a single website, request durations stay pretty short (< 100 ms). But I also have a spider that is responsible for 'validating' around 10k URLs (I'm testing a…
romain-lavoix
  • 403
  • 2
  • 6
  • 20
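Growing request durations across many domains often trace back to per-domain scheduler slots, repeated DNS lookups, and a few slow sites monopolizing concurrency. A sketch of the standard Scrapy settings worth tuning for a broad crawl (the values are illustrative, not recommendations):

```python
# settings.py — illustrative values for a broad, many-domain crawl
CONCURRENT_REQUESTS = 64            # total in-flight requests
CONCURRENT_REQUESTS_PER_DOMAIN = 8  # keep one slow site from hogging slots
DOWNLOAD_TIMEOUT = 15               # fail slow responders quickly (seconds)
DNS_TIMEOUT = 10                    # give up sooner on unresolvable hosts
DNSCACHE_ENABLED = True             # avoid repeated lookups per domain
AUTOTHROTTLE_ENABLED = False        # autothrottle slows all slots on latency spikes
```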
1
vote
1 answer

Keyword async error running shub command

I have my spiders ready, and now I want to deploy them to Scrapinghub. I've successfully installed shub by running pip3 install shub. I'm using Python 3.7, but when I run shub, I get a syntax error. I can see that this issue should be fixed in the latest…
jonask
  • 679
  • 2
  • 6
  • 21
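The syntax error above stems from `async` being promoted to a reserved keyword in Python 3.7; code that used it as an ordinary identifier (as older shub releases did) fails with a SyntaxError the moment it is imported. A small demonstration of the language change itself:

```python
# 'async' was a legal identifier through Python 3.6 but is a reserved
# keyword since 3.7, so this function definition no longer compiles:
src = "def f(async=None): pass"
try:
    compile(src, "<example>", "exec")
    ok = True          # would happen on Python <= 3.6
except SyntaxError:
    ok = False         # happens on Python 3.7+
print(ok)
```

Since the question itself notes the issue is fixed in the latest release, the usual remedy is simply `pip3 install --upgrade shub`.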
1
vote
0 answers

Scrapinghub/Splash website page fetching time increasing exponentially with parallel threads

In my trial, I hit the Splash instance with 50 parallel threads; each thread gets the page source of a URL. My Splash instance's default slots value is 50. Here, website fetching time increases exponentially with the number of parallel threads. I…
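With 50 client threads against 50 render slots, every request competes for the same browser pool, so queueing inside Splash (not the target site) can dominate the measured latency. A hedged launch sketch using Splash's documented server options (the slot count and timeout values are illustrative):

```shell
# Start Splash with an explicit slot count and a longer render timeout.
# Raising --slots trades memory for parallelism; past a point it is
# better to run several Splash containers behind a load balancer,
# which is what the Aquarium setup automates.
docker run -p 8050:8050 scrapinghub/splash --slots 50 --max-timeout 90
```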
1
vote
0 answers

Scrapinghub/Splash - Aquarium is not working with docker-compose

We are trying to use Aquarium to set up Scrapinghub/Splash. During installation, when I use "docker-compose up" to start Splash, it throws this exception: Traceback (most recent call last): File "/usr/local/bin/docker-compose", line 11,…
1
vote
1 answer

Scrapy and Splash: right settings, but still getting a connection error

Under my settings.py SPLASH_URL = 'http://127.0.0.1:8050' DOWNLOADER_MIDDLEWARES = { 'scrapy_splash.SplashCookiesMiddleware': 723, 'scrapy_splash.SplashMiddleware': 725, 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware':…
user1441797
  • 134
  • 1
  • 1
  • 10
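For comparison with the truncated settings above, the scrapy-splash README's reference configuration looks like this; note that SPLASH_URL must point at a running Splash instance, and a refused connection usually means nothing is listening at that address (for example, the Splash container is not running or is bound to a different host/port):

```python
# settings.py — reference configuration from the scrapy-splash README
SPLASH_URL = 'http://127.0.0.1:8050'  # must point at a running Splash instance

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

A quick sanity check is to open http://127.0.0.1:8050 in a browser: if the Splash UI does not load there, the problem is the Splash service, not the Scrapy settings.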