Highest Voted 'scrapinghub' Questions

0 votes

0 answers

What is the correct way to write a file on Scrapinghub?

I use Python-Scrapy and Scrapinghub. In my spider I need to read and write a file: data_directory = 'tmp' csv_magasin = data_directory+"/"+current_script+"_"+current_date+"-shop_url.csv" if not os.path.exists(data_directory): …

python scrapy scrapinghub

asked Jan 23 '18 at 11:17

parik

2,313
12
39
67

0 votes

0 answers

Scrapinghub Crawled 0 pages (at 0 pages/min)

I have developed a simple Scrapy project to crawl a website. The crawler works fine on my local machine, but when I try to deploy it to Scrapy Cloud provided by scrapinghub.com, the spider shows 0 pages crawled and after 180 sec (default timeout) it…

python python-3.x scrapy scrapinghub

asked Dec 04 '17 at 12:08

shubham003

703
2
9
20

0 votes

1 answer

Scrapy how to save a State between spider runs (via scrapinghub)?

I have a spider that will run on a schedule. The spider's input is a date range, from the date of the last scrape to today's date. So the question is how to save the date of the last scrape within the Scrapy project. There is an option to get data from scrapy settings…

python-2.7 scrapy scrapinghub

asked Nov 25 '17 at 16:17

Billy Jhon

1,035
15
30

0 votes

0 answers

Error caught on signal handler:TypeError: to_bytes must receive a unicode got instance

I get this strange error when I run my code in Scrapy Cloud, and I am not sure how to debug it. There is no reference to a line in the spider code. I assume it is about saving an item and something general, as no URL is indicated. Moreover the spider runs ok and…

python-2.7 scrapy scrapinghub

asked Nov 11 '17 at 21:33

Billy Jhon

1,035
15
30

0 votes

1 answer

Dependency error while trying to run project on Scrapy Cloud

I created a project with Scrapy that uses pymongo to save my data to MongoDB. I have checked that my pymongo version is 3.5.1. When I deploy my project to Scrapinghub and run it, it shows this error on Scrapinghub: exceptions.ImportError: No module named pymongo I…

scrapinghub

asked Nov 05 '17 at 03:47

Morton

5,380
18
63
118

0 votes

2 answers

Can't install MySQLdb-python==1.2.5 Scrapinghub (Scrapy) Python 2.7

I read some threads about connecting MySQL to a script deployed on Scrapinghub. They recommend changing the *.yml file and adding a requirements.txt. This solution worked a few days ago; now it doesn't. Here is the error from shub deploy: Collecting…

python-2.7 scrapy scrapinghub

asked Nov 03 '17 at 09:48

Billy Jhon

1,035
15
30

0 votes

1 answer

Update start urls at scrapinghub hosted Scrapy project via API call

My Scrapy spider is hosted on Scrapinghub. It is managed via the run spider API call. The only thing that changes in the spider from call to call is the list of start URLs. The list may vary from 100 URLs to a couple of thousand. What is the best way to update…

python api scrapy scrapinghub

asked Nov 01 '17 at 07:56

Billy Jhon

1,035
15
30

0 votes

1 answer

text substitution {} does not work at scrapinghub

I build a URL with {} format placeholders to change the URL on the fly. It works totally fine on my PC. But once I upload it and run it on Scrapinghub, one of the many substitutions (state) does not work, although the others work fine: it returns %7B%7D& in the URL, which is…

python scrapy scrapinghub

asked Oct 23 '17 at 11:58

Billy Jhon

1,035
15
30

0 votes

1 answer

How to use pip to install middleware on Scrapinghub

I have a Scrapy project that uses middleware installed via pip, specifically scrapy-random-useragent. Settings file: # -*- coding: utf-8 -*- # Scrapy settings for batdongsan project # # For simplicity, this file contains only settings considered…

python scrapy pip scrapinghub

asked Sep 02 '17 at 19:01

Haha TTpro

5,137
6
45
71

0 votes

0 answers

Cannot import ScrapinghubClient

>>> from scrapinghub import ScrapinghubClient Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: cannot import name ScrapinghubClient Why is this happening? I have Python 2.7.13 |Continuum Analytics, Inc.| (default, May 11…

python scrapy scrapinghub

asked Aug 21 '17 at 07:31

errorLogger

99
8

0 votes

1 answer

Scrapinghub job failed - can't diagnose

The spider stopped in the middle of the crawl (after a 7h run, 20K requests). The job status is "failure", even though there are no ERROR messages in the log. The log looks like the code just stopped running on a particular code line range without any…

scrapinghub

asked Jul 30 '17 at 20:25

noname7619

3,370
3
21
26

0 votes

1 answer

Scrapy: Redirecting to a confirmation page with a captcha

How can I stop redirecting from a target url to another url which is a confirmation page of a website with a captcha? Here is my code below: yield scrapy.Request(meta={'handle_httpstatus_list': [302], 'dont_redirect': True,…

redirect web-scraping scrapy scrapinghub

asked Jul 27 '17 at 10:03

RF_956

329
2
7
18

0 votes

1 answer

ScrapingHub: ImportError: No module named firebase

I'm trying to put my scraped data into my Firebase account in the cloud, but I'm getting this ImportError when I run the spider. I tried making a new project and even reinstalling firebase and shub on a specific version of Python, but it didn't help. The spider…

python firebase scrapy scrapinghub

asked Jul 03 '17 at 17:27

P.hunter

1,345
2
21
45

0 votes

1 answer

How to extract files from ScrapingHub?

I have deployed some Scrapy spiders to scrape data, which I can download as .csv from ScrapingHub. Some of these spiders have a FilePipeline, which I used to download files (PDFs) to a specific folder. Is there any way I can retrieve these files from…

python scrapy web-crawler scrapinghub

asked Jun 15 '17 at 03:46

graph

77
4

0 votes

0 answers

scrapy script stops after certain requests

I have a Scrapy script running on Scrapinghub. The scraper takes one argument, a CSV file in which the URLs are stored. The script runs without error, but it isn't scraping all the items from the URL. I have no idea why this…

python web-scraping scrapy scrapinghub

asked May 31 '17 at 07:27

sinhyperbolica

73
9
