Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.
Questions tagged [scrapinghub]
179 questions
0
votes
0 answers
What is the correct way to write a file on Scrapinghub?
I use Python-Scrapy and Scrapinghub.
In my spider I need to read and write a file:
data_directory = 'tmp'
csv_magasin = data_directory+"/"+current_script+"_"+current_date+"-shop_url.csv"
if not os.path.exists(data_directory):
…

parik
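For the file-writing question above, a minimal Python 3 sketch of creating the directory and building the CSV path before writing. `current_script` and `current_date` come from the excerpt; the values passed to them, the helper name `write_shop_urls`, and the fallback to the system temp directory are assumptions for illustration. The excerpt does not establish whether files written this way remain available after a Scrapy Cloud job ends.

```python
import csv
import os
import tempfile


def write_shop_urls(rows, current_script, current_date, base_dir=None):
    """Write rows to <base_dir>/tmp/<script>_<date>-shop_url.csv and return the path."""
    # Assumption: fall back to the system temp directory, which is normally writable.
    base_dir = base_dir or tempfile.gettempdir()
    data_directory = os.path.join(base_dir, "tmp")
    # exist_ok=True replaces the separate os.path.exists() check.
    os.makedirs(data_directory, exist_ok=True)
    csv_magasin = os.path.join(
        data_directory, "{}_{}-shop_url.csv".format(current_script, current_date)
    )
    with open(csv_magasin, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return csv_magasin


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(write_shop_urls([["https://example.com/shop/1"]], "spider", "2018-01-01"))
```

Using `os.path.join` instead of string concatenation keeps the path valid regardless of the platform's separator.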
0
votes
0 answers
Scrapinghub Crawled 0 pages (at 0 pages/min)
I have developed a simple Scrapy project to crawl a website. The crawler works fine on my local machine, but when I try to deploy it to Scrapy Cloud, provided by scrapinghub.com, the spider shows 0 pages crawled and after 180 sec (default timeout) it…

shubham003
0
votes
1 answer
Scrapy: how to save state between spider runs (via Scrapinghub)?
I have a spider that will run on a schedule. The spider input is based on a date range: from the date of the last scrape to today's date. So the question is how to save the date of the last scrape within the Scrapy project. There is an option to get data from Scrapy settings…

Billy Jhon
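The date-range idea in the question above can be sketched as follows: store the date of the last scrape, then compute the days still to be scraped on the next run. This is only an illustration. The JSON file, the key `last_scrape`, and all three function names are assumptions, and a local file may not survive between jobs on a hosted platform, so the storage backend would need to be swapped for something persistent.

```python
import json
from datetime import date, timedelta


def load_last_scrape(path, default):
    """Return the stored date of the last scrape, or `default` if none is stored."""
    try:
        with open(path, encoding="utf-8") as handle:
            return date.fromisoformat(json.load(handle)["last_scrape"])
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return default


def save_last_scrape(path, day):
    """Store `day` as an ISO date string (hypothetical file format)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"last_scrape": day.isoformat()}, handle)


def dates_to_scrape(last_scrape, today):
    """Days after the last scrape, up to and including today."""
    days = (today - last_scrape).days
    return [last_scrape + timedelta(days=n) for n in range(1, days + 1)]
```

A run would call `load_last_scrape`, crawl each day returned by `dates_to_scrape`, and call `save_last_scrape` only after the crawl finishes, so a failed run is retried from the same date.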
0
votes
0 answers
Error caught on signal handler: TypeError: to_bytes must receive a unicode got instance
I get this strange error when I run my code in Scrapy Cloud. I am not sure how to debug it. There is no reference to a line in the spider code.
I assume it is about saving an item and something general, as no URL is indicated. Moreover, the spider runs OK and…

Billy Jhon
0
votes
1 answer
Dependency error while trying to run project on Scrapy Cloud
I created a project with Scrapy and use pymongo to save my data to MongoDB.
I have checked that my pymongo version is 3.5.1.
When I deploy my project to Scrapinghub and run it,
it shows this error on Scrapinghub: exceptions.ImportError: No module named pymongo
I…

Morton
0
votes
2 answers
Can't install MySQLdb-python==1.2.5 on Scrapinghub (Scrapy), Python 2.7
I read some threads about connecting MySQL with a script deployed on Scrapinghub.
They recommend changing the *.yml file and adding a requirements.txt.
This solution worked a few days ago. Now it doesn't.
Here is the error from shub deploy.
Collecting…

Billy Jhon
0
votes
1 answer
Update start urls at scrapinghub hosted Scrapy project via API call
My Scrapy spider is hosted at Scrapinghub. It is managed via the run spider API call.
The only thing that changes in the spider from call to call is the list of start URLs. The list may vary from 100 URLs to a couple thousand. What is the best way to update…

Billy Jhon
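One possible approach to the question above is to pass the list as a single spider argument and split it inside the spider, since Scrapy spider arguments arrive as strings. The sketch below shows only the parsing step. The helper name `parse_start_urls` and the comma/newline separator are assumptions, and it presumes the URLs themselves contain no commas.

```python
def parse_start_urls(raw):
    """Split a comma- or newline-separated string into a list of unique URLs."""
    if not raw:
        return []
    seen = set()
    urls = []
    for part in raw.replace("\n", ",").split(","):
        url = part.strip()
        # Skip empty fragments and repeated URLs while keeping the original order.
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Hypothetical argument value, as it might arrive from an API call.
    print(parse_start_urls("https://example.com/a, https://example.com/b"))
```

In a spider, the result could be assigned to `self.start_urls` inside `__init__`, which receives spider arguments as keyword arguments. For a couple thousand URLs a single argument may become unwieldy, in which case passing the location of a URL list instead may be more practical.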
0
votes
1 answer
text substitution {} does not work at scrapinghub
I create a URL with {} format to change the URL on the fly.
It works totally fine on my PC.
But once I upload and run it from Scrapinghub, one (state) of the many substitutions (the others work fine) does not work; it returns %7B%7D& in the URL, which is…

Billy Jhon
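The `%7B%7D` in the question above is the percent-encoded form of a literal `{}`, which suggests that one placeholder reached the request without being filled; the excerpt does not say why. A minimal sketch of one way to guard against that is to use named placeholders and check the template. The helper names `unfilled_fields` and `build_url` and the example URL are hypothetical.

```python
from string import Formatter
from urllib.parse import quote


def unfilled_fields(template):
    """Return the replacement fields ('' for '{}', or the name) found in a template."""
    return [field for _, field, _, _ in Formatter().parse(template) if field is not None]


def build_url(template, **params):
    """Fill named placeholders, percent-encoding each value.

    Raises KeyError if the template names a placeholder that was not supplied.
    """
    quoted = {key: quote(str(value), safe="") for key, value in params.items()}
    return template.format(**quoted)


if __name__ == "__main__":
    print(quote("{}"))  # → %7B%7D
    # Hypothetical URL template, for illustration only.
    print(build_url("https://example.com/search?state={state}&page={page}",
                    state="New York", page=2))
```

With named fields, a missing value raises `KeyError` at build time instead of silently sending an unformatted URL.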
0
votes
1 answer
How to use pip to install middleware on Scrapinghub
I have a Scrapy project that uses middleware installed via pip,
more specifically scrapy-random-useragent.
Settings file:
# -*- coding: utf-8 -*-
# Scrapy settings for batdongsan project
#
# For simplicity, this file contains only settings considered…

Haha TTpro
0
votes
0 answers
Cannot import ScrapinghubClient
>>> from scrapinghub import ScrapinghubClient
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name ScrapinghubClient
Why is this happening?
I have Python 2.7.13 |Continuum Analytics, Inc.| (default, May 11…

errorLogger
0
votes
1 answer
Scrapinghub job failed - can't diagnose
The spider stopped in the middle of the crawl (after a 7 h run, 20K requests). The job status is "failure", even though there are no ERROR messages in the log. The log looks as if the code just stopped running on a particular code line range without any…

noname7619
0
votes
1 answer
Scrapy: Redirecting to a confirmation page with a captcha
How can I stop redirecting from a target url to another url which is a confirmation page of a website with a captcha?
Here is my code below:
yield scrapy.Request(meta={'handle_httpstatus_list': [302], 'dont_redirect': True,…

RF_956
0
votes
1 answer
ScrapingHub: ImportError: No module named firebase
I'm trying to put my scraped data into my Firebase account in the cloud, but I'm getting this ImportError when I run the spider. I tried making a new project and even reinstalling firebase and shub on a specific version of Python, but it didn't help.
the spider…

P.hunter
0
votes
1 answer
How to extract files from ScrapingHub?
I have deployed some Scrapy spiders to scrape data, which I can download as .csv from ScrapingHub.
Some of these spiders have FilePipeline which I used to download files (pdf) to a specific folder. Is there any way I can retrieve these files from…

graph
0
votes
0 answers
scrapy script stops after certain requests
I have a Scrapy script running on Scrapinghub. The scraper takes one argument, a CSV file in which the URLs are stored. The script runs without error, but the problem is that it isn't scraping all the items from the URLs. I have no idea why this…

sinhyperbolica