Scrapinghub, a web scraping development and services company, supplies a cloud-based web crawling platform.
Questions tagged [scrapinghub]
179 questions
2
votes
0 answers
Printing data on a sample image, but the text is going outside the image
All of the data in the picture is taken from an Excel sheet. The image size is 220×320 and I can't increase it.
But I want the Line 4 data that runs outside the image to wrap down below row 4.
import pandas as pd
df =…
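A common fix (a sketch, not taken from the question) is to wrap each row's text to the image width before drawing it, so overflow continues on the next line instead of running off the edge. This uses only the standard library's `textwrap`; the character budget and the actual drawing step (e.g. PIL's `ImageDraw.text` at an increasing y-offset) are assumptions and not shown:

```python
import textwrap

# A 220-px-wide image fits roughly 25 characters at a typical small font
# (~8 px per character) -- both numbers are assumptions for illustration.
MAX_CHARS = 25

def wrap_row(text, width=MAX_CHARS):
    """Split one row's text into lines short enough to fit the image."""
    return textwrap.wrap(text, width=width)

# Stand-in for the overflowing "Line 4" data from the question:
line4 = "This row is far too long to fit inside a 220x320 pixel image"
for line in wrap_row(line4):
    print(line)  # each line would be drawn at the next y-offset
```

Each wrapped line stays within the width budget, so nothing is lost, it just flows downward.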

Abdurehman Dar
- 87
- 8
2
votes
0 answers
Is it possible to use a monitor on a script if it fails?
I use scrapinghub to run my spiders. I have a FinishReasonMonitor that sends me a Slack message if a spider fails. Is it possible to apply this to a script? My spiders rarely fail, but my scripts occasionally do. In scrapinghub it shows script outcomes as…
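Spidermon monitors such as `FinishReasonMonitor` hook into spider signals, so they do not fire for plain scripts. One workaround (a sketch, not a Scrapinghub feature; the webhook URL and message format are assumptions) is to wrap the script's entry point yourself and post to a Slack incoming webhook when it raises:

```python
import json
import traceback

SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX"  # placeholder

def slack_payload(script_name, exc):
    """Build the JSON body that Slack incoming webhooks accept."""
    return {"text": f"Script `{script_name}` failed: {exc!r}"}

def run_monitored(script_name, main):
    """Run main(); on failure, build (and in real use, POST) a Slack alert."""
    try:
        main()
    except Exception as exc:
        traceback.print_exc()
        body = json.dumps(slack_payload(script_name, exc)).encode()
        # In real use: urllib.request.urlopen(SLACK_WEBHOOK, body)
        # Skipped here so the sketch runs without network access.
        return body
    return None

def flaky():  # stand-in for the real script
    raise RuntimeError("boom")

print(run_monitored("flaky", flaky))
```

The wrapper re-raises nothing, mirroring a monitor's fire-and-forget behavior; re-raise after notifying if the job should still be marked failed.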

weston6142
- 181
- 14
2
votes
1 answer
Dataflow template can't be created because the Scrapinghub Client Library doesn't accept a ValueProvider
I'm trying to create a Dataflow template that can be called from a cloud function that is triggered by a pubsub message. The pubsub message sends a job id from Scrapinghub (a platform for scrapy scrapers) to a cloud function that triggers a data…

pa-nguyen
- 417
- 1
- 5
- 16
2
votes
0 answers
How to use Crawlera proxies in Selenium
I have a Selenium project and want to use a Crawlera proxy with it. I already have a Crawlera API key.
headless_proxy = "127.0.0.1:3128"
proxy = Proxy({
'proxyType': ProxyType.MANUAL,
'httpProxy':…
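Crawlera authenticates with the API key as the proxy username (empty password) against `proxy.crawlera.com:8010`; for headless browsers Scrapinghub's docs suggest a local intermediary such as crawlera-headless-proxy on `127.0.0.1:3128`, which is what the snippet above points Selenium at. A minimal sketch of building the upstream proxy URL (the key value is a placeholder, and host/port are as documented at the time, not verified against the question's account):

```python
CRAWLERA_HOST = "proxy.crawlera.com"
CRAWLERA_PORT = 8010

def crawlera_proxy_url(api_key):
    """Crawlera authenticates with the API key as username, empty password."""
    return f"http://{api_key}:@{CRAWLERA_HOST}:{CRAWLERA_PORT}"

url = crawlera_proxy_url("0123456789abcdef")  # placeholder key
print(url)
```

This URL is what the local headless proxy (or a `--proxy-server` browser flag) would be configured to forward to.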

pystack-piter
- 21
- 1
2
votes
3 answers
How to scrape a large amount (>800) Google My Maps location data ("Details from Google Maps") using Web Scraper or other alternatives?
I tried to use Web Scraper, but it only works for a few data entries, not for hundreds of data points. Is there a way to scrape a large amount of data solely with Web Scraper, or is there a better alternative such as Python? I intend to scrape…
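One Python alternative (a sketch, not from the question) is to export the My Maps layer as KML (the map menu offers "Export to KML/KMZ") and parse the placemarks with the standard library, which has no row limit. The inline sample stands in for a real export:

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Tiny inline KML standing in for a real "Export to KML" download.
sample = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Placemark><name>Cafe A</name>
    <Point><coordinates>106.8,-6.2,0</coordinates></Point></Placemark>
  <Placemark><name>Cafe B</name>
    <Point><coordinates>106.9,-6.3,0</coordinates></Point></Placemark>
</Document></kml>"""

def placemarks(kml_text):
    """Yield (name, lat, lon) for every Placemark in a KML document."""
    root = ET.fromstring(kml_text)
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        name = pm.find("kml:name", KML_NS).text
        lon, lat, _alt = pm.find(".//kml:coordinates", KML_NS).text.split(",")
        yield name, float(lat), float(lon)

rows = list(placemarks(sample))
print(rows)
```

KML stores coordinates as `lon,lat,alt`, hence the swap when yielding; "Details from Google Maps" layers export extra fields as `ExtendedData` elements that can be read the same way.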

cgybb
- 59
- 2
2
votes
1 answer
Scrapinghub Deploy Failed
I am trying to deploy a project to scrapinghub and here's the error I am getting
slackclient 1.3.2 has requirement websocket-client<0.55.0,>=0.35, but you have websocket-client 0.57.0.
Warning: Pip checks failed, please fix the conflicts.
WARNING:…
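The deploy log is pip's dependency check: `slackclient 1.3.2` requires `websocket-client<0.55.0,>=0.35`, but the project installs 0.57.0. The usual fix (a sketch; exact pins depend on the project) is to declare a compatible version in the requirements.txt referenced from scrapinghub.yml:

```text
# requirements.txt -- satisfy slackclient 1.3.2's constraint
slackclient==1.3.2
websocket-client>=0.35,<0.55.0
```

After editing, `shub deploy` again; the pip check should then pass.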

johncsmith427
- 83
- 8
2
votes
1 answer
Scrapinghub puts my results in the log and not in the item
I have a functioning spider project that extracts URL content (no CSS). I crawled several sets of data and stored them in a series of .csv files. Now I am trying to set it up on Scrapinghub in order to go for a long scraping run.
So far, I am able to…

Freddy
- 73
- 8
2
votes
1 answer
Scrapinghub: upload and use a file
I uploaded my spider to Scrapinghub. I understand how to upload my *.txt file with it, but how do I use the file?
My setup.py file looks like
setup(
    name='project',
    version='1.0',
    packages=find_packages(),
    …
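On Scrapy Cloud, files bundled via setup.py's `package_data` travel inside the deployed egg, so they are read back with `pkgutil.get_data` rather than an ordinary `open()` on a relative path. A self-contained sketch (the package and file names are stand-ins; it builds a throwaway package only to demonstrate the call):

```python
import os
import pkgutil
import sys
import tempfile

# Build a throwaway package with a bundled data file, standing in for a
# project that ships urls.txt via setup.py, e.g.:
#   setup(..., packages=find_packages(),
#         package_data={"demo_pkg": ["*.txt"]})
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "urls.txt"), "w") as fh:
    fh.write("https://example.com\n")

sys.path.insert(0, root)

# pkgutil.get_data resolves the file relative to the importable package,
# which also works inside the egg that `shub deploy` uploads.
data = pkgutil.get_data("demo_pkg", "urls.txt").decode()
print(data.strip())
```

Inside a spider, the same call would replace `open("urls.txt")` in `start_requests`.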

олег колесник
- 23
- 3
2
votes
1 answer
Export Scrapy JSON Feed - Fails for Dynamic FEED_URI for AWS S3 using ScrapingHub
I have written a scrapy scraper that writes data out using the JsonItemExporter and I have worked out how to export this data to my AWS S3 using the following Spider Settings in ScrapingHub
AWS_ACCESS_KEY_ID =…
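Scrapy interpolates `%(name)s`, `%(time)s`, and other spider attributes into FEED_URI with ordinary %-formatting, so a per-job S3 key can be made dynamic without code. A sketch of that substitution (bucket, spider name, and timestamp format are placeholders; Scrapy's exact time string may differ):

```python
from datetime import datetime

# What would go in the Scrapinghub spider settings (placeholder bucket):
FEED_URI = "s3://my-bucket/%(name)s/%(time)s.json"

# Scrapy performs roughly this substitution when the feed is opened:
params = {
    "name": "quotes_spider",
    "time": datetime(2020, 1, 2, 3, 4, 5).strftime("%Y-%m-%dT%H-%M-%S"),
}
print(FEED_URI % params)
```

The AWS credentials stay in separate `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` settings, as the excerpt above begins to show.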

David Cruwys
- 6,262
- 12
- 45
- 91
2
votes
1 answer
How can I run Scrapyd on a server
Scrapinghub recently removed periodic jobs from its free plan, which is what I used to run my Scrapy crawlers.
Therefore, I decided to use Scrapyd instead. So I went ahead and got a virtual server running Ubuntu 16.04. (This is…

Sebastian
- 831
- 2
- 13
- 36
2
votes
1 answer
Scrapy + Splash: download a file from a JS click event
I'm using Scrapy with the Splash plugin. I have a button that triggers a download via AJAX, and I need to get the downloaded file but don't know how.
My lua script is something like this
function main(splash)
…

delpo
- 210
- 2
- 18
2
votes
0 answers
Deploying a Scrapy project to Scrapinghub fails
My Scrapy project works fine on my local machine, but I'm getting an error when deploying to Scrapinghub:
$ shub deploy
Packing version 88e88d8-master
Deploying to Scrapy Cloud project "8888888"
Deploy log last 30 lines:
File…

Zin Yosrim
- 1,602
- 1
- 22
- 40
2
votes
1 answer
How to connect Scrapy spider deployed on Scrapinghub to a remote SQL server with SQLAlchemy and pyodbc?
After trying to solve this problem on my own, I need some help or a nudge in the right direction.
I wrote and deployed a Scrapy spider on Scrapinghub. This spider collects some data and, after finishing, saves that data to a remote Microsoft SQL Server. I use…
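Two things usually have to line up: the `mssql+pyodbc` connection URL, and declaring `sqlalchemy`/`pyodbc` in the requirements.txt that scrapinghub.yml points at so they exist in the cloud image. A sketch of building the URL with the ODBC driver passed as a query parameter (all credentials and hostnames are placeholders):

```python
from urllib.parse import quote_plus

def mssql_url(user, password, host, database,
              driver="ODBC Driver 17 for SQL Server", port=1433):
    """SQLAlchemy URL for Microsoft SQL Server over pyodbc."""
    return (
        f"mssql+pyodbc://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}?driver={quote_plus(driver)}"
    )

url = mssql_url("scraper", "p@ss word", "db.example.com", "items")
print(url)
```

`quote_plus` keeps special characters in the password (here `@` and a space) from breaking the URL, a frequent cause of connection failures that only appear in the cloud.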

Vlad
- 348
- 3
- 10
2
votes
2 answers
scrapinghub: Download all items from all completed jobs
I have been using scrapinghub for quite a while. I have some spiders that run a job every day, and each weekend I sign in to collect the scraped data. So I end up opening each spider's seven jobs one at a time, downloading the data and moving to the next,…
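Every finished job's items can be fetched without the UI, either with the python-scrapinghub client (`client.get_project(id).jobs.iter()` then `job.items.iter()`) or straight from the storage API. A sketch that only builds the per-job items URL, so it runs without credentials (the job keys are placeholders, and the `storage.scrapinghub.com/items/<project>/<spider>/<job>` endpoint shape is an assumption to verify against the API docs):

```python
STORAGE = "https://storage.scrapinghub.com/items"

def items_url(job_key, fmt="json"):
    """Items endpoint for one job; job_key looks like 'project/spider/job'."""
    return f"{STORAGE}/{job_key}?format={fmt}"

# One URL per finished job -- real keys would come from the jobs API.
for key in ["123456/1/7", "123456/1/8"]:
    print(items_url(key))
# Each URL is fetched with the API key as the HTTP basic-auth username.
```

Looping this over a week of job keys replaces the seven manual downloads per spider.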

errorLogger
- 99
- 8
2
votes
1 answer
How to pass data to scrapinghub?
I'm trying to run a scrapy spider on scrapinghub, and I want to pass in some data. I'm using their API to run the spider:
http://doc.scrapinghub.com/api/jobs.html#jobs-run-json
They have an option for job_settings, which seems relevant, but I can't…
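`job_settings` carries Scrapy *settings*; for per-job *data*, the run endpoint in the linked docs forwards extra POST fields to the spider as arguments, readable as `self.argname`. A sketch of the request body (project id, spider name, and the argument are placeholders; confirm the field behavior against the linked API page):

```python
from urllib.parse import urlencode

# Fields beyond project/spider become spider arguments.
payload = {
    "project": "123456",        # placeholder project id
    "spider": "quotes_spider",  # placeholder spider name
    "start_city": "berlin",     # arbitrary data -> self.start_city
}
body = urlencode(payload)
print(body)
# POST this body to the run endpoint with the API key as basic-auth;
# in the spider: def __init__(self, start_city=None, **kwargs): ...
```

On the spider side the argument arrives as a string, so anything structured is usually passed JSON-encoded and decoded in `__init__`.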

Sam Lee
- 9,913
- 15
- 48
- 56