Questions tagged [scrapinghub]

Scrapinghub is a web scraping development and services company that supplies cloud-based web crawling platforms.

179 questions
2
votes
0 answers

print data on a sample image but the text is going out of image

All of the data in the picture is taken from an Excel sheet. The image size is 220×320 and I can't increase it, but I want the Line 4 data that runs off the image to wrap below row 4. import pandas as pd df =…
2
votes
0 answers

Is it possible to use a monitor on a script if it fails?

I use scrapinghub to run my spiders. I have a FinishReasonMonitor that slacks me if a spider fails. Is it possible to apply this to a script? My spiders rarely fail, but my scripts occasionally do. In scrapinghub it shows script outcomes as…
weston6142
  • 181
  • 14
2
votes
1 answer

Data flow template can't be created because Scrapinghub Client Library doesn't accept ValueProvider

I'm trying to create a data flow template that can be called from a cloud function that is triggered by a pubsub message. The pubsub message sends a job id from Scrapinghub (a platform for scrapy scrapers), to a cloud function that triggers a data…
pa-nguyen
  • 417
  • 1
  • 5
  • 16
2
votes
0 answers

How to use crawlera proxies in selenium

I have a Selenium project and want to use a Crawlera proxy with it. I already have a Crawlera API key. headless_proxy = "127.0.0.1:3128" proxy = Proxy({ 'proxyType': ProxyType.MANUAL, 'httpProxy':…
2
votes
3 answers

How to scrape a large amount (>800) Google My Maps location data ("Details from Google Maps") using Web Scraper or other alternatives?

I tried to use Web Scraper, but it only works for a few data entries not for hundreds of data points. Is there a way to scrape a large amount of data solely using Web Scraper or is there a better alternative like python? I intend to scrape…
cgybb
  • 59
  • 2
2
votes
1 answer

Scrapinghub Deploy Failed

I am trying to deploy a project to scrapinghub and here's the error I am getting slackclient 1.3.2 has requirement websocket-client<0.55.0,>=0.35, but you have websocket-client 0.57.0. Warning: Pip checks failed, please fix the conflicts. WARNING:…
2
votes
1 answer

Scrapinghub plugs my results in the log and not in item

I have a functioning spider project to extract URL content (no CSS). I crawled several sets of data and stored them in a series of .csv files. Now I am trying to set it up on Scrapinghub for long-running scraping. So far, I am able to…
Freddy
  • 73
  • 8
2
votes
1 answer

scrapinghub upload and use file

I uploaded my spider to Scrapinghub. I understand how to upload it with my *.txt file, but how do I use the file? My setup.py file looks like setup( name = 'project', version = '1.0', packages = find_packages(), …
2
votes
1 answer

Export Scrapy JSON Feed - Fails for Dynamic FEED_URI for AWS S3 using ScrapingHub

I have written a scrapy scraper that writes data out using the JsonItemExporter and I have worked out how to export this data to my AWS S3 using the following Spider Settings in ScrapingHub AWS_ACCESS_KEY_ID =…
David Cruwys
  • 6,262
  • 12
  • 45
  • 91
2
votes
1 answer

How can I run Scrapyd on a server

Scrapinghub recently stopped including periodic jobs in their free package, which is what I used to run my Scrapy crawlers. Therefore, I decided to use Scrapyd instead. So I went ahead and got a virtual server running Ubuntu 16.04. (This is…
Sebastian
  • 831
  • 2
  • 13
  • 36
2
votes
1 answer

Scrapy splash download file from js click event

I'm using Scrapy with the Splash plugin. I have a button that triggers a download event via AJAX, and I need to get the downloaded file but don't know how. My Lua script is something like this function main(splash) …
delpo
  • 210
  • 2
  • 18
2
votes
0 answers

Scrapy project to Scrapinghub fails

My Scrapy project works fine on my local machine, but I'm getting an error when deploying to Scrapinghub: $ shub deploy Packing version 88e88d8-master Deploying to Scrapy Cloud project "8888888" Deploy log last 30 lines: File…
Zin Yosrim
  • 1,602
  • 1
  • 22
  • 40
2
votes
1 answer

How to connect Scrapy spider deployed on Scrapinghub to a remote SQL server with SQLAlchemy and pyodbc?

After trying to solve this problem on my own, I need some help or a nudge in the right direction. I wrote a Scrapy spider and deployed it on Scrapinghub. The spider collects some data and, when it finishes, saves that data to a remote Microsoft SQL Server. I use…
Vlad
  • 348
  • 3
  • 10
2
votes
2 answers

scrapinghub: Download all items from all completed jobs

I have been using Scrapinghub for quite a while. I have some spiders that each run a job every day. Each weekend I sign in to collect the scraped data. So I end up having to open one spider, go through its seven jobs one at a time, download the data and move to the next,…
2
votes
1 answer

How to pass data to scrapinghub?

I'm trying to run a scrapy spider on scrapinghub, and I want to pass in some data. I'm using their API to run the spider: http://doc.scrapinghub.com/api/jobs.html#jobs-run-json They have an option for job_settings, which seems relevant, but I can't…
Sam Lee
  • 9,913
  • 15
  • 48
  • 56