Questions tagged [scrapinghub]

Scrapinghub is a web scraping development and services company that supplies cloud-based web crawling platforms.

179 questions
0
votes
0 answers

ScrapinghubClient > Download CSV

I have a question about using ScrapingHub via ScrapinghubClient. Is there any way to download a CSV file from all completed jobs and then delete them directly from Python? Thank you!
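A hedged sketch of one approach with the python-scrapinghub client, under the assumption that iterating finished jobs, reading their items, and deleting them is what's wanted (the API key, project id, and field names are placeholders):

```python
import csv


def items_to_csv(items, path, fieldnames):
    """Write an iterable of item dicts to a CSV file (pure stdlib)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for item in items:
            writer.writerow(item)


def export_and_delete(api_key, project_id, fieldnames):
    """Download each finished job's items as CSV, then delete the job."""
    from scrapinghub import ScrapinghubClient  # pip install scrapinghub

    client = ScrapinghubClient(api_key)
    project = client.get_project(project_id)
    for summary in project.jobs.iter(state="finished"):
        job = client.get_job(summary["key"])
        # Job keys look like "123456/1/2"; derive a safe filename from them
        path = summary["key"].replace("/", "_") + ".csv"
        items_to_csv(job.items.iter(), path, fieldnames)
        job.delete()  # removes the job from the dashboard
```

The CSV-writing part is plain stdlib and can be tested without an API key; the client calls follow the python-scrapinghub docs but are untested here.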
0
votes
0 answers

Spider returns different results from local machine and Scrapy Cloud (phantomjs+selenium+crawlera)

Hello! A question for anyone who uses scrapinghub, shub-image, selenium+phantomjs, and crawlera. My English is not good, sorry. I needed to scrape a site which has a lot of JS code, so I use scrapy+selenium. It should also run on Scrapy Cloud. I've written a spider…
kzr
  • 41
  • 1
  • 5
0
votes
1 answer

Scrapinghub: Dict_key error handling | check if key exists

It took me a while to make sense of python-scrapinghub's logic and the way it interacts with Scrapinghub's API, but I've made progress in my current troubleshooting... Using Scrapy, I have a list of multiple web scrapers whose sole function is to create…
scriptso
  • 677
  • 4
  • 14
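For the key-existence part of the question above, the check itself is plain Python and needs nothing Scrapinghub-specific; a minimal sketch with a hypothetical item dict:

```python
item = {"title": "Example", "price": None}  # a scraped item as a plain dict

# Membership test: True even when the stored value is None
has_price = "price" in item

# dict.get returns a default instead of raising KeyError
url = item.get("url", "missing")

# Guard before subscripting to avoid KeyError on sparse items
if "title" in item:
    title = item["title"]
```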
0
votes
1 answer

How to use peewee with scrapinghub

I want to save my data to a remote machine using peewee. When I run my crawler I get the following error: File "/usr/local/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 57, in run self.crawler_process.crawl(spname, **opts.spargs) …
yasirnazir
  • 1,133
  • 7
  • 12
0
votes
1 answer

python-scrapinghub, ascii / utf8?

Python 3.4.2. I'm using the client interface for the Scrapinghub API, which can be found here: https://github.com/scrapinghub/python-scrapinghub. I scrape a site and want to get and print the items with for item in job.items(): print(item). In a Python…
fuser60596
  • 1,087
  • 1
  • 12
  • 26
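When printing items that contain non-ASCII text on Python 3, the usual culprit is the console encoding rather than the client library; one hedged workaround is to serialize each item explicitly (the item contents below are made up):

```python
import json

item = {"name": "Çalışkan", "city": "İstanbul"}  # hypothetical scraped item

# ensure_ascii=True (the default) escapes non-ASCII as \uXXXX,
# which prints safely even on an ASCII-only console
safe = json.dumps(item)

# ensure_ascii=False keeps the real characters for UTF-8 terminals
readable = json.dumps(item, ensure_ascii=False)
print(readable)
```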
0
votes
1 answer

ValueError: Missing scheme in request url: h

I am a beginner in Scrapy and Python. I tried to deploy the spider code to scrapinghub and encountered the following error. Below is the code. import scrapy from bs4 import BeautifulSoup,SoupStrainer import urllib2 from scrapy.selector import…
Niveram
  • 15
  • 1
  • 10
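The trailing h in this error usually means Scrapy was handed a bare string where it expected a list of URLs, so it iterated the string character by character; a minimal illustration:

```python
start_urls = "http://example.com"   # BUG: a plain string
first = next(iter(start_urls))      # iterating yields single characters

start_urls_fixed = ["http://example.com"]  # FIX: wrap the URL in a list
first_fixed = next(iter(start_urls_fixed))
```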
0
votes
1 answer

Unable to deploy project to Scrapy Cloud

I made changes to the spider to use some methods of the scrapinghub API and tried re-deploying it to Scrapy Cloud using "shub deploy". I'm getting the error: ImportError: No module named scrapinghub. It points to the import line in the spider from…
Zaky
  • 369
  • 6
  • 21
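A common fix (hedged: the project id below is a placeholder, and exact file layout depends on your project) is to declare third-party packages in a requirements file referenced from scrapinghub.yml, so that shub deploy installs scrapinghub inside the cloud image:

```yaml
# scrapinghub.yml in the project root
project: 12345
requirements:
  file: requirements.txt
```

where requirements.txt lists the missing package, e.g. a line reading `scrapinghub`.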
0
votes
0 answers

How to edit files in docker for scrapinghub portia

I have created a pipeline to store the crawled items in a JSON file and added the pipeline at the path /slybot/slybot/mypipeline.py. After that I installed the Portia package using Docker; installation was successful. Then I started Portia using the…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
0
votes
2 answers

Web scraping from multiple tables appearing on click

Basically I would like to open this page, select "Rüzgar" from the last dropdown, run the query with the "Sorgula" button, and extract all the coordinates stored in the table that appears once the first button of the first column in the main table is clicked.…
Sam
  • 41
  • 6
0
votes
1 answer

Deploying egg on Scrapinghub

I deployed a project on Scrapinghub but my spider isn't working because Scrapinghub uses an old version of the Twisted library. The project works fine on my local machine. Is there any way I could make an egg of the updated Twisted…
Waqar
  • 93
  • 1
  • 2
  • 9
0
votes
1 answer

How to annotate same text for different fields in Portia?

I want to annotate content that has three lines into three individual fields, although it sits in a single HTML tag. I tried the partial annotation method, but some content has only 2 lines (partial annotation is not working in this scenario). How can I…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
0
votes
0 answers

How are fields stored in a list in a Portia crawl?

EDIT: I see that while running a Portia spider the extracted fields are stored in a Python list and the values are returned while the extracted details are logged in scrapyd. I just want to know how the fields are being extracted and…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
0
votes
1 answer

splash (/scrapinghub) - wait = max 10

I am using Scrapinghub's Splash for rendering JavaScript pages. It is a really great tool, but I don't understand why the maximum value for wait is 10. Is there a possibility to set higher values? Thank you very much. Best regards, Julian
0
votes
2 answers

scrapy access to log count while running in scrapinghub

I have a small scrapy extension which looks into the stats object of a crawler and sends me an email if the crawler has thrown log messages of a certain type (e.g. WARNING, CRITICAL, ERROR). These stats are accessible via the spider's stats object…
tony994
  • 485
  • 1
  • 5
  • 10
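The filtering step can be kept as a pure function over the stats dict that crawler.stats.get_stats() returns, which makes it easy to reuse both locally and on Scrapy Cloud; a sketch (the stats values below are invented, only the log_count/<LEVEL> key pattern comes from Scrapy):

```python
def log_level_counts(stats, levels=("WARNING", "ERROR", "CRITICAL")):
    """Pick out Scrapy's log_count/<LEVEL> entries from a stats dict."""
    return {level: stats.get("log_count/%s" % level, 0) for level in levels}


def should_alert(stats, threshold=1):
    """True if any watched log level reached the threshold."""
    return any(n >= threshold for n in log_level_counts(stats).values())


# In a real extension this dict would come from crawler.stats.get_stats()
example_stats = {"log_count/ERROR": 2, "log_count/INFO": 40}
```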
0
votes
1 answer

How to write regex and XPath for the below link?

Here is the link https://www.google.com/about/careers/search#!t=jo&jid=34154& from which I have to extract the content under job details. Job details Team or role: Software Engineering // How to write xpath Job type: Full-time // How to write…
user3996896