Questions tagged [scrapinghub]

Scrapinghub is a web scraping development and services company that supplies cloud-based web crawling platforms.

179 questions
0
votes
0 answers

ScrapinghubClient > Download CSV

I have a question about using ScrapingHub via ScrapinghubClient. Is there any way to download a CSV file from all completed jobs and then delete them directly from Python? Thank you!
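A hedged sketch of one approach with the python-scrapinghub client, under the assumption that iterating finished jobs, reading their items, and deleting them is what's wanted (the API key, project id, and field names are placeholders):

```python
import csv


def items_to_csv(items, path, fieldnames):
    """Write an iterable of item dicts to a CSV file (pure stdlib)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for item in items:
            writer.writerow(item)


def export_and_delete(api_key, project_id, fieldnames):
    """Download each finished job's items as CSV, then delete the job."""
    from scrapinghub import ScrapinghubClient  # pip install scrapinghub

    client = ScrapinghubClient(api_key)
    project = client.get_project(project_id)
    for summary in project.jobs.iter(state="finished"):
        job = client.get_job(summary["key"])
        # Job keys look like "123456/1/2"; derive a safe filename from them
        path = summary["key"].replace("/", "_") + ".csv"
        items_to_csv(job.items.iter(), path, fieldnames)
        job.delete()  # removes the job from the dashboard
```

The CSV-writing part is plain stdlib and can be tested without an API key; the client calls follow the python-scrapinghub docs but are untested here.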
0
votes
0 answers

Spider returns different results from local machine and Scrapy Cloud (phantomjs+selenium+crawlera)

Hello! A question for anyone who uses scrapinghub, shub-image, selenium+phantomjs, and crawlera. My English is not good, sorry. I needed to scrape a site which has a lot of JS code, so I use scrapy+selenium. It should also run on Scrapy Cloud. I've written a spider…
kzr
  • 41
  • 1
  • 5
0
votes
1 answer

Scrapinghub: Dict_key error handling | check if key exists

It took me a while to make sense of python-scrapinghub's logic and the way it interacts with Scrapinghub's API, but I've made progress in my current troubleshooting... Using Scrapy, I have a list of multiple web scrapers whose sole function is to create…
scriptso
  • 677
  • 4
  • 14
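For the key-existence part of the question above, the check itself is plain Python and needs nothing Scrapinghub-specific; a minimal sketch with a hypothetical item dict:

```python
item = {"title": "Example", "price": None}  # a scraped item as a plain dict

# Membership test: True even when the stored value is None
has_price = "price" in item

# dict.get returns a default instead of raising KeyError
url = item.get("url", "missing")

# Guard before subscripting to avoid KeyError on sparse items
if "title" in item:
    title = item["title"]
```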
0
votes
1 answer

How to use peewee with scrapinghub

I want to save my data to a remote machine using peewee. When I run my crawler I get the following error: File "/usr/local/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 57, in run self.crawler_process.crawl(spname, **opts.spargs) …
yasirnazir
  • 1,133
  • 7
  • 12
0
votes
1 answer

python-scrapinghub, ascii / utf8?

Python 3.4.2. I'm using the client interface for the Scrapinghub API, which can be found here: https://github.com/scrapinghub/python-scrapinghub. I scrape a site and want to get and print the items with for item in job.items(): print(item). In a Python…
fuser60596
  • 1,087
  • 1
  • 12
  • 26
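When printing items that contain non-ASCII text on Python 3, the usual culprit is the console encoding rather than the client library; one hedged workaround is to serialize each item explicitly (the item contents below are made up):

```python
import json

item = {"name": "Çalışkan", "city": "İstanbul"}  # hypothetical scraped item

# ensure_ascii=True (the default) escapes non-ASCII as \uXXXX,
# which prints safely even on an ASCII-only console
safe = json.dumps(item)

# ensure_ascii=False keeps the real characters for UTF-8 terminals
readable = json.dumps(item, ensure_ascii=False)
print(readable)
```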
0
votes
1 answer

ValueError: Missing scheme in request url: h

I am a beginner in Scrapy and Python. I tried to deploy the spider code to scrapinghub and encountered the following error. Below is the code. import scrapy from bs4 import BeautifulSoup,SoupStrainer import urllib2 from scrapy.selector import…
Niveram
  • 15
  • 1
  • 10
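The trailing h in this error usually means Scrapy was handed a bare string where it expected a list of URLs, so it iterated the string character by character; a minimal illustration:

```python
start_urls = "http://example.com"   # BUG: a plain string
first = next(iter(start_urls))      # iterating yields single characters

start_urls_fixed = ["http://example.com"]  # FIX: wrap the URL in a list
first_fixed = next(iter(start_urls_fixed))
```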
0
votes
1 answer

Unable to deploy project to Scrapy Cloud

I made changes to the spider to use some methods of the scrapinghub API and tried re-deploying it to Scrapy Cloud using "shub deploy". I'm getting the error: ImportError: No module named scrapinghub. It points to the import line in the spider from…
Zaky
  • 369
  • 6
  • 21
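A common fix (hedged: the project id below is a placeholder, and exact file layout depends on your project) is to declare third-party packages in a requirements file referenced from scrapinghub.yml, so that shub deploy installs scrapinghub inside the cloud image:

```yaml
# scrapinghub.yml in the project root
project: 12345
requirements:
  file: requirements.txt
```

where requirements.txt lists the missing package, e.g. a line reading `scrapinghub`.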
0
votes
0 answers

How to edit files in docker for scrapinghub portia

I have created a pipeline to store the crawled items in a JSON file and added the pipeline at the path /slybot/slybot/mypipeline.py. After that I installed the Portia package using Docker; installation was successful. Then I started Portia using the…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
0
votes
2 answers

Web scraping from multiple tables appearing on click

Basically I would like to open this page, select "Rüzgar" from the last dropdown, run the query with the "Sorgula" button, and extract all the coordinates stored in the table that appears once the first button of the first column in the main table is clicked.…
Sam
  • 41
  • 6
0
votes
1 answer

Deploying egg on Scrapinghub

I deployed a project on Scrapinghub but my spider isn't working because Scrapinghub uses an old version of the Twisted library. The project works fine on my local machine. Is there any way I could make an egg of the updated Twisted…
Waqar
  • 93
  • 1
  • 2
  • 9
0
votes
1 answer

How to annotate same text for different fields in Portia?

I want to annotate content that has three lines into three individual fields, although it sits in a single HTML tag. I tried the partial annotation method, but some content has only 2 lines (partial annotation is not working in this scenario). How can I…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
0
votes
0 answers

How are fields stored in a list in a Portia crawl?

EDIT: I see that while running a Portia spider the extracted fields are stored in a Python list and the values are returned while the extracted details are logged in scrapyd. I just want to know how the fields are being extracted and…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
0
votes
1 answer

splash (/scrapinghub) - wait = max 10

I am using Scrapinghub's Splash for rendering JavaScript pages. It is a really great tool, but I don't understand why the maximum value for wait is 10. Is there a possibility to set higher values? Thank you very much. Best regards, Julian
0
votes
2 answers

scrapy access to log count while running in scrapinghub

I have a small scrapy extension which looks into the stats object of a crawler and sends me an email if the crawler has thrown log messages of a certain type (e.g. WARNING, CRITICAL, ERROR). These stats are accessible via the spider's stats object…
tony994
  • 485
  • 1
  • 5
  • 10
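The filtering step can be kept as a pure function over the stats dict that crawler.stats.get_stats() returns, which makes it easy to reuse both locally and on Scrapy Cloud; a sketch (the stats values below are invented, only the log_count/<LEVEL> key pattern comes from Scrapy):

```python
def log_level_counts(stats, levels=("WARNING", "ERROR", "CRITICAL")):
    """Pick out Scrapy's log_count/<LEVEL> entries from a stats dict."""
    return {level: stats.get("log_count/%s" % level, 0) for level in levels}


def should_alert(stats, threshold=1):
    """True if any watched log level reached the threshold."""
    return any(n >= threshold for n in log_level_counts(stats).values())


# In a real extension this dict would come from crawler.stats.get_stats()
example_stats = {"log_count/ERROR": 2, "log_count/INFO": 40}
```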
0
votes
1 answer

How to write regex and XPath for the below link?

Here is the link https://www.google.com/about/careers/search#!t=jo&jid=34154& from which I have to extract the content under job details. Job details Team or role: Software Engineering // How to write xpath Job type: Full-time // How to write…
user3996896