Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.
Questions tagged [scrapinghub]
179 questions
0 votes, 1 answer
Unable to parse selector?
I am currently using Scrapy + Splash + Python on CentOS. I have written the following code to extract content from here.
I am unable to extract data from JavaScript pop-up windows, for example links with href="javascript:void(0);". Can anyone guide me to…
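A link with href="javascript:void(0);" carries no URL in its href; the target is set by JavaScript. If the page writes it into an inline onclick handler such as window.open('...') (an assumption; check the real markup), it can be read straight from the HTML without rendering. A stdlib-only sketch with made-up markup; the class and function names are hypothetical:

```python
import re
from html.parser import HTMLParser

# Assumption: the popup is opened by an inline handler like onclick="window.open('...')".
WINDOW_OPEN = re.compile(r"""window\.open\(\s*['"]([^'"]+)['"]""")


class PopupLinkParser(HTMLParser):
    """Collect popup URLs from the onclick handlers of <a href="javascript:..."> links."""

    def __init__(self):
        super().__init__()
        self.popup_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)  # valueless attributes map to None
        if not (attrs.get("href") or "").startswith("javascript:"):
            return
        match = WINDOW_OPEN.search(attrs.get("onclick") or "")
        if match:
            self.popup_urls.append(match.group(1))


def popup_urls(markup):
    parser = PopupLinkParser()
    parser.feed(markup)
    parser.close()
    return parser.popup_urls


# Made-up markup for illustration only.
sample = (
    '<a href="javascript:void(0);" onclick="window.open(\'/details?id=7\', \'popup\')">More</a>'
    '<a href="/plain">Plain link</a>'
)
print(popup_urls(sample))
```

Inside a Scrapy callback, each extracted path would be resolved with response.urljoin(...) before requesting it. If the popup content is instead loaded by a script or an XHR call, rendering the page with Splash or requesting that endpoint directly is needed.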
user3996896
0 votes, 0 answers
Can't scrape particular websites using Scrapinghub
I am using the autoscraping feature of the Scrapinghub service.
While building and deploying the autoscraper, I found that the site I wanted to scrape never returned any Requests and timed out after about 3.5 minutes.
So, I began reading the…

tumultous_rooster
0 votes, 1 answer
Portia (Scrapy/slybot) errors on Windows
I installed Portia and got it to work; I annotated some websites (it looks really good),
but when I try to run the spiders I get some errors and nothing gets crawled.
I'm running Python 2.7.6 on Windows 7.
C:\Python27\Scripts>python portiacrawl…

f.almeida
0 votes, 1 answer
Login fails on some websites with ScrapingHub's Dash
When I try to log in with ScrapingHub's Dash, I get the following error in the "log" section for some websites:
scraping hub exceptions.KeyError: 'No input element with the name None'
How can I fix it?
EDIT: Here is the authentication method…

Snite
0 votes, 1 answer
Log in to a website, then collect data with Scrapinghub
I've been using Scrapinghub for two days and am looking for a way to log in to a website and then scrape data. I found this post but can't see how to apply it in Dash:
http://blog.scrapinghub.com/2012/10/26/filling-login-forms-automatically/
Could you…
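Scrapy's FormRequest.from_response pre-fills a login request with the fields already present in the page's form (hidden inputs such as CSRF tokens included) and overrides the ones you pass in formdata. A stdlib-only sketch of that idea, using made-up markup and hypothetical field names username/password:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of the <input> elements in the first <form>."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._in_form = self._seen_form = True
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Unnamed inputs (e.g. many submit buttons) are never submitted, so skip them.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def login_body(markup, credentials):
    """Return an urlencoded POST body: the form's own fields, overridden by credentials."""
    parser = FormFieldParser()
    parser.feed(markup)
    parser.close()
    fields = dict(parser.fields)  # keeps hidden inputs such as a CSRF token
    fields.update(credentials)
    return urlencode(fields)


# Made-up login page; "username"/"password" are hypothetical field names.
page = (
    '<form action="/login" method="post">'
    '<input type="hidden" name="csrf_token" value="abc123">'
    '<input type="text" name="username">'
    '<input type="password" name="password">'
    '<input type="submit" value="Log in">'
    "</form>"
)
print(login_body(page, {"username": "me", "password": "secret"}))
```

The sketch ignores checkboxes, selects and textareas. In a Scrapy spider the same step is typically FormRequest.from_response(response, formdata={"username": ..., "password": ...}, callback=self.after_login), where after_login is your own callback that scrapes once logged in.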

Snite
-1 votes, 1 answer
Can't fetch a URL in the Scrapy shell with Splash
Please help me!
When I try to fetch a URL in the Scrapy shell with Scrapy and Splash, I use the following statement to get a response:
>>> fetch('http://localhost:8050/render.html?url=https://www.barbiermotorsport.nl/motoren')
So far I'm not getting a…
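A sketch of building the render.html request URL programmatically rather than by hand. Percent-encoding the target keeps it intact if it carries its own query string, and Splash's wait argument gives the page's JavaScript time to run before the HTML is returned. The Splash address is the one from the question; the wait and timeout values are arbitrary assumptions, and whether this resolves the truncated error above is not known:

```python
from urllib.parse import urlencode

SPLASH = "http://localhost:8050"  # Splash address used in the question


def splash_render_url(target, wait=2.0, timeout=30):
    """Build a Splash render.html URL with the target URL percent-encoded.

    wait: seconds Splash waits after the page loads (lets JavaScript run).
    timeout: overall render timeout in seconds.
    """
    query = urlencode({"url": target, "wait": wait, "timeout": timeout})
    return f"{SPLASH}/render.html?{query}"


url = splash_render_url("https://www.barbiermotorsport.nl/motoren")
print(url)
```

In the Scrapy shell, fetch(url) then requests the rendered page from Splash.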

sejteN
-1 votes, 1 answer
Is it possible to create a proxy failover with Python Scrapy?
Is it possible to create a proxy failover within Scrapy, so that when one proxy fails, another takes over scraping the rest of the requests? I would have thought that it would be done using the retry middleware, but I don't really have a clue how to…
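One way to sketch this is a custom downloader middleware rather than the built-in retry middleware: Scrapy lets process_exception return a new Request, which reschedules it. The class name, the FAILOVER_PROXIES setting and the failover_tried meta key below are hypothetical; the proxy URLs are placeholders. A small stand-in for scrapy.Request is included so the logic can be tried without Scrapy:

```python
class ProxyFailoverMiddleware:
    """Downloader-middleware sketch: send requests through the active proxy and,
    when a download raises an exception, switch to the next proxy and retry."""

    def __init__(self, proxies):
        self.proxies = list(proxies)
        self.current = 0  # index of the proxy new requests should use

    @classmethod
    def from_crawler(cls, crawler):
        # Assumption: proxy URLs live in a custom setting named FAILOVER_PROXIES.
        return cls(crawler.settings.getlist("FAILOVER_PROXIES"))

    def process_request(self, request, spider):
        # Keep a proxy the request already carries (e.g. a failover retry).
        request.meta.setdefault("proxy", self.proxies[self.current])
        return None  # let Scrapy continue downloading the request

    def process_exception(self, request, exception, spider):
        tried = request.meta.get("failover_tried", 0)
        if tried >= len(self.proxies) - 1:
            return None  # every proxy has been tried: leave it to other middlewares
        if request.meta.get("proxy") == self.proxies[self.current]:
            # The active proxy failed: later requests switch to the next one too.
            self.current = (self.current + 1) % len(self.proxies)
        retry = request.replace(dont_filter=True)  # bypass the duplicate filter
        retry.meta["proxy"] = self.proxies[self.current]
        retry.meta["failover_tried"] = tried + 1
        return retry  # returning a Request reschedules it


class _DemoRequest:
    """Tiny stand-in for scrapy.Request (meta + replace) to try the logic without Scrapy."""

    def __init__(self, meta=None, dont_filter=False):
        self.meta = dict(meta or {})
        self.dont_filter = dont_filter

    def replace(self, **kwargs):
        return _DemoRequest(self.meta, kwargs.get("dont_filter", self.dont_filter))


mw = ProxyFailoverMiddleware(["http://proxy-a.example:8000", "http://proxy-b.example:8000"])
first = _DemoRequest()
mw.process_request(first, spider=None)
retry = mw.process_exception(first, ConnectionError("proxy down"), spider=None)
print(first.meta["proxy"], "->", retry.meta["proxy"])
```

To use it, the class would be registered in DOWNLOADER_MIDDLEWARES under your own module path; its position relative to the built-in RetryMiddleware decides which of the two handles a given exception first.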

webbie1985
-1 votes, 1 answer
How do I get the XPath or CSS selector from a dynamically loaded website to follow links?
This is a dynamically loaded website: https://www.gelbeseiten.de/suche/hotels/n%c3%bcrnberg.
I'm trying to follow every link in the results. I found //article[@class='mod mod-Treffer']/a to follow the search result links. But the problem is this…
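A stdlib-only sketch of applying the question's XPath (made relative with .//) to result markup and turning each relative href into an absolute URL to follow. The markup and the /example/... paths are hypothetical; ElementTree supports the [@attr='value'] predicate used here but needs well-formed input:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

BASE = "https://www.gelbeseiten.de/suche/hotels/n%c3%bcrnberg"

# Hypothetical, simplified result markup (ElementTree needs well-formed XML).
SAMPLE = """<div>
  <article class="mod mod-Treffer"><a href="/example/hotel-1">Hotel 1</a></article>
  <article class="mod mod-Treffer"><a href="/example/hotel-2">Hotel 2</a></article>
  <article class="mod mod-Ad"><a href="/example/ad">Ad</a></article>
</div>"""


def result_links(markup, base):
    """Apply the question's XPath (made relative with './/') and absolutize each href."""
    root = ET.fromstring(markup)
    anchors = root.findall(".//article[@class='mod mod-Treffer']/a")
    return [urljoin(base, a.get("href")) for a in anchors]


print(result_links(SAMPLE, BASE))
```

In a Scrapy callback the equivalent is response.xpath(...) plus response.follow(href, callback=...), which resolves relative URLs itself; //article[contains(@class, 'mod-Treffer')]/a is less brittle than the exact class match if the site adds classes. If the result list is injected by JavaScript, no selector will match the raw response and the page has to be rendered first.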

Raisul Islam
-1 votes, 1 answer
How can I scrape a button that does not return a value?
I am trying to scrape the website https://tonaton.com/en/ads/ghana/electronics. There is a "next" button that I want to click and then scrape the resulting contents. The problem is that the XPath or CSS selector for that button does not return any value in either…
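When a "next" button is driven by JavaScript, its selector may match nothing in the raw HTML, and Scrapy itself does not click buttons. A common workaround is to request the follow-up pages directly. This sketch assumes, hypothetically, that the listing paginates with a page query parameter; the real parameter can be found in the address bar or the browser's network tab after clicking "next":

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base, page, param="page"):
    """Return base with its page-number query parameter (hypothetical name) set to `page`."""
    parts = urlsplit(base)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


LISTING = "https://tonaton.com/en/ads/ghana/electronics"
for n in range(1, 4):
    print(page_url(LISTING, n))
```

Each generated URL can then be requested like any other page, which avoids needing the button's selector at all.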

Danny Stringz
-1 votes, 2 answers
Can't deploy to ScrapingHub: non-existent "SyntaxError: invalid syntax"
I have a Scrapy spider that runs perfectly if I call: scrapy crawl .
When I try to deploy it to ScrapingHub.com, it raises a SyntaxError that I can't fix. I can't figure out what's happening.
There is no syntax error in my code.
Here is my deployment…

Omar Omeiri
-1 votes, 1 answer
How to properly pass arguments to scrapy spider on scrapinghub?
I am trying to pass parameters to my spider (ideally a DataFrame or CSV) with:
self.client = ScrapinghubClient(apikey)
self.project = self.client.get_project()
job = spider.jobs.run()
I tried using the *args and **kwargs argument type but each time…
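Spider arguments reach a Scrapy spider as strings, so a DataFrame or CSV has to be turned into text before it is passed and parsed again inside the spider. A sketch with two hypothetical helpers built on the stdlib csv module:

```python
import csv
import io


def rows_to_arg(rows):
    """Serialize a list of dicts to CSV text so it fits in a string job argument."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def arg_to_rows(text):
    """Parse the CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = [{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob, Jr."}]
payload = rows_to_arg(rows)
print(arg_to_rows(payload))
```

With python-scrapinghub the string could then go out as a job argument, e.g. project = client.get_project(PROJECT_ID) followed by project.jobs.run("myspider", job_args={"rows": payload}), where PROJECT_ID and "myspider" are placeholders. Scrapy hands it to the spider as the keyword argument rows, for example via __init__(self, rows=None, *args, **kwargs). For a pandas DataFrame, df.to_csv(index=False) returns the same kind of CSV string, and pd.read_csv(io.StringIO(text)) reads it back.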

Emilz
-1 votes, 1 answer
Unable to scrape Myntra API data using the Scrapy framework: 307 redirect error
Below is the spider code:
import scrapy
class MyntraSpider(scrapy.Spider):
    custom_settings = {
        'HTTPCACHE_ENABLED': False,
        'dont_redirect': True,
        #'handle_httpstatus_list' : [302,307],
        #'CRAWLERA_ENABLED':…
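In the snippet, 'dont_redirect' sits in custom_settings, but Scrapy reads dont_redirect only as a Request.meta key; the spider-wide switch is the REDIRECT_ENABLED setting. A config sketch of both places (whether turning redirects off gets the Myntra data is not shown in the question; it only lets the 307 response reach your callback so you can inspect it):

```python
# Spider-wide: settings Scrapy actually reads for redirect handling.
custom_settings = {
    "HTTPCACHE_ENABLED": False,
    "REDIRECT_ENABLED": False,  # turn RedirectMiddleware off for this spider
    "HTTPERROR_ALLOWED_CODES": [302, 307],  # let these statuses reach the callback
}

# Per request: meta keys read by RedirectMiddleware / HttpErrorMiddleware,
# passed as scrapy.Request(url, meta=redirect_meta) rather than as settings.
redirect_meta = {"dont_redirect": True, "handle_httpstatus_list": [302, 307]}
```

Either approach is enough on its own. In the callback, response.headers.get("Location") then shows where the 307 points.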

Suruchi Babbar
-1 votes, 1 answer
Scrapinghub deployment error: non-zero exit status 1
I get this error message when I try to deploy my project, and I really do not understand why: error log

Clément
-2 votes, 2 answers
Is there any alternative to \ in an f-string in Python?
I am scraping this website, https://www.americanexpress.com/in/credit-cards/payback-card/, using Beautiful Soup and Python.
from urllib.request import urlopen
link = 'https://www.americanexpress.com/in/credit-cards/payback-card/'
html = urlopen(link)
soup =…
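Assuming the backslash is needed inside the {...} expression part: before Python 3.12 (PEP 701) that is a SyntaxError, while backslashes in the literal part were always allowed. A few workarounds, with made-up strings standing in for the scraped text:

```python
rows = ["Annual fee", "Reward points"]  # hypothetical scraped strings

# Before Python 3.12 this is a SyntaxError (backslash inside the expression):
#     f"{'\n'.join(rows)}"

# Option 1: put the escape in a variable and interpolate the variable.
newline = "\n"
text1 = f"{newline.join(rows)}"

# Option 2: chr(10) is the newline character, so no backslash is needed.
text2 = f"{chr(10).join(rows)}"

# Option 3: str.format has no such restriction.
text3 = "{}".format("\n".join(rows))

# A backslash in the literal (non-expression) part is fine in every version.
text4 = f"Card features:\n{newline.join(rows)}"

print(text4)
```

On Python 3.12 and later the commented-out form is valid as written.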
user15197314