Questions tagged [scrapinghub]

Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.

179 questions
0
votes
1 answer

Unable to parse selector?

I am currently using Scrapy + Splash + Python on CentOS. I have written the following code to extract content from here. I am unable to extract data from the JavaScript popup windows, for example href="javascript:void(0);". Can anyone guide me to…
user3996896
0
votes
0 answers

Can't scrape particular websites using Scrapinghub

I am using the autoscraping feature in the Scrapinghub service. While building and deploying the autoscraper, I found that the site I wanted to scrape would never return any Requests, and would time out after around 3.5 minutes. So, I began reading the…
tumultous_rooster
  • 12,150
  • 32
  • 92
  • 149
0
votes
1 answer

portia (scrapy/slybot) errors on windows

I installed Portia and got it to work. I annotated some websites (looks really good), but when I try to run the spiders I get some errors and nothing gets crawled. I'm running Python 2.7.6 on Win 7: C:\Python27\Scripts>python portiacrawl…
f.almeida
  • 3
  • 3
0
votes
1 answer

Perform Login fail on some website with ScrapingHub's Dash

When I try to log in with ScrapingHub's Dash, I get the following error on some websites in the "log" section: scraping hub exceptions.KeyError: 'No input element with the name None'. How can I fix it? EDIT: Here is the authentication method…
Snite
  • 169
  • 5
  • 19
0
votes
1 answer

Login to a website then collect data with Scraping Hub

I've used Scrapinghub for two days and am looking for how to log in to a website and then scrape data. I found this topic but can't see how to apply it in the Dash. http://blog.scrapinghub.com/2012/10/26/filling-login-forms-automatically/ Could you…
Snite
  • 169
  • 5
  • 19
-1
votes
1 answer

Can't fetch url in scrapy shell with splash

Please help me! When I try to fetch a URL in scrapy shell with scrapy splash, I use the following statement to get a response: >>> fetch('http://localhost:8050/render.html?url=https://www.barbiermotorsport.nl/motoren') So far I'm not getting a…
sejteN
  • 9
  • 2
-1
votes
1 answer

Is it possible to create a proxy failover with Python Scrapy?

Is it possible to create a proxy failover within Scrapy, so that when one proxy fails another takes over scraping the rest of the requests? I would have thought that it would be done using the retry middleware, but I don't really have a clue how to…
-1
votes
1 answer

How to get the XPATH or CSS selector from dynamically loaded website to follow links?

This is a dynamically-loaded website https://www.gelbeseiten.de/suche/hotels/n%c3%bcrnberg. I'm trying to follow every link from the results. I found //article[@class='mod mod-Treffer']/a to follow the search result links. But the problem is this…
Raisul Islam
  • 277
  • 2
  • 19
-1
votes
1 answer

How can I scrape a button that does not return a value

I am trying to scrape from the website https://tonaton.com/en/ads/ghana/electronics. There is a "next" button that I want to click and scrape the contents. The problem is the XPath or CSS selector of that button does not return any value in either…
-1
votes
2 answers

Can't deploy to ScrapingHub non existent SyntaxError: invalid syntax

I have a Scrapy Spider that runs perfectly if I call: scrapy crawl . When I try to deploy it to ScrapingHub.com it raises a SyntaxError that I can't fix. I can't figure out what's happening. There is no syntax error in my code. Here is my deployment…
Omar Omeiri
  • 1,506
  • 1
  • 17
  • 33
-1
votes
1 answer

How to properly pass arguments to scrapy spider on scrapinghub?

I am trying to pass parameters to my spider (ideally a DataFrame or CSV) with: self.client = ScrapinghubClient(apikey) self.project = self.client.get_project() job = spider.jobs.run() I tried using the *args and **kwargs argument type but each time…
Emilz
  • 73
  • 1
  • 8
-1
votes
1 answer

unable to scrape myntra API data using scrapy framework 307 redirect error

Below is the spider code: import scrapy class MyntraSpider(scrapy.Spider): custom_settings = { 'HTTPCACHE_ENABLED': False, 'dont_redirect': True, #'handle_httpstatus_list' : [302,307], #'CRAWLERA_ENABLED':…
-1
votes
1 answer

Scrapinghub deployment error: non-exit status 1

I get this error message when I try to deploy my project and I really do not understand why: error log
Clément
  • 21
  • 5
-2
votes
2 answers

Is there any alternative for \ in f string in python?

So I am scraping this website with the link https://www.americanexpress.com/in/credit-cards/payback-card/ using Beautiful Soup and Python. link = 'https://www.americanexpress.com/in/credit-cards/payback-card/' html = urlopen(link) soup =…
user15197314