Questions tagged [scrapinghub]

Scrapinghub (now Zyte) is a web scraping development and services company that supplies cloud-based web crawling platforms.

179 questions
0
votes
3 answers

(Scrapy) How do you scrape all the external links on each website from a list of hundreds of websites (and run the whole thing on Zyte)?

I am looking for some help regarding my Scrapy project. I want to use Scrapy to code a generic Spider that would crawl multiple websites from a list. I was hoping to have the list in a separate file, because it's quite large. For each website, the…
Alban
  • 21
  • 4
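The link-classification logic for such a generic spider can be kept independent of Scrapy and tested on its own. The sketch below assumes the site list lives one URL per line in a text file; the file path and the "external link" definition (different host than the page it was found on) are assumptions, not something from the question itself.

```python
from urllib.parse import urlparse


def load_start_urls(path):
    """Read start URLs from a text file, one per line, skipping blanks
    (a sketch of keeping a large site list out of the spider code)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def is_external(link, page_url):
    """Return True if `link` points to a different host than the page it was found on."""
    link_host = urlparse(link).netloc.lower()
    page_host = urlparse(page_url).netloc.lower()
    # Relative links have no host of their own; treat them as internal.
    return bool(link_host) and link_host != page_host
```

Inside a Scrapy `parse` callback this would be used roughly as: for each `href` in `response.css("a::attr(href)").getall()`, yield an item when `is_external(response.urljoin(href), response.url)` is true. On Zyte/Scrapy Cloud, the list file must be shipped with the deployed package (e.g. via `package_data` in `setup.py`).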
0
votes
2 answers

Scrapinghub scrapy: ModuleNotFoundError: No module named 'pandas'

I have tried deploying to Zyte via the command line and via GitHub, but I have been stuck with the above error. I have tried different Scrapy versions, from 1.5 to 2.5, but the error still persists. I have also tried setting my scrapinghub.yml to the…
chuky pedro
  • 756
  • 1
  • 8
  • 26
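A common cause of this class of error is that Scrapy Cloud does not install third-party packages (like pandas) unless they are declared in a requirements file referenced from `scrapinghub.yml`. A minimal sketch, with a placeholder project id and stack version:

```yaml
# scrapinghub.yml -- sketch; project id and stack are placeholders
project: 123456
stacks:
  default: scrapy:2.5
requirements:
  file: requirements.txt
```

`requirements.txt` would then list `pandas` (ideally pinned to the version used locally) alongside any other non-default dependencies.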
0
votes
1 answer

Scrapinghub/Zyte: Unhandled error in Deferred: No module named 'scrapy_user_agents'

I'm deploying my Scrapy spider via my local machine to Zyte Cloud (former ScrapingHub). This is successful. When I run the spider I get the output below. I already checked here. The Zyte team is not very responsive on their own site it seems, but…
Adam
  • 6,041
  • 36
  • 120
  • 208
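This is usually the same missing-dependency situation: the module is named `scrapy_user_agents`, but the package to declare in the deployment's requirements file is `scrapy-user-agents` on PyPI. A sketch of the relevant line (pin the version you actually test with):

```text
# requirements.txt -- referenced from scrapinghub.yml's `requirements:` section
scrapy-user-agents
```

After adding it, redeploy with `shub deploy` so the image is rebuilt with the package installed.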
0
votes
1 answer

How to scrape card details using beautiful soup and python

I am trying to scrape this link : https://www.axisbank.com/retail/cards/credit-card Using the following code from urllib.request import urlopen from bs4 import BeautifulSoup import json, requests, re axis_url =…
user15215612
0
votes
1 answer

How to specify css selector in beautiful soup and python?

I am trying to scrape the titles of the cards from this link : https://www.axisbank.com/retail/cards/credit-card Using the below code from urllib.request import urlopen from bs4 import BeautifulSoup import json, requests, re axis_url =…
user15215612
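`BeautifulSoup.select()` accepts a CSS selector directly (as opposed to `find_all()`, which takes tag names and attribute filters). A self-contained sketch on stand-in markup — the class names below are hypothetical, so inspect the live page for the real ones, and note that if `select()` returns nothing on the fetched HTML, the card content may be rendered by JavaScript and absent from the raw response:

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the real card markup (hypothetical class names).
html = """
<div class="card"><h3 class="card-title">Ace Credit Card</h3></div>
<div class="card"><h3 class="card-title">Magnus Credit Card</h3></div>
"""

soup = BeautifulSoup(html, "html.parser")
# select() takes any CSS selector, including descendant and class selectors.
titles = [h3.get_text(strip=True) for h3 in soup.select("div.card h3.card-title")]
print(titles)  # ['Ace Credit Card', 'Magnus Credit Card']
```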
0
votes
2 answers

How to scrape the heading and description using python and beautiful soup?

Overview of the problem: Link : https://www.bobfinancial.com/eterna.jsp In the Details Section: Basically I want all Points. details: [ #This is an array of Strings... "Milestone Rewards: Earn 10,000 bonus reward points on spending ₹ 50,000…
0
votes
0 answers

Scrapy project working fine locally but not returning anything on scrapinghub, and not even showing any error

import json from json.decoder import JSONDecodeError import requests from ..items import PlayerstatisticsItem from scrapy.spiders import CrawlSpider import scrapy class PlayerStatsSpider(CrawlSpider): name = 'player_stats' def start_requests(self): …
0
votes
0 answers

Scrapinghub (Scrapy Cloud) - Is there a limit to the number of spiders per project?

I am using Scrapy Cloud to host and run my Scrapy project. All seems to work well, but when the 11th spider is added to the project, it doesn't show up in the Scrapinghub dashboard. I know this should be a rather simple thing to find out but it isn't…
0
votes
1 answer

Scrapy Error : Missing scheme in request url

I am facing issues with some URLs while running Scrapy: ValueError: Missing scheme in request url: mailto:?body=https%3A%2F%2Fiview.abc.net.au%2Fshow%2Finsiders [scrapy.core.scraper:168|ERROR] Spider error processing
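The error comes from feeding non-HTTP links (`mailto:`, `tel:`, `javascript:`) into `scrapy.Request`. One way to handle it is a small scheme filter applied before yielding requests — a stdlib-only sketch:

```python
from urllib.parse import urlparse


def is_crawlable(url):
    """Keep only URLs that can actually be requested over HTTP(S)."""
    return urlparse(url).scheme in ("http", "https")


links = [
    "https://iview.abc.net.au/show/insiders",
    "mailto:?body=https%3A%2F%2Fiview.abc.net.au%2Fshow%2Finsiders",
    "tel:+611234567890",
]
print([u for u in links if is_crawlable(u)])
# ['https://iview.abc.net.au/show/insiders']
```

In a spider, this would gate each extracted link: resolve it with `response.urljoin(href)` first, then only `yield scrapy.Request(...)` when `is_crawlable(...)` is true.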
0
votes
0 answers

Where to put logs in Scrapinghub? Scrapy

I am trying to put my logs in a "logs" folder, but when I try to deploy to Scrapy Cloud I get No such file or directory: '/scrapinghub/sfb/logs/random_log.log', even though I think I am declaring it correctly in the setup.py file. What am I doing wrong here? File…
weston6142
  • 181
  • 14
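Scrapy Cloud captures the spider's log output itself, and writing to arbitrary paths inside the container is unreliable, so one workable pattern is to only use a file-based log for local runs. A settings sketch, assuming the `SHUB_JOBKEY` environment variable (set on Scrapy Cloud jobs) as the platform check:

```python
# settings.py -- sketch: rely on the platform's log capture on Scrapy Cloud,
# fall back to a local log file otherwise.
import os

if os.environ.get("SHUB_JOBKEY"):          # present when running on Scrapy Cloud
    LOG_FILE = None                        # let the platform collect the log
else:
    os.makedirs("logs", exist_ok=True)     # ensure the folder exists locally
    LOG_FILE = "logs/random_log.log"
```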
0
votes
1 answer

No module found named toplevelfolder when importing github Scrapy project into Scrapinghub

When I am trying to import my Scrapy project onto Scrapinghub using the website through GitHub I get this error: ModuleNotFoundError: No module named 'sfb' Here is my project…
weston6142
  • 181
  • 14
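`ModuleNotFoundError: No module named 'sfb'` on deploy typically means the `sfb` package directory is not being included in the egg that Scrapinghub builds. `find_packages()` only discovers directories containing an `__init__.py`. A packaging sketch (name and version are placeholders):

```python
# setup.py -- sketch; shub builds the deployed egg from this,
# so find_packages() must pick up the 'sfb' directory (it needs an __init__.py).
from setuptools import setup, find_packages

setup(
    name="project",
    version="1.0",
    packages=find_packages(),
    entry_points={"scrapy": ["settings = sfb.settings"]},
)
```

If `sfb/__init__.py` is missing, adding an empty one is usually enough for `find_packages()` to include the module.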
0
votes
0 answers

How to use one crawler for multiple domains?

I'm working on a project that involves crawling multiple domains in their entirety. My scraper simply crawls the whole domain and doesn't check specific parts of the html, it just gets all the html. For some domains, I would only want to crawl one…
Jake 1986
  • 582
  • 1
  • 6
  • 25
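One way to drive a single generic crawler across domains with different scopes is a per-domain rules table consulted before following any link. The sketch below is stdlib-only; the domains and path prefixes are hypothetical examples, and in a `CrawlSpider` the check would be applied when filtering extracted links:

```python
from urllib.parse import urlparse

# Hypothetical per-domain scope rules: None means crawl the whole domain,
# a path prefix means only follow URLs under that section of the site.
DOMAIN_RULES = {
    "example.com": None,
    "docs.example.org": "/en/",
}


def should_follow(url):
    """Decide whether the generic spider should follow `url`."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host not in DOMAIN_RULES:
        return False                       # stay inside the configured domains
    prefix = DOMAIN_RULES[host]
    return prefix is None or parts.path.startswith(prefix)
```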
0
votes
0 answers

Failed to Get Scrapinghub Notification on Slack. Facing Error

I want to get notifications on Slack for all Scrapinghub spider events (e.g. spider run, completion, error). I have created a Monitor.py file and an Action.py file, and also added the Spidermon Slack settings in my settings.py file. SPIDERMON_SLACK_SENDER_TOKEN =…
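For Spidermon's Slack action to fire, the extension has to be enabled and a monitor suite registered alongside the Slack credentials. A settings sketch — the monitor suite path is hypothetical (it must point at your own suite class), and the token/channel values are placeholders that should be kept out of version control:

```python
# settings.py -- sketch of the Spidermon + Slack wiring
SPIDERMON_ENABLED = True

EXTENSIONS = {
    "spidermon.contrib.scrapy.extensions.Spidermon": 500,
}

# Hypothetical path to your own monitor suite (the one defined in Monitor.py).
SPIDERMON_SPIDER_CLOSE_MONITORS = ("myproject.monitors.SpiderCloseMonitorSuite",)

SPIDERMON_SLACK_SENDER_TOKEN = "xoxb-..."      # placeholder bot token
SPIDERMON_SLACK_SENDER_NAME = "spidermon-bot"
SPIDERMON_SLACK_RECIPIENTS = ["#scrapy-alerts"]
```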
0
votes
1 answer

Unable to access scrapyd interface on the server machine with public IP

I am trying to run scrapyd on my Ubuntu server, which has a public IP, using the following config file named scrapy.cfg: [settings] default = web_crawler.settings [deploy:default] url = http://127.0.0.1:6800/ project = web_crawler [scrapyd] eggs_dir =…
Amanda
  • 2,013
  • 3
  • 24
  • 57
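Two things commonly cause this: scrapyd binds to 127.0.0.1 by default, so it is only reachable from the machine itself, and scrapyd reads its options from a `scrapyd.conf` file (e.g. `/etc/scrapyd/scrapyd.conf` or a `scrapyd.conf` in the working directory), not from the project's `scrapy.cfg`. A config sketch:

```ini
; scrapyd.conf -- sketch; place where scrapyd actually looks for it,
; e.g. /etc/scrapyd/scrapyd.conf, not inside scrapy.cfg
[scrapyd]
bind_address = 0.0.0.0   ; listen on all interfaces instead of only 127.0.0.1
http_port    = 6800
```

The server's firewall or cloud security group also has to allow inbound traffic on port 6800 before the interface is reachable via the public IP.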