Questions tagged [scrapinghub]
Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.
179 questions
0 votes · 3 answers
(Scrapy) How do you scrape all the external links on each website from a list of hundreds of websites (and run the whole thing on Zyte)?
I am looking for some help regarding my Scrapy project.
I want to use Scrapy to write a generic spider that crawls multiple websites from a list. I was hoping to keep the list in a separate file, because it's quite large. For each website, the…
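A minimal, framework-agnostic sketch of the per-page logic is below, using only the standard library; a Scrapy callback could delegate to a helper like this with `response.url` and `response.text`. All names here are illustrative, not from the question:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags while parsing HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def external_links(page_url, html):
    """Return absolute http(s) links whose host differs from page_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    base_host = urlparse(page_url).netloc
    result = []
    for href in parser.links:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc != base_host:
            result.append(absolute)
    return result
```

In a Scrapy spider, `start_requests` could read the website list from the separate file (one URL per line) and yield a `Request` per site, with the parse callback passing the response into `external_links`.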

Alban · 21 · 4
0 votes · 2 answers
Scrapinghub scrapy: ModuleNotFoundError: No module named 'pandas'
I have tried deploying to Zyte via the command line and via GitHub, but I keep hitting the above error.
I have tried several Scrapy versions, from 1.5 to 2.5, but the error persists.
I have also tried setting my scrapinghub.yml to the…
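In cases like this the missing module is usually just not installed on the Scrapy Cloud image: third-party packages such as pandas must be declared in a requirements file referenced from scrapinghub.yml. A sketch, where the project ID, stack, and pinned version are placeholders:

```yaml
# scrapinghub.yml at the project root -- project ID and stack are placeholders
projects:
  default: 12345
stacks:
  default: scrapy:2.5
requirements:
  file: requirements.txt
```

```
# requirements.txt -- pin every extra dependency the spiders import
pandas==1.3.5
```

After adding both files, redeploying with `shub deploy` should install the listed packages into the stack.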

chuky pedro · 756 · 1 · 8 · 26
0 votes · 1 answer
Scrapinghub/Zyte: Unhandled error in Deferred: No module named 'scrapy_user_agents'
I'm deploying my Scrapy spider from my local machine to Zyte Cloud (formerly ScrapingHub). The deploy succeeds, but when I run the spider I get the output below.
I already checked here. The Zyte team does not seem very responsive on their own site, but…

Adam · 6,041 · 36 · 120 · 208
0 votes · 1 answer
How to scrape card details using beautiful soup and python
I am trying to scrape this link: https://www.axisbank.com/retail/cards/credit-card using the following code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import json, requests, re
axis_url =…
user15215612
0 votes · 1 answer
How to specify css selector in beautiful soup and python?
I am trying to scrape the titles of the cards from this link: https://www.axisbank.com/retail/cards/credit-card using the code below:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import json, requests, re
axis_url =…
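BeautifulSoup takes CSS selectors through its `select()` method. A minimal sketch on stand-in HTML, since the real page's markup isn't shown in the excerpt; the class names here are assumptions, and the actual selectors must be taken from the page's markup in the browser dev tools:

```python
from bs4 import BeautifulSoup

# Stand-in HTML; the real page's structure must be inspected first.
html = """
<div class="card-item"><h3 class="card-title">ACE Credit Card</h3></div>
<div class="card-item"><h3 class="card-title">Magnus Credit Card</h3></div>
"""

soup = BeautifulSoup(html, "html.parser")
# select() accepts a CSS selector and returns a list of matching tags
titles = [tag.get_text(strip=True)
          for tag in soup.select("div.card-item h3.card-title")]
print(titles)  # ['ACE Credit Card', 'Magnus Credit Card']
```

Note that pages like this often render their cards with JavaScript, in which case the elements will not be present in the raw HTML that `urlopen` returns, no matter which selector is used.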
user15215612
0 votes · 2 answers
Trying to scrape "Apply Now" and "Learn More" URLs but not able to get them using Beautiful Soup and Python
I am scraping this link :…

Ali Baba · 85 · 11
0 votes · 2 answers
How to scrape the heading and description using python and beautiful soup?
Overview of the problem:
Link: https://www.bobfinancial.com/eterna.jsp
In the Details section, I basically want all the points:
details:
[ # This is an array of strings...
"Milestone Rewards: Earn 10,000 bonus reward points on spending ₹ 50,000…

Ali Baba · 85 · 11
0 votes · 0 answers
Scrapy project working fine locally but not returning anything on scrapinghub, and not even showing any error
import json
from json.decoder import JSONDecodeError
import requests
import scrapy
from scrapy.spiders import CrawlSpider
from ..items import PlayerstatisticsItem

class PlayerStatsSpider(CrawlSpider):
    name = 'player_stats'

    def start_requests(self):
        …
0 votes · 0 answers
Scrapinghub (Scrapy Cloud) - Is there a limit to the number of spiders per project?
I am using Scrapy Cloud to host and run my Scrapy project. All seems to work well, but when an 11th spider is added to the project, it doesn't show up in the Scrapinghub dashboard.
I know this should be a rather simple thing to find out, but it isn't…

Hamza Tasneem · 74 · 3 · 12
0 votes · 1 answer
Scrapy error: Missing scheme in request url
I am facing issues with some URLs while running Scrapy:
ValueError: Missing scheme in request url: mailto:?body=https%3A%2F%2Fiview.abc.net.au%2Fshow%2Finsiders
[scrapy.core.scraper:168|ERROR] Spider error processing
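The error means the crawler is following non-web links such as `mailto:` share links, which cannot become Scrapy requests. A hedged sketch of filtering by URL scheme before yielding, using only the standard library (the callback snippet in the comment is illustrative):

```python
from urllib.parse import urlparse

def is_crawlable(url):
    """Only http(s) URLs can become requests; mailto:, tel:, javascript: cannot."""
    return urlparse(url).scheme in ("http", "https")

# In a spider callback one might write (sketch):
# for href in response.css("a::attr(href)").getall():
#     url = response.urljoin(href)
#     if is_crawlable(url):
#         yield scrapy.Request(url)
```

Scrapy's built-in `LinkExtractor` also skips `mailto:` links by default, so switching link discovery to it is an alternative fix.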

user1633298 · 21 · 4
0 votes · 0 answers
Where to put logs in Scrapinghub? Scrapy
I am trying to put my logs in a "logs" folder, but when I try to deploy to Scrapy Cloud I get No such file or directory: '/scrapinghub/sfb/logs/random_log.log', even though I think I am declaring it correctly in the setup.py file. What am I doing wrong here?
File…

weston6142 · 181 · 14
0 votes · 1 answer
No module found named toplevelfolder when importing github Scrapy project into Scrapinghub
When I import my Scrapy project into Scrapinghub from GitHub through the website, I get this error:
ModuleNotFoundError: No module named 'sfb'
Here is my project…
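A `ModuleNotFoundError` for the project's own top-level package during deploy usually means setup.py does not list that package. A sketch of a typical setup.py for a Scrapinghub-deployed project, assuming the package folder is named `sfb` (and contains an `__init__.py`):

```python
# setup.py at the repository root -- 'sfb' is assumed to be the package folder
from setuptools import setup, find_packages

setup(
    name="sfb",
    version="1.0",
    packages=find_packages(),  # must pick up the 'sfb' package
    entry_points={"scrapy": ["settings = sfb.settings"]},
)
```

The `entry_points` line is what tells the platform where the Scrapy settings module lives, so its dotted path must match the package name that `find_packages()` discovers.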

weston6142 · 181 · 14
0 votes · 0 answers
How to use one crawler for multiple domains?
I'm working on a project that involves crawling multiple domains in their entirety. My scraper simply crawls each whole domain; it doesn't check specific parts of the HTML, it just gets all of it.
For some domains, I would only want to crawl one…
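One generic crawler can consult a per-domain rules table when deciding whether to follow a link. A minimal sketch of that decision helper using only the standard library; the rules-table shape and all names are assumptions, not from the question:

```python
from urllib.parse import urlparse

# Per-domain rules (assumed shape): None = crawl the whole domain,
# otherwise a list of path prefixes to stay inside.
DOMAIN_RULES = {
    "example.com": None,
    "big-site.example": ["/blog/"],
}

def should_follow(url):
    """Follow a URL only if its domain is listed and its path matches the rules."""
    parsed = urlparse(url)
    prefixes = DOMAIN_RULES.get(parsed.netloc, [])  # unknown domain -> follow nothing
    if prefixes is None:
        return True
    return any(parsed.path.startswith(p) for p in prefixes)
```

A spider's link-filtering step (or a custom `LinkExtractor` `process_value`) could call `should_follow` on each candidate URL, so one codebase handles both whole-domain and partial-domain crawls.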

Jake 1986 · 582 · 1 · 6 · 25
0 votes · 0 answers
Failed to get Scrapinghub notifications on Slack; facing an error
I want to get notifications on Slack for all Scrapinghub spider events (e.g. spider run, completion, error).
I have created a Monitor.py file and an Action.py file, and also added the Spidermon Slack settings to my settings.py file.
SPIDERMON_SLACK_SENDER_TOKEN =…
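For comparison, a sketch of the settings block that Spidermon's Slack notification docs describe; the token, bot name, channel, and monitor-suite path below are all placeholders that must match the actual project:

```python
# settings.py -- sketch following Spidermon's Slack docs; values are placeholders
SPIDERMON_ENABLED = True
EXTENSIONS = {
    "spidermon.contrib.scrapy.extensions.Spidermon": 500,
}
SPIDERMON_SLACK_SENDER_TOKEN = "xoxb-your-bot-token"
SPIDERMON_SLACK_SENDER_NAME = "monitor-bot"
SPIDERMON_SLACK_RECIPIENTS = ["#scrapy-alerts"]
SPIDERMON_SPIDER_CLOSE_MONITORS = ("myproject.monitors.SpiderCloseMonitorSuite",)
```

If any of these names differ from what is in settings.py (a common issue is a typo in the extension path or a monitor suite that is never registered), the notifications silently never fire.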

Hannan Zafar · 1 · 1
0 votes · 1 answer
Unable to access scrapyd interface on the server machine with public IP
I am trying to run scrapyd on my Ubuntu server, which has a public IP, using the following config file named scrapy.cfg:
[settings]
default = web_crawler.settings
[deploy:default]
url = http://127.0.0.1:6800/
project = web_crawler
[scrapyd]
eggs_dir =…
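scrapyd listens on 127.0.0.1 by default, so it is unreachable from outside the machine; the `url` under `[deploy]` only tells the deploy client where to push, it does not change what scrapyd binds to. A sketch of the relevant `[scrapyd]` options (typically placed in scrapyd.conf, though scrapyd also reads a `[scrapyd]` section from the current directory's config):

```ini
# [scrapyd] section -- bind to all interfaces so the public IP works
[scrapyd]
bind_address = 0.0.0.0
http_port    = 6800
```

Exposing scrapyd this way makes its unauthenticated web UI and API public, so firewalling port 6800 or fronting it with an authenticating reverse proxy is advisable.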

Amanda · 2,013 · 3 · 24 · 57