Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.
Questions tagged [scrapinghub]
179 questions
1 vote, 1 answer
Scrapinghub: exceptions.ImportError: No module named pymodm
I can run my Scrapy spider locally without any issues. However, when I try to run a job on Scrapinghub, I get the following error (the spider connects to MongoDB Atlas in the cloud):
exceptions.ImportError: No module named pymodm
I import using:
import pymodm
Any help is…

Rodrigo Rubio
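A likely cause is that pymodm is installed on the local machine but is not listed in the requirements file that shub uploads with the project (the one referenced from scrapinghub.yml), so it never gets installed in the Scrapy Cloud image. The sketch below is a hypothetical start-up check (missing_modules is not part of Scrapy or shub) that reports which top-level modules cannot be found, so a missing dependency shows up by name in the job log.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path.

    Hypothetical helper: call it when the spider starts to log which
    dependencies were not installed in the deployment environment.
    Only plain top-level names (no dots) are supported.
    """
    missing = []
    for name in names:
        # find_spec returns None when no importable module has this name.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "json" is in the standard library; "pymodm" is the module from the question.
    print(missing_modules(["json", "pymodm"]))
```

If pymodm is reported as missing on Scrapy Cloud, adding it to the requirements file referenced from scrapinghub.yml and running shub deploy again is the usual next step.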
1 vote, 1 answer
Requirements error while trying to deploy to Scrapy Cloud
I'm trying to deploy my spider to Scrapy Cloud using shub, but I keep running into the following error:
$ shub deploy
Packing version 2df64a0-master
Deploying to Scrapy Cloud project "164526"
Deploy log last 30 lines:
---> Using cache
--->…

Simon
1 vote, 0 answers
Portia spider not crawling items
I created a spider using the Portia UI, then deployed and scheduled it on one of my virtual machines using scrapyd. The spider ran fine and scraped the website contents.
But when I try to deploy and schedule the same spider on another, similar virtual…

Prabhakar
1 vote, 1 answer
Can't deploy to Scrapinghub
When I try to deploy using shub deploy, I get this error:
Removing intermediate container fccf1ec715e6
Step 10 : RUN sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE pip install --user --no-cache-dir -r /app/requirements.txt
 ---> Running in…

Aminah Nuraini
1 vote, 1 answer
How to configure Crawlera to use IP addresses from France?
I use Crawlera in my Scrapy-Selenium crawler, but I need to use only IP addresses from France.
How can I configure Crawlera to do this?
custom_settings = {
'DOWNLOADER_MIDDLEWARES' : {'scrapy_crawlera.CrawleraMiddleware': 600},
…

parik
1 vote, 0 answers
Is it possible to support JavaScript in Portia by using Splash?
Is it possible to support JavaScript in Portia by using the Splash downloader middleware in slybot?
I am trying to connect Splash (running in Docker) to Portia. How do I import the Splash downloader middleware into the slybot path…

Suresh
1 vote, 1 answer
How to render the javascript page in portia?
I am using Portia to render a JavaScript page with the scrapinghub/splash middleware, but I get the following error while loading the job page in Portia.
Error:
Your web browser must have JavaScript enabled in order for this application to…
user4443904
0 votes, 1 answer
I'm having an issue while deploying a scraper to Zyte (formerly Scrapinghub)
My spider has to read some data from an input.csv file. It runs fine locally, but when I deploy it to Zyte with shub deploy, input.csv is not included in the build.
So when I run it on the server, it produces the following error:
Traceback (most…

Muhammad Ahmad
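shub deploy packages the project as a Python egg built from setup.py, so a data file such as input.csv is typically only shipped if it is declared (for example through package_data in setup.py), and the spider should then read it through the package rather than from a path relative to the working directory. A minimal sketch using the standard library's pkgutil.get_data follows; the package name myproject and the resources/input.csv location are hypothetical placeholders.

```python
import csv
import io
import pkgutil
from typing import Dict, List


def parse_csv_bytes(data: bytes, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Decode raw CSV bytes and return one dict per row, keyed by the header."""
    text = data.decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


def load_packaged_csv(package: str, resource: str) -> List[Dict[str, str]]:
    """Read a CSV file shipped inside a package (e.g. declared in package_data).

    pkgutil.get_data reads through the package's loader, so it also works when
    the project is installed as an egg. It returns None when the package cannot
    be located or its loader cannot provide data.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(f"{resource} not found in package {package}")
    return parse_csv_bytes(data)


# Hypothetical usage inside a spider's start_requests():
#     rows = load_packaged_csv("myproject", "resources/input.csv")
```

On the setup.py side this would correspond to something like passing package_data={"myproject": ["resources/*.csv"]} to setup(); the exact directory layout is an assumption.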
0 votes, 1 answer
Extract data from company sharepoint using Python
Can I extract data from my company's SharePoint using Python?
I have used Power Automate, but I want to use Python code.
0 votes, 0 answers
Are there any tools or 3rd parties - free or paid that can scrape URLs for Price?
I have a list of URLs and need the price shown on each website to be updated automatically. Are there any tools or third parties, free or paid, that can scrape URLs for prices?
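One do-it-yourself route, assuming the target pages expose their price as schema.org microdata (an element with itemprop="price" and a content attribute), is to fetch each URL and read that attribute. The sketch below uses only the standard library; many shops do not use this markup, and prices rendered by JavaScript will not appear in the raw HTML at all, so treat it as a starting point rather than a general solution.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.request import Request, urlopen


class PriceParser(HTMLParser):
    """Collect the content attribute of itemprop="price" elements."""

    def __init__(self) -> None:
        super().__init__()
        self.prices: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("itemprop") == "price":
            # Use the machine-readable content attribute, if present.
            value = attr_map.get("content")
            if value:
                self.prices.append(value.strip())


def extract_price(html: str) -> Optional[str]:
    """Return the first microdata price found in the HTML, or None."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices[0] if parser.prices else None


def fetch_price(url: str, timeout: float = 10.0) -> Optional[str]:
    """Download a page and extract its microdata price (no JavaScript rendering)."""
    req = Request(url, headers={"User-Agent": "price-check-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_price(resp.read().decode(charset, errors="replace"))


if __name__ == "__main__":
    sample = '<span itemprop="price" content="19.99">$19.99</span>'
    print(extract_price(sample))  # 19.99
```

Running fetch_price over the URL list on a schedule (cron, or a scheduled job on a hosted platform) would give the periodic updates the question asks for.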
0 votes, 0 answers
Splash won't render certain websites
I just ran Aquarium (Splash 3.0), and it works for Google and a lot of other websites.
I'm trying to render certain websites, for example:
https://www.arseus-medical.be/be-nl
For this one, it doesn't render at all.
I send the request to the Splash…
0 votes, 1 answer
How to save Scrapy Broad Crawl Results?
Scrapy has a built-in way of persisting results in AWS S3 using the FEEDS setting.
But for a broad crawl over different domains, this would create a single file containing the results from all domains.
How could I save the results of each…

NightOwl
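One way to split a broad crawl's output by domain, instead of relying on a single FEEDS target, is a custom item pipeline that routes each item to a file named after its domain. The sketch below assumes every item is dict-like with a url field and writes JSON Lines to a local directory; uploading those files to S3 afterwards is left out. The class name and the output_dir default are hypothetical.

```python
import json
import os
from typing import IO, Dict
from urllib.parse import urlparse


class PerDomainJsonLinesPipeline:
    """Scrapy item pipeline sketch: one JSON Lines file per crawled domain.

    Scrapy calls open_spider, process_item and close_spider on enabled
    pipelines. Assumes each item is dict-like and has a "url" key.
    """

    def __init__(self, output_dir: str = "output") -> None:
        self.output_dir = output_dir
        self.files: Dict[str, IO[str]] = {}

    def open_spider(self, spider) -> None:
        os.makedirs(self.output_dir, exist_ok=True)

    def _file_for(self, domain: str) -> IO[str]:
        # Open each domain's file lazily, when its first item arrives.
        if domain not in self.files:
            safe_name = domain.replace(":", "_") or "unknown"
            path = os.path.join(self.output_dir, f"{safe_name}.jsonl")
            self.files[domain] = open(path, "a", encoding="utf-8")
        return self.files[domain]

    def process_item(self, item, spider):
        domain = urlparse(item["url"]).netloc.lower()
        line = json.dumps(dict(item), ensure_ascii=False)
        self._file_for(domain).write(line + "\n")
        return item

    def close_spider(self, spider) -> None:
        for handle in self.files.values():
            handle.close()
        self.files.clear()
```

It would be enabled through the ITEM_PIPELINES setting, e.g. {"myproject.pipelines.PerDomainJsonLinesPipeline": 300} (module path hypothetical). Because one file handle stays open per domain, a very broad crawl may need a cap on open files or periodic closing to stay under the operating system's limit.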
0 votes, 1 answer
Selenium Problem extracting Google business description
I have been struggling with this issue for a couple of days and could really use some help. I am trying to scrape Google business information with Python, BeautifulSoup and Selenium, and I want to extract the business description that is available…

Thresh Bot
0 votes, 1 answer
Why is there an error installing csv on Scrapinghub when csv is part of the Python standard library?
I have 3 spiders defined.
All the related requirements are mentioned in requirements.txt
scrapy
pandas
pytest
requests
google-auth
functions-framework
shub
msgpack-python
Also, scrapinghub.yml is configured to use Scrapy 2.5:
project:…

Avirup Das
0 votes, 1 answer
YouTube Subscriptions List Scraping
I want to scrape my YouTube subscriptions list into one CSV file. I wrote this code (but I haven't finished it yet):
import requests
from bs4 import BeautifulSoup
import csv
url = 'https://www.youtube.com/feed/channels'
source =…

Mohamed Hendy
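The https://www.youtube.com/feed/channels page is only available to a signed-in user, so a plain requests call without your session is unlikely to return the subscription list; the YouTube Data API's subscriptions.list method (with mine=true and OAuth credentials) is the official route to the same data. Whatever the source, the CSV-writing step can be sketched with the standard csv module; the title and url field names below are hypothetical placeholders.

```python
import csv
from typing import Dict, Iterable


def write_subscriptions_csv(channels: Iterable[Dict[str, str]], path: str) -> int:
    """Write one row per channel to a CSV file and return the row count.

    Each channel is assumed to be a dict with "title" and "url" keys
    (hypothetical field names; adapt them to your data source).
    """
    count = 0
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url"])
        writer.writeheader()
        for channel in channels:
            writer.writerow({"title": channel["title"], "url": channel["url"]})
            count += 1
    return count
```

For example, write_subscriptions_csv([{"title": "Example Channel", "url": "https://www.youtube.com/@example"}], "subscriptions.csv") writes a header row plus one data row.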