Questions tagged [scrapinghub]

Scrapinghub (now Zyte), a web scraping development and services company, supplies cloud-based web crawling platforms.

179 questions
1
vote
1 answer

Scrapinghub - exceptions.ImportError: No module named pymodm

I can run my Scrapy spider locally without any issues; however, when I try to run the job from Scrapinghub I get the following error (connecting to MongoDB Atlas cloud): exceptions.ImportError: No module named pymodm. I import it with: import pymodm. Any help is…
Rodrigo Rubio
  • 1,686
  • 2
  • 16
  • 26
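The ImportError above is typically a build problem rather than a code problem: Scrapy Cloud installs only the packages declared for the project, so pymodm has to be listed in requirements.txt and scrapinghub.yml has to point shub at that file. A minimal sketch (the project ID is a placeholder):

```yaml
# scrapinghub.yml, placed next to scrapy.cfg; the project ID is a placeholder
projects:
  default: 12345
requirements:
  file: requirements.txt
```

with pymodm added as a line in requirements.txt before redeploying with shub deploy.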
1
vote
1 answer

Requirements error while trying to deploy to Scrapy Cloud

I'm trying to deploy my spider to Scrapy Cloud using shub, but I keep running into the following error: $ shub deploy Packing version 2df64a0-master Deploying to Scrapy Cloud project "164526" Deploy log last 30 lines: ---> Using cache --->…
Simon
  • 322
  • 1
  • 13
1
vote
0 answers

Portia spider not crawling items

I have created a spider using the Portia UI and deployed and scheduled it on one of my virtual machines using scrapyd. The spider ran fine and scraped the website's contents. But when I try to deploy and schedule the same spider on another similar virtual…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
1
vote
1 answer

Can't deploy to Scrapinghub

When I try to deploy using shub deploy, I get this error: Removing intermediate container fccf1ec715e6 Step 10 : RUN sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE pip install --user --no-cache-dir -r /app/requirements.txt ---> Running in…
Aminah Nuraini
  • 18,120
  • 8
  • 90
  • 108
1
vote
1 answer

How to configure an IP address from France in Crawlera?

I use Crawlera in my Scrapy-Selenium crawler, but I need to use only IPs from France. How can I configure Crawlera to do this? custom_settings = { 'DOWNLOADER_MIDDLEWARES' : {'scrapy_crawlera.CrawleraMiddleware': 600}, …
parik
  • 2,313
  • 12
  • 39
  • 67
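For context, enabling Crawlera from spider settings uses the scrapy-crawlera middleware already shown in the question. A sketch with the documented settings follows; the API key is a placeholder, and restricting traffic to French IPs is generally arranged on the Crawlera account or plan side rather than in these settings:

```python
# Sketch: enabling the Crawlera middleware via scrapy-crawlera settings.
# The API key is a placeholder; geolocation (e.g. France-only IPs) is
# typically tied to the Crawlera account/plan rather than a spider setting.
custom_settings = {
    'DOWNLOADER_MIDDLEWARES': {
        'scrapy_crawlera.CrawleraMiddleware': 600,
    },
    'CRAWLERA_ENABLED': True,
    'CRAWLERA_APIKEY': '<your-api-key>',
}
```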
1
vote
0 answers

Is it possible to support JS in Portia by using Splash?

Is it possible to support JS in Portia using the Splash download middleware in slybot? I am trying to connect Splash via Docker with Portia. How do I import the Splash download middleware into the slybot path…
Suresh
  • 11
  • 2
1
vote
1 answer

How to render a JavaScript page in Portia?

I am using Portia to render JavaScript pages via the scrapinghub/splash middleware, but I get the following error when loading the job page in Portia: Error: Your web browser must have JavaScript enabled in order for this application to…
user4443904
0
votes
1 answer

I'm having an issue while deploying a scraper to Zyte (formerly Scrapinghub)

My spider has to read some data from an input.csv file. It runs fine locally, but when I try to deploy it to Zyte with shub deploy, it does not include input.csv in the build. So when I try to run it on the server it produces the following error. Traceback (most…
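Since shub deploy packages the project through setup.py, data files that are not Python modules are left out of the build unless they are declared. A sketch of a setup.py that bundles the CSV, assuming a hypothetical package named myproject:

```python
# setup.py sketch: include input.csv in the package that shub uploads.
# "myproject" is a hypothetical package name; adjust to your project layout.
from setuptools import setup, find_packages

setup(
    name='myproject',
    version='1.0',
    packages=find_packages(),
    # Ship the data file alongside the package's Python code.
    package_data={'myproject': ['input.csv']},
    include_package_data=True,
    entry_points={'scrapy': ['settings = myproject.settings']},
)
```

At runtime the file can then be read relative to the installed package (for example via pkgutil.get_data) rather than via a path on the local filesystem.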
0
votes
1 answer

Extract data from a company SharePoint using Python

Can I extract data from my company's SharePoint using Python? I have used Power Automate, but I want to use Python code.
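This is possible over the SharePoint REST API. A minimal stdlib sketch is below; the site URL, list title, and bearer token are placeholders, and obtaining the token (for example through an Azure AD app registration) is out of scope here:

```python
# Sketch: reading a SharePoint list over the SharePoint REST API.
# Site URL, list title, and token are placeholders/assumptions.
import json
import urllib.request

def list_items_url(site_url, list_title):
    """Build the SharePoint REST endpoint for a list's items."""
    return f"{site_url}/_api/web/lists/getbytitle('{list_title}')/items"

def fetch_list_items(site_url, list_title, token):
    """Return the items of a SharePoint list as parsed JSON."""
    req = urllib.request.Request(
        list_items_url(site_url, list_title),
        headers={
            'Accept': 'application/json;odata=verbose',
            'Authorization': f'Bearer {token}',
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)['d']['results']
```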
0
votes
0 answers

Are there any tools or third parties, free or paid, that can scrape URLs for price?

I have a list of URLs and need an automatic update on the price found on each website. Are there any tools or third parties, free or paid, that can scrape…
0
votes
0 answers

Splash won't render certain websites

I just ran Aquarium (Splash 3.0) and it works for Google and a lot of other websites. But certain websites, for example https://www.arseus-medical.be/be-nl, don't render at all. I send the request to the Splash…
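For reproducing issues like this outside of Scrapy, a page can be requested directly from Splash's render.html HTTP endpoint. A stdlib sketch; the host, port, and wait time are assumptions, and an Aquarium setup may additionally require the HTTP auth credentials it generated:

```python
# Sketch: fetching JavaScript-rendered HTML from a local Splash instance
# via its render.html endpoint. Host, port, and wait time are assumptions.
import urllib.parse
import urllib.request

def render_url(page_url, splash='http://localhost:8050', wait=3):
    """Build the Splash render.html URL for a page."""
    query = urllib.parse.urlencode({'url': page_url, 'wait': wait})
    return f'{splash}/render.html?{query}'

def render_html(page_url, **kwargs):
    """Fetch the rendered HTML of a page through Splash."""
    with urllib.request.urlopen(render_url(page_url, **kwargs), timeout=90) as resp:
        return resp.read().decode('utf-8')
```

Raising the wait parameter is often the first thing to try on pages that come back unrendered.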
0
votes
1 answer

How to save Scrapy Broad Crawl Results?

Scrapy has a built-in way of persisting results to AWS S3 using the FEEDS setting, but for a broad crawl over different domains this creates a single file where the results from all domains are saved. How could I save the results of each…
NightOwl
  • 1,069
  • 3
  • 13
  • 23
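One common approach to the question above is an item pipeline that routes each item to a file named after its domain instead of relying on a single feed. A sketch, assuming each item carries a 'url' field (the file naming and class name are illustrative):

```python
# Sketch: per-domain JSON-lines export via a Scrapy item pipeline.
# Assumes each item has a 'url' field; naming/location are illustrative.
import json
from urllib.parse import urlparse

class PerDomainJsonLinesPipeline:
    def open_spider(self, spider):
        self.files = {}

    def close_spider(self, spider):
        for f in self.files.values():
            f.close()

    def process_item(self, item, spider):
        domain = urlparse(item['url']).netloc
        if domain not in self.files:
            # One append-mode JSON-lines file per domain.
            self.files[domain] = open(f'{domain}.jl', 'a', encoding='utf-8')
        self.files[domain].write(json.dumps(dict(item)) + '\n')
        return item
```

It would be enabled through ITEM_PIPELINES in settings.py, pointing at wherever the class lives in the project.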
0
votes
1 answer

Selenium Problem extracting Google business description

I have been struggling with this issue for a couple of days and could really use some help. I am trying to scrape Google business information with Python, BeautifulSoup, and Selenium, and I want to extract the business description that is available…
0
votes
1 answer

Why is there an error installing csv, when it is part of the Python core packages, in Scrapinghub?

I have 3 spiders defined. All the related requirements are listed in requirements.txt: scrapy pandas pytest requests google-auth functions-framework shub msgpack-python. Also, scrapinghub.yml is defined to use Scrapy 2.5: project:…
Avirup Das
  • 189
  • 1
  • 3
  • 15
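An error like the one in this question usually means csv ended up in requirements.txt at some point: pip has no package named "csv", because the module ships with Python itself, so the line must simply be removed. A quick check that csv works with no installation at all:

```python
# csv comes with the Python standard library, so listing it in
# requirements.txt makes pip fail; just import it directly.
import csv
import io

buf = io.StringIO()
csv.writer(buf).writerow(['spider', 'items'])
print(buf.getvalue().strip())  # -> spider,items
```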
0
votes
1 answer

YouTube Subscriptions List Scraping

I want to scrape my YouTube subscriptions list into one CSV file. I wrote this code (but I haven't finished it yet): import requests from bs4 import BeautifulSoup import csv url = 'https://www.youtube.com/feed/channels' source =…