Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.
Questions tagged [scrapinghub]
179 questions
2
votes
1 answer
ScrapingHub and remote database
I'm creating a spider with scrapy, and I want to use a MySQL database to get the start_urls for my spider. Is it possible to connect scrapy-cloud to a remote database?

gueyebaba
- 51
- 4
2
votes
1 answer
Add settings in scrapinghub spider
I'm trying to enable mongodb in my spider on the scrapinghub platform. For this I have to enable the extension via the "EXTENSIONS" setting in the UI. But while running the spider, I get the error below:
ValueError: Some paths in…

user3295878
- 831
- 1
- 6
- 19
2
votes
1 answer
Delete spiders from scrapinghub
I am a new user of scrapinghub.
I have already searched Google and read the scrapinghub docs, but I could not find any information about removing spiders from a project. Is it possible, and if so, how?
I do not want to replace a spider, I want to…

Inês Martins
- 530
- 2
- 10
- 23
2
votes
1 answer
Achieving Next page through javascript in scrapy python with splash?
Actually my intention is to achieve the Next from "href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT')", so just for an example I am taking [this url][1]. From this url as you can see the Next at the end of the page, so if…
user4273328
1
vote
1 answer
Logitech Gaming Software LUA script not working on GHub
I’ve always used this LUA script on Logitech Gaming software by using a G502 mouse, I had to change my old mouse and I bought a new version “G502x” which is not recognized by LGS so I had to install GHub to make the mouse useful, but the script is…

Loriner De Syrtis
- 11
- 1
1
vote
1 answer
Webscraping yml files from Github
I am trying to scrape certain open source file from GitHub but I'm having an issue with their new format.
This is an example link: https://github.com/xavierLowmiller/xcodegen-action/blob/main/action.yml that leads to a YML file. I am trying to…

Artemis
- 15
- 3
1
vote
0 answers
Web scraping using Octoparse
I have been trying to use Octoparse to scrape data from a particular webpage.
It has a total of 361 pages and 10 data rows on each page (total of 3610 data points). However, what I get is only 3260 data points.
Normally the process works fine and…

Anthony Nguyen
- 27
- 5
1
vote
1 answer
Crawlera/Zyte proxy authentication using C# and Selenium
I've tried a number of ways of using Zyte (formerly Crawlera) proxies with Selenium.
They provide
1- API key (username)
2- Proxy url/port.
No password is needed.
What I have tried...
ChromeOptions options = new ChromeOptions();
var proxy =…

MattHodson
- 736
- 7
- 22
1
vote
1 answer
Not able to scrape image URLs using beautiful soup and python
So basically I am using the below code to scrape the image urls of the credit cards from the respective links in the explore_more_url variable.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import json, requests, re
from selenium…
user15215612
1
vote
1 answer
How can I scrape the image using Beautiful Soup and python
I am trying to scrape the image link from the below link but I am not able to
Link : https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM
I have used the below code
x = '…

Ali Baba
- 85
- 11
1
vote
2 answers
Trying to scrape image urls but not able to get it using beautiful soup and python
I am scraping this link :…

Ali Baba
- 85
- 11
1
vote
0 answers
I am using scrapy to scrape data from Yelp. I cannot see any error but data is not getting scraped from the StartURLs mentioned in the spider
The code for items.py and the other files is given below. The logs are also included at the end. I am not getting any error, but according to the logs scrapy has not scraped any pages.
```
import scrapy
class YelpItem(scrapy.Item):
#…
```

sneha s
- 11
- 1
1
vote
1 answer
How to iterate through a list of Beautiful Soup tag elements and get a particular text if found else an empty string?
Case1:
Derattizzazione Disinfestazione Punteruolo Rosso - Quark Srl

dashkandhar
- 83
- 1
- 7
1
vote
0 answers
504 Timeout Exception when using scrapy-splash with crawlera
I tried scrapy-splash with http://www.google.com, followed all the prerequisite steps given in the GitHub repo https://github.com/scrapy-plugins/scrapy-splash, and was able to render the Google page.
However, when I tried the same…
1
vote
1 answer
ScrapingHub Deploy Fails
I am trying to deploy to ScrapingHub and here is the error I am getting...
Deploy log last 30 lines:
File "/app/python/lib/python3.8/site-packages/scrapy/cmdline.py", line 142, in execute
cmd.crawler_process = CrawlerProcess(settings)
File…

johncsmith427
- 83
- 8