Questions tagged [scrapinghub]

A web scraping development and services company that supplies cloud-based web crawling platforms.

179 questions
2
votes
1 answer

ScrapingHub and remote database

I'm creating a spider with Scrapy, and I want to use a MySQL database to get the start_urls for my spider. Is it possible to connect Scrapy Cloud to a remote database?
gueyebaba
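For the database part of this question, here is a minimal sketch. It assumes a hypothetical table named `start_urls` with a `url` column. The helper uses only Python DB-API 2.0 calls (`cursor()`, `execute()`, `fetchall()`), so the same code should work with a MySQL driver's connection. `sqlite3` stands in for MySQL so the example runs on its own. Whether Scrapy Cloud can reach a particular remote MySQL server is a separate network question that this sketch does not answer.

```python
import sqlite3
from typing import List


def load_start_urls(conn, table: str = "start_urls") -> List[str]:
    """Return every value in the `url` column of `table`.

    `conn` can be any DB-API 2.0 connection; sqlite3 is used below as a
    stand-in for a MySQL connection. Table and column names are hypothetical.
    """
    cur = conn.cursor()
    try:
        # The table name is formatted into the SQL string, so it must come
        # from trusted configuration, never from scraped or user input.
        cur.execute(f"SELECT url FROM {table}")
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()


if __name__ == "__main__":
    # Throwaway in-memory database to demonstrate the helper.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE start_urls (url TEXT)")
    demo.executemany(
        "INSERT INTO start_urls VALUES (?)",
        [("https://example.com/a",), ("https://example.com/b",)],
    )
    print(sorted(load_start_urls(demo)))
```

Inside a Scrapy spider, one option is to call a helper like this from the spider's `start_requests()` method and yield `scrapy.Request(url)` for each row, instead of hard-coding `start_urls`.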
2
votes
1 answer

Add settings in scrapinghub spider

I'm trying to enable MongoDB in my spider on the Scrapinghub platform. To do this, I have to enable the extension via the "EXTENSIONS" setting in the UI. But when I run the spider, I get the following error: ValueError: Some paths in…
user3295878
2
votes
1 answer

Delete spiders from scrapinghub

I am a new user of Scrapinghub. I have already searched on Google and read the Scrapinghub docs, but I could not find any information about removing spiders from a project. Is it possible, and if so, how? I do not want to replace a spider, I want to…
Inês Martins
2
votes
1 answer

Achieving the Next page through JavaScript in Scrapy (Python) with Splash?

Actually, my intention is to follow the Next link from href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT')", so just as an example I am taking this URL. As you can see, there is a Next link at the end of the page, so if…
user4273328
1
vote
1 answer

Logitech Gaming Software Lua script not working on GHub

I’ve always used this Lua script in Logitech Gaming Software with a G502 mouse. I had to replace my old mouse and bought the newer “G502x”, which LGS does not recognize, so I had to install GHub to make the mouse usable, but the script is…
1
vote
1 answer

Web scraping YML files from GitHub

I am trying to scrape certain open-source files from GitHub, but I'm having an issue with their new format. This is an example link: https://github.com/xavierLowmiller/xcodegen-action/blob/main/action.yml that leads to a YML file. I am trying to…
Artemis
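If the HTML of GitHub's file view is the obstacle, one common workaround is to skip the HTML page and download the raw file. That is an assumption about this case, not a confirmed fix. Below is a sketch of the URL rewrite, assuming links of the form `github.com/<owner>/<repo>/blob/<ref>/<path>`. The function name `blob_to_raw` is hypothetical.

```python
from urllib.parse import urlparse, urlunparse


def blob_to_raw(url: str) -> str:
    """Map github.com/<owner>/<repo>/blob/<ref>/<path> to the raw-file URL."""
    parts = urlparse(url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    segments = parts.path.strip("/").split("/")
    # Expect owner, repo, "blob", ref, then at least one path segment.
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a /blob/ file URL: {url}")
    owner, repo, _blob, *rest = segments
    # The raw host uses the same path with the "blob" segment dropped;
    # any query string (e.g. ?plain=1) is discarded.
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunparse(("https", "raw.githubusercontent.com", raw_path, "", "", ""))


if __name__ == "__main__":
    print(blob_to_raw(
        "https://github.com/xavierLowmiller/xcodegen-action/blob/main/action.yml"
    ))
```

The resulting URL can be fetched with `urllib.request.urlopen`, and the text can then be passed to a YAML parser such as PyYAML's `yaml.safe_load`.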
1
vote
0 answers

Web scraping using Octoparse

I have been trying to use Octoparse to scrape data from a particular webpage. It has a total of 361 pages with 10 data rows on each page (a total of 3610 data points). However, I only get 3260 data points. Normally the process works fine and…
1
vote
1 answer

Crawlera/Zyte proxy authentication using C# and Selenium

I've tried a number of ways of using Zyte (formerly Crawlera) proxies with Selenium. They provide (1) an API key (used as the username) and (2) a proxy URL/port. No password is needed. What I have tried... ChromeOptions options = new ChromeOptions(); var proxy =…
MattHodson
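The question uses C#, but the credential format does not depend on the language. Here is a Python sketch. It assumes, as the question describes, that the API key is the proxy username and the password is empty. It builds the Basic `Proxy-Authorization` value and a credentials-in-URL form. The host and port are placeholders for whatever Zyte supplies, and the function names are hypothetical.

```python
import base64
from urllib.parse import quote


def proxy_auth_header(api_key: str) -> str:
    """Basic auth value with the API key as username and an empty password."""
    # Basic auth encodes "username:password"; the trailing colon marks the
    # empty password.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def proxy_url_with_credentials(api_key: str, host: str, port: int) -> str:
    """Build http://<api_key>:@<host>:<port>, percent-encoding the key."""
    return f"http://{quote(api_key, safe='')}:@{host}:{port}"


if __name__ == "__main__":
    # "abc" and proxy.example.com are placeholders, not real credentials.
    print(proxy_auth_header("abc"))  # → Basic YWJjOg==
    print(proxy_url_with_credentials("abc", "proxy.example.com", 8010))
    # → http://abc:@proxy.example.com:8010
```

Chrome is reportedly unwilling to use credentials embedded in `--proxy-server`. With Selenium, people often inject the header through a local forwarding proxy or a browser extension instead. Treat that as background to verify, not as a tested fix.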
1
vote
1 answer

Not able to scrape image URLs using Beautiful Soup and Python

So basically I am using the code below to scrape the image URLs of the credit cards from the respective links in the explore_more_url variable. from urllib.request import urlopen from bs4 import BeautifulSoup import json, requests, re from selenium…
user15215612
1
vote
1 answer

How can I scrape the image using Beautiful Soup and Python?

I am trying to scrape the image link from the link below, but I am not able to. Link: https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM I have used the code below: x = '…
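For comparison, here is the same idea with Python's standard-library `html.parser` swapped in for Beautiful Soup. It collects the `src` of every `<img>` tag and resolves relative paths against the page URL. This is a sketch, and the names are hypothetical. If a page inserts its images with JavaScript after loading, they will not appear in the downloaded HTML for either parser. The same applies if the page keeps the URL in an attribute other than `src`.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect absolute URLs from the src attribute of every <img> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; self-closing
        # <img ... /> tags also arrive here via handle_startendtag.
        if tag != "img":
            return
        src: Optional[str] = dict(attrs).get("src")
        if src:
            # Resolve relative paths such as "/images/card.png".
            self.sources.append(urljoin(self.base_url, src))


def image_urls(html: str, base_url: str) -> List[str]:
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.sources


if __name__ == "__main__":
    page = '<div><img src="/img/card.png" alt="card"><img alt="none"></div>'
    print(image_urls(page, "https://www.example.com/credit-card/page"))
```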
1
vote
0 answers

I am using Scrapy to scrape data from Yelp. I cannot see any errors, but no data is being scraped from the start URLs in the spider

The code for items.py and the other files is below, and the logs are at the end. I am not getting any errors, but according to the logs, Scrapy has not scraped any pages. ``` import scrapy class YelpItem(scrapy.Item): #…
sneha s
1
vote
1 answer

How to iterate through a list of Beautiful Soup tag elements and get a particular text if found, else an empty string?

Case1:
  • Derattizzazione Disinfestazione Punteruolo Rosso - Quark Srl
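The general "first match, else empty string" pattern is `next()` with a default value. This sketch assumes "a particular text" means the first element whose text satisfies some condition. `first_text` is a hypothetical helper. With Beautiful Soup you would pass the result of `find_all()` together with `lambda tag: tag.get_text()`.

```python
from typing import Callable, Iterable


def first_text(
    items: Iterable,
    predicate: Callable[[str], bool],
    get_text: Callable[[object], str] = str,
) -> str:
    """Return the stripped text of the first item matching predicate, else ""."""
    # A generator keeps this lazy: iteration stops at the first match.
    texts = (get_text(item).strip() for item in items)
    return next((text for text in texts if predicate(text)), "")


if __name__ == "__main__":
    rows = ["  Example Company  ", "Quark Srl"]
    print(repr(first_text(rows, lambda s: s.endswith("Srl"))))  # → 'Quark Srl'
    print(repr(first_text(rows, lambda s: "Roma" in s)))        # → ''
```

The default argument to `next()` is what replaces an explicit loop with a flag. If nothing matches, the generator is exhausted and `""` is returned instead of `StopIteration` being raised.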
1
vote
0 answers

504 Timeout Exception when using scrapy-splash with Crawlera

I tried scrapy-splash with http://www.google.com and followed all the prerequisite steps given in the following GitHub repo https://github.com/scrapy-plugins/scrapy-splash and I was able to render the Google page. However, when I tried the same…
1
vote
1 answer

ScrapingHub Deploy Fails

I am trying to deploy to ScrapingHub, and here is the error I am getting... Deploy log last 30 lines: File "/app/python/lib/python3.8/site-packages/scrapy/cmdline.py", line 142, in execute cmd.crawler_process = CrawlerProcess(settings) File…