Questions tagged [splash-js-render]

Splash is a JavaScript rendering service. It's a lightweight web browser with an HTTP API, implemented in Python using Twisted and Qt. It is often used as an alternative to Selenium.

https://splash.readthedocs.io/en/stable/

Splash - A JavaScript rendering service

Splash is a JavaScript rendering service. It's a lightweight web browser with an HTTP API, implemented in Python using Twisted and Qt. The (Twisted) Qt reactor makes the server fully asynchronous, which lets it take advantage of WebKit concurrency via the Qt main loop. Some of Splash's features:

process multiple webpages in parallel;
get HTML results and/or take screenshots;
turn OFF images or use Adblock Plus rules to make rendering faster;
execute custom JavaScript in page context;
write Lua browsing scripts;
develop Splash Lua scripts in Splash-Jupyter Notebooks;
get detailed rendering info in HAR format.
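
As a rough illustration of the HTTP API mentioned above, the sketch below builds GET URLs for the `render.html` and `render.png` endpoints. It assumes a Splash instance listening on `localhost:8050` (the default port of the Docker image); the helper name `build_render_url` and the target page `https://example.com` are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

# Assumption: a local Splash instance on the default port.
SPLASH_BASE = "http://localhost:8050"


def build_render_url(endpoint, page_url, wait=0.5, images=True, base=SPLASH_BASE):
    """Return a GET URL for a Splash rendering endpoint (hypothetical helper).

    endpoint: e.g. "render.html" for HTML or "render.png" for a screenshot.
    wait:     seconds to wait after page load before rendering.
    images:   pass False to turn off image loading and speed up rendering.
    """
    params = {
        "url": page_url,
        "wait": wait,
        "images": 1 if images else 0,
    }
    return f"{base}/{endpoint}?{urlencode(params)}"


# HTML snapshot with images disabled, and a PNG screenshot of the same page.
html_url = build_render_url("render.html", "https://example.com", images=False)
png_url = build_render_url("render.png", "https://example.com")
print(html_url)
print(png_url)
```

Fetching either URL with any HTTP client should return the rendered result, provided a Splash server is actually running at the assumed address.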

138 questions

votes

1 answer

How to set cookies in Scrapy+Splash when javascript makes multiple requests?

When the JavaScript is loaded, it makes another AJAX request, and cookies should be set in the response. However, Splash does not keep any cookies across multiple requests. Is there a way to keep the cookies across all requests? Or even assign…

asked Nov 11 '16 at 08:23

James Samovar

votes

1 answer

How to get popup content with splash

I'm starting to use Scrapy with Splash, and I was wondering if Splash can handle multiple windows and popups. As an example, I would like to use this Lua script to try to obtain the Google window's content: function main(splash) …

web-scraping scrapy popup scrapy-splash splash-js-render

asked Sep 06 '16 at 14:24

Arnaud PARAN

votes

3 answers

Scrapy selector not working on Splash response

I'm trying to scrape some dynamic content using Scrapy. I have successfully set up Splash to work along with it. However, the selectors of the following spider yield empty results: # -*- coding: utf-8 -*- import scrapy from scrapy.selector import…

python web-scraping scrapy scrapy-splash splash-js-render

asked Jun 08 '16 at 11:54

Paolo Brasolin

votes

0 answers

Lua script in Splash - direct file download by button click

How to directly download and save a file by button click using Splash Lua? Example page and the download button: Conditions: The download URL is generated dynamically upon button click Clicking the button will open a "Save As" prompt window to…

lua splash-js-render

asked Dec 14 '21 at 08:13

Bill Huang

votes

1 answer

Web scraping with splashr fails with curl error after many successes

I am scraping a few dozen URLs using splashr which uses Splash in a Docker container as documented here. The code runs and completes fine when run directly from RStudio Server on my Digital Ocean Droplet. However, when it runs from a cron job it…

r web-scraping rvest splash-js-render

asked Jun 30 '21 at 00:47

ixodid

votes

0 answers

Javascript Rendering Issue in Scrapy-Splash

I was exploring Scrapy+Splash and ran into an issue: SplashRequest is not rendering the JavaScript and gives exactly the same response as scrapy.Request. The webpage I want to scrape is this. I want some fields from the webpage for my course project. I…

web-scraping scrapy scrapy-splash splash-js-render

asked Dec 18 '19 at 13:11

Fenil

votes

3 answers

Trying to fake and rotate user agents

I am trying to fake user agents as well as rotate them in Python. I found a tutorial online about how to do this with Scrapy using scrapy-useragents package. I scrape the webpage, https://www.whatsmyua.info/, in order to check my user agent to see…

python scrapy user-agent scrapy-splash splash-js-render

asked May 10 '19 at 17:57

Tim

votes

2 answers

Scrapy splash spider not following links to fetch new pages

I am fetching data from a page that uses JavaScript to link to new pages. I am using Scrapy + Splash to fetch this data; however, for some reason, the links are not being followed. Here is the code for my spider: import scrapy from scrapy_splash…

python scrapy scrapy-splash splash-js-render

asked Feb 25 '19 at 13:48

Homunculus Reticulli

votes

0 answers

Cannot parse html response of scrapy-splash lua script

I am trying to parse the HTML returned from a SplashRequest to the execute endpoint, which should return HTML. However, when I pass it to the callback function, it does not print anything (does not parse). My log shows no errors. Code below: import…

python web-scraping scrapy-splash splash-js-render

asked Oct 04 '18 at 13:39

JSwordy

votes

2 answers

scrapy-splash active content selector works in shell but not with spider

I just started using scrapy-splash to retrieve the number of bookings from opentable.com. The following works fine in the shell: $ scrapy shell…

python web-scraping scrapy scrapy-splash splash-js-render

asked Jun 16 '18 at 00:56

Stefan

votes

1 answer

Scrapy with Splash still giving DEBUG: Crawled (200)

I'm new to scrapy and I can't seem to figure out why I'm having this problem when I run my code. I coded this from a simple tutorial and then added Splash. Splash is up and running. This is the code: livros.py from scrapy.spiders import CrawlSpider,…

python scrapy scrapy-splash splash-js-render

asked May 15 '18 at 13:45

Azzine

votes

1 answer

Splash + Scrapoxy: x-cache-proxyname header is missing

I'm using the following infrastructure for scraping a web site: Scrapy <--> Splash <--> Scrapoxy <--> web site. I'm doing requests via the Splash execute endpoint, with a Lua script like this: function main(splash) local host = "..." local port =…

python scrapy scrapy-splash splash-js-render

asked Apr 18 '18 at 20:00

alexanderlukanin13

votes

0 answers

Splash: collect screenshot meta-data as items

I'm working with scrapy-splash to screenshot a web page and output a PNG with some metadata. I know that Scrapy logs all actions the engine executes with timestamps, etc., but I'm having trouble figuring out how to access that information in my spider…

python-3.x scrapy screen-capture scrapy-splash splash-js-render

asked Mar 29 '18 at 01:14

CLPatterson

votes

2 answers

Splash issues (d-bus, QSslSocket, libpng)

I'm trying to use Splash via the scrapinghub/splash Docker image and get some alerts after the first request (which goes to the /robots.txt endpoint because I'm using the scrapy-splash plugin for the Scrapy library with Python 3.6). [-] "172.17.0.1" - -…

python-3.x docker dbus scrapy-splash splash-js-render

asked Jan 18 '18 at 00:18

Illia Ananich

votes

1 answer

Selecting dependent dropdown with scrapy-splash

I am trying to scrape the following website: https://www.climatempo.com.br/climatologia/558/saopaulo-sp. It has two drop-down menus, with the second depending on the first, so I chose to use Scrapy and Splash via scrapy-splash. I need to automate…

python web-scraping scrapy scrapy-splash splash-js-render

asked Nov 30 '17 at 13:54

Daniel Lima
