Questions tagged [splash-js-render]

Splash is a JavaScript rendering service. It's a lightweight web browser with an HTTP API, implemented in Python using Twisted and Qt. It is an alternative to Selenium.

https://splash.readthedocs.io/en/stable/

Splash - A javascript rendering service

Splash is a JavaScript rendering service. It's a lightweight web browser with an HTTP API, implemented in Python using Twisted and Qt. The (Twisted) Qt reactor is used to make the server fully asynchronous, which lets it take advantage of WebKit concurrency via the Qt main loop. Some of Splash's features:

  • process multiple webpages in parallel;
  • get HTML results and/or take screenshots;
  • turn OFF images or use Adblock Plus rules to make rendering faster;
  • execute custom JavaScript in page context;
  • write Lua browsing scripts;
  • develop Splash Lua scripts in Splash-Jupyter Notebooks;
  • get detailed rendering info in HAR format.
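
As a rough illustration of the HTTP API mentioned above, the sketch below builds a GET request for Splash's render.html endpoint with images turned off. It assumes a Splash instance is reachable at http://localhost:8050 (the usual Docker default); the helper names build_render_url and fetch_rendered_html are hypothetical and not part of Splash itself.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_render_url(splash_base, page_url, wait=0.5, images=False):
    """Build a GET URL for Splash's render.html endpoint (hypothetical helper)."""
    params = {
        "url": page_url,               # page that Splash should load and render
        "wait": wait,                  # seconds to wait after the page loads
        "images": 1 if images else 0,  # 0 turns image loading off
    }
    return f"{splash_base.rstrip('/')}/render.html?{urlencode(params)}"


def fetch_rendered_html(splash_base, page_url, **kwargs):
    """Fetch the rendered HTML; needs a running Splash instance (assumption)."""
    request_url = build_render_url(splash_base, page_url, **kwargs)
    with urlopen(request_url, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the request URL; no Splash instance is contacted here.
    print(build_render_url("http://localhost:8050", "https://example.com"))
```

Calling fetch_rendered_html("http://localhost:8050", "https://example.com") should return the page's HTML after JavaScript has run, provided Splash is actually listening at that address.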
138 questions
3 votes, 1 answer

How to set cookies in Scrapy+Splash when javascript makes multiple requests?

When the JavaScript is loaded, it makes another AJAX request, and cookies should be set in the response. However, Splash does not keep any cookies across multiple requests. Is there a way to keep the cookies across all requests? Or even assign…
3 votes, 1 answer

How to get popup content with splash

I'm starting to use Scrapy with Splash, and I was wondering if Splash can handle multiple windows and popups. As an example, I would like to use this Lua script to try to obtain the Google window's content: function main(splash) …
3 votes, 3 answers

Scrapy selector not working on Splash response

I'm trying to scrape some dynamic content using Scrapy. I have successfully set up Splash to work along with it. However, the selectors of the following spider yield empty results: # -*- coding: utf-8 -*- import scrapy from scrapy.selector import…
2 votes, 0 answers

Lua script in Splash - direct file download by button click

How can I directly download and save a file by clicking a button using Splash Lua? Example page and the download button: Conditions: The download URL is generated dynamically upon button click. Clicking the button will open a "Save As" prompt window to…
Bill Huang
2 votes, 1 answer

Web scraping with splashr fails with curl error after many successes

I am scraping a few dozen URLs using splashr which uses Splash in a Docker container as documented here. The code runs and completes fine when run directly from RStudio Server on my Digital Ocean Droplet. However, when it runs from a cron job it…
ixodid
2 votes, 0 answers

Javascript Rendering Issue in Scrapy-Splash

I was exploring Scrapy+Splash and ran into an issue: SplashRequest is not rendering the JavaScript and returns exactly the same response as scrapy.Request. The webpage I want to scrape is this. I want some fields from the webpage for my course project. I…
Fenil
2 votes, 3 answers

Trying to fake and rotate user agents

I am trying to fake user agents as well as rotate them in Python. I found a tutorial online about how to do this with Scrapy using scrapy-useragents package. I scrape the webpage, https://www.whatsmyua.info/, in order to check my user agent to see…
Tim
2 votes, 2 answers

Scrapy splash spider not following links to fetch new pages

I am fetching data from a page that uses JavaScript to link to new pages. I am using Scrapy + Splash to fetch this data; however, for some reason, the links are not being followed. Here is the code for my spider: import scrapy from scrapy_splash…
Homunculus Reticulli
2 votes, 0 answers

Cannot parse html response of scrapy-splash lua script

I am trying to parse the HTML returned from a SplashRequest to the execute endpoint, which should return HTML. However, when I pass it to the callback function, it does not print anything (does not parse). My log shows no errors. Code below: import…
JSwordy
2 votes, 2 answers

scrapy-splash active content selector works in shell but not with spider

I just started using scrapy-splash to retrieve the number of bookings from opentable.com. The following works fine in the shell: $ scrapy shell…
Stefan
2 votes, 1 answer

Scrapy with Splash still giving DEBUG: Crawled (200)

I'm new to Scrapy and I can't seem to figure out why I'm having this problem when I run my code. I coded this from a simple tutorial and then added Splash. Splash is up and running. This is the code: livros.py from scrapy.spiders import CrawlSpider,…
Azzine
2 votes, 1 answer

Splash + Scrapoxy: x-cache-proxyname header is missing

I'm using the following infrastructure for scraping a web site: Scrapy <--> Splash <--> Scrapoxy <--> web site. I'm doing requests via the Splash execute endpoint, with a Lua script like this: function main(splash) local host = "..." local port =…
alexanderlukanin13
2 votes, 0 answers

Splash: collect screenshot meta-data as items

I'm working with scrapy-splash to screenshot a web page and output a PNG with some metadata. I know that Scrapy logs all actions the engine executes with timestamps, etc., but I'm having trouble figuring out how to access that information in my spider…
2 votes, 2 answers

Splash issues (d-bus, QSslSocket, libpng)

I'm trying to use Splash via the scrapinghub/splash Docker image and get some alerts after the first request (which is to the /robots.txt endpoint, because I'm using the scrapy-splash plugin for the Scrapy library with Python 3.6). [-] "172.17.0.1" - -…
2 votes, 1 answer

Selecting dependent dropdown with scrapy-splash

I am trying to scrape the following website: https://www.climatempo.com.br/climatologia/558/saopaulo-sp. It has two drop-down menus, with the second depending on the first, so I chose to use Scrapy and Splash via scrapy-splash. I need to automate…