Questions tagged [splash-js-render]

Splash is a JavaScript rendering service. It’s a lightweight web browser with an HTTP API, implemented in Python using Twisted and Qt. It is often used as an alternative to Selenium.

https://splash.readthedocs.io/en/stable/

Splash - A JavaScript rendering service

Splash is a JavaScript rendering service. It’s a lightweight web browser with an HTTP API, implemented in Python using Twisted and Qt. The (Twisted) Qt reactor is used to make the server fully asynchronous, which lets it take advantage of WebKit concurrency via the Qt main loop. Some of Splash's features:

  • process multiple webpages in parallel;
  • get HTML results and/or take screenshots;
  • turn off images or use Adblock Plus rules to make rendering faster;
  • execute custom JavaScript in page context;
  • write Lua browsing scripts;
  • develop Splash Lua scripts in Splash-Jupyter Notebooks;
  • get detailed rendering info in HAR format.
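
As a rough illustration of the HTTP API mentioned above, here is a minimal sketch that builds a request URL for Splash's `render.html` endpoint with image loading turned off. It assumes a Splash instance on `localhost:8050` (the default port in the Docker setup); the helper name `build_render_html_url` and the target URL are made up for this example.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance is listening locally on its default port.
SPLASH_BASE = "http://localhost:8050"


def build_render_html_url(page_url, wait=0.5, images=False, base=SPLASH_BASE):
    """Build a GET URL for Splash's render.html endpoint (hypothetical helper)."""
    params = {
        "url": page_url,               # page to render
        "wait": wait,                  # seconds to wait after the page loads
        "images": 1 if images else 0,  # 0 skips image downloads
    }
    return f"{base}/render.html?{urlencode(params)}"


if __name__ == "__main__":
    print(build_render_html_url("https://example.com"))
```

Fetching the resulting URL with any HTTP client should return the rendered HTML, provided a Splash server is actually running at that address.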
138 questions
4
votes
1 answer

Making Splash, Scrapy and Scrapoxy work together

I'm coding web scrapers using Scrapy. A few sites that I need to access require me to interact with them so I'm making requests using Splash which allows me to do so. This currently works just fine. To prevent my scrapers from getting blocked, I…
4
votes
2 answers

Read cookies from Splash request

I'm trying to access cookies after I've made a request using Splash. Below is how I've built the request. script = """ function main(splash) splash:init_cookies(splash.args.cookies) assert(splash:go{ splash.args.url, …
Casper
  • 1,435
  • 10
  • 22
4
votes
2 answers

Scrapy Splash won't execute lua script

I have run across an issue in which my Lua script refuses to execute. The returned response from the ScrapyRequest call seems to be an HTML body, while I'm expecting a document title. I am assuming that the Lua script is never being called as it…
3
votes
2 answers

'Enter' key won't be sent with splashR::splash_send_key

EDIT: Since I can't edit the bounty message: "I do not have the expertise in Lua to fix this problem on my own. I hope someone can help me with that." With the following code snippet I want to automatically log myself into my strava.com account…
mugdi
  • 365
  • 5
  • 17
3
votes
0 answers

XML - How to get link from element when there is no link on href

I have this Html above and I need to get the link to…
João Koritar
  • 89
  • 1
  • 7
3
votes
1 answer

Splash - Scrapy - HAR data

In general I understand how to work with Scrapy and XPath to parse the HTML. However, I don't know how to grab the HAR data. import scrapy from scrapy_splash import SplashRequest class QuotesSpider(scrapy.Spider): name = 'quotes' …
Zach
  • 421
  • 1
  • 5
  • 11
3
votes
1 answer

Scrapy and Incapsula

I'm trying to use Scrapy with Splash to retrieve data from the website "whoscored.com". Here are my settings: BOT_NAME = 'scrapy_matchs' # Crawl responsibly by identifying yourself (and your website) on the user-agent #USER_AGENT = 'scrapy_matchs…
Jérémy Octeau
  • 689
  • 1
  • 10
  • 26
3
votes
0 answers

How to enable javascript in Splash

I have been recently introduced to Splash. I'm currently trying to render the webpage of the company that I work at (I prefer not to name the company) in the splash API. When I try to render the page in the Splash API, the html contains a message…
titusAdam
  • 779
  • 1
  • 16
  • 35
3
votes
0 answers

scrapy-splash crawler starts fast but slows down (not throttled by website)

I have a single crawler written in scrapy using the splash browser via the scrapy-splash python package. I am using the aquarium python package to load balance the parallel scrapy requests to a splash docker cluster. The scraper uses a long list of…
user1837332
  • 91
  • 1
  • 3
3
votes
1 answer

Use Splash from Scrapinghub locally

I got a subscription for splash on scrapinghub and I want to use this from a script that is running on my local machine. The instructions I have found so far are: Edit the settings file: #I got this one from my scraping hub account SPLASH_URL =…
3
votes
2 answers

Scrapy and Splash times out for a specific site

I have an issue with Scrapy, Crawlera and Splash when trying to fetch responses from this site. I tried the following without luck: pure Scrapy shell - times out Scrapy + Crawlera - times out Scrapinghub Splash instance (small) - times…
3
votes
1 answer

how to get status code other than 200 from scrapy-splash

I am trying to get the request status code with scrapy and scrapy-splash. Below is the spider code. class Exp10itSpider(scrapy.Spider): name = "exp10it" def start_requests(self): urls = [ 'http://192.168.8.240:8000/xxxx' …
3
votes
2 answers

How to set a password in scrapinghub/splash docker installation?

I'm using Splash on an Ubuntu server and followed the instructions to install with Docker (https://github.com/scrapy-plugins/scrapy-splash). docker run -p 8050:8050 scrapinghub/splash How can I change the settings and set username and…
rnnhm
  • 53
  • 6
3
votes
1 answer

How can I make sure scrapy-splash has rendered the entire page successfully

A problem occurred when I was crawling a whole website, using Splash to render each target page. Some pages were not rendered successfully, so I failed to get the information that is supposed to be there when the render job had finished. That means I just…
Brook
  • 31
  • 3
3
votes
1 answer

Get content inside of script tag

Hello everyone, I'm trying to fetch the content inside a script tag. http://www.teknosa.com/urunler/145051447/samsung-hm1500-bluetooth-kulaklik this is the website. Also, this is the script tag whose content I want to read. $.Teknosa.ProductDetail =…
Murat Kaya
  • 1,281
  • 3
  • 28
  • 52