Questions tagged [splash-js-render]

Splash is a JavaScript rendering service. It’s a lightweight web browser with an HTTP API, implemented in Python using Twisted and Qt. It is often used as an alternative to Selenium for rendering JavaScript-heavy pages.

https://splash.readthedocs.io/en/stable/

Splash - A JavaScript rendering service

Splash is a JavaScript rendering service. It’s a lightweight web browser with an HTTP API, implemented in Python using Twisted and Qt. The (Twisted) Qt reactor is used to make the server fully asynchronous, which lets it take advantage of WebKit concurrency via the Qt main loop. Some of Splash's features:

  • process multiple webpages in parallel;
  • get HTML results and/or take screenshots;
  • turn OFF images or use Adblock Plus rules to make rendering faster;
  • execute custom JavaScript in page context;
  • write Lua browsing scripts;
  • develop Splash Lua scripts in Splash-Jupyter Notebooks;
  • get detailed rendering info in HAR format.
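
As a rough illustration of the HTTP API mentioned above, the sketch below builds a request URL for Splash's `render.html` endpoint with images turned off. It assumes a Splash instance listening on `http://localhost:8050` (an assumption, adjust to your setup); the helper name `build_render_url` is hypothetical and only assembles the URL, it does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance is reachable at this address.
SPLASH_BASE = "http://localhost:8050"


def build_render_url(page_url, wait=0.5, images=False, endpoint="render.html"):
    """Return a Splash HTTP API URL asking Splash to render page_url.

    Hypothetical helper for illustration; it only builds the URL string.
    """
    params = {
        "url": page_url,        # page to load and render
        "wait": wait,           # seconds to wait after the page loads
        "images": int(images),  # 0 turns image loading off for faster rendering
    }
    return f"{SPLASH_BASE}/{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Fetching this URL with any HTTP client should return the rendered HTML,
    # provided a Splash server is actually running at SPLASH_BASE.
    print(build_render_url("http://quotes.toscrape.com/js/"))
```

Passing `endpoint="render.png"` instead would request a screenshot rather than HTML, under the same assumptions.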
138 questions
5
votes
1 answer

Scrapy does not fetch markup on response.css

I've built a simple scrapy spider running on scrapinghub: class ExtractionSpider(scrapy.Spider): name = "extraction" allowed_domains = ['domain'] start_urls = ['http://somedomainstart'] user_agent = "Mozilla/5.0 (Windows NT 10.0;…
qubits
  • 1,227
  • 3
  • 20
  • 50
5
votes
0 answers

FileNotFoundError: [Errno 2] after pushing splash to heroku

I'm trying to deploy the latest scrapinghub/splash. I am using git-bash on win10. I forked the repo to https://github.com/kc1/splash/blob/master and I have been trying to follow "Using docker, scrapy splash on Heroku" to modify the docker file. After…
user1592380
  • 34,265
  • 92
  • 284
  • 515
4
votes
2 answers

Storing responses as files using Scrapy Splash

I'm creating my first scrapy project with Splash and am working with the test data from http://quotes.toscrape.com/js/ . I want to store the quotes of each page as a separate file on disk (in the code below I first try to store the entire page). I have the…
Adam
  • 6,041
  • 36
  • 120
  • 208
4
votes
2 answers

Getting a response body with scrapy splash

I'm working with scrapy 1.6 and splash 3.2 I have: import scrapy import random from scrapy_splash import SplashRequest from scrapy.utils.response import open_in_browser from scrapy.linkextractors import LinkExtractor USER_AGENT = 'Mozilla/5.0…
user1592380
  • 34,265
  • 92
  • 284
  • 515
4
votes
1 answer

Click Button in Scrapy-Splash

I am writing a scrapy-splash program and I need to click on the display button on the webpage, as seen in the image below, in order to display the data, for 10th edition, so I can scrape it. I have the code I tried below but it does not work. The…
Tim
  • 191
  • 2
  • 28
4
votes
1 answer

Scrapy with Splash doesn't wait for website to load

I am trying to render and scrape an interactive website by invoking Splash through the Python script, basically following this tutorial: import scrapy from scrapy_splash import SplashRequest class MySpider(scrapy.Spider): start_urls =…
Zed
  • 5,683
  • 11
  • 49
  • 81
4
votes
0 answers

Scrapy + Splash returns a lot of 504 Time Out errors

I have followed Splash's FAQ for production setups and my system currently looks like this: 1 Scrapy container with 6 concurrent requests; 1 HAProxy container that load-balances to the Splash containers; 2 Splash containers with 3 slots each. I use…
Marcus Lind
  • 10,374
  • 7
  • 58
  • 112
4
votes
2 answers

Scrapy Splash click button doesn't work

What I'm trying to do: on avito.ru (a Russian real estate site), a person's phone number is hidden until you click on it. I want to collect the phone number using Scrapy+Splash. Example URL:…
alexanderlukanin13
  • 4,577
  • 26
  • 29
4
votes
0 answers

Having content security policy issue with scrapy and splash

What I am doing is: 1. Google for some LinkedIn-specific links; 2. Log in to linkedin.com (successful); 3. Revisit the home page (it fails here); 4. Extract some of the desired info from the links I googled in the first step. My scrapy bot fails at step 3. So my questions…
sakhunzai
  • 13,900
  • 23
  • 98
  • 159
4
votes
1 answer

How to get cookie generated from a Scrapy Splash request?

So I have made a Scrapy Splash request like this: def start_requests(self): lua_script = ''' function main(splash) local url = splash.args.url assert(splash:go(url)) assert(splash:wait(0.5)) return { cookies =…
Aminah Nuraini
  • 18,120
  • 8
  • 90
  • 108
4
votes
1 answer

Scrapy Splash is always returning the same page

For each of several Disqus users, whose profile URLs are known in advance, I want to scrape their names and the usernames of their followers. I'm using scrapy and splash to do so. However, when I'm parsing the responses, it seems that it is always…
Milos
  • 518
  • 7
  • 22
4
votes
2 answers

Execute inline JavaScript in Scrapy response

I am trying to log into a website with Scrapy, but the response received is an HTML document containing only inline JavaScript. The JS redirects to the page I want to scrape data from. But Scrapy does not execute the JS and therefore doesn't route…
Craig
  • 548
  • 9
  • 24
4
votes
3 answers

How to scrape AJAX based websites by using Scrapy and Splash?

I want to make a general scraper which can crawl and scrape all data from any type of website, including AJAX websites. I have extensively searched the internet but could not find any proper link which can explain to me how Scrapy and Splash together…
Rohan
  • 41
  • 1
  • 5
4
votes
1 answer

Proxy servers with Scrapy-Splash

I am trying to get proxy servers to work on my local splash instance. I have read several documents, but have not found any workable examples. It was brought to my attention that this https://github.com/scrapy-plugins/scrapy-splash/issues/107 was…
eusid
  • 769
  • 2
  • 6
  • 18
4
votes
1 answer

Scrapy-Splash with Tor

I have succeeded in running Scrapy with Tor using this link: http://pkmishra.github.io/blog/2013/03/18/how-to-run-scrapy-with-TOR-and-multiple-browser-agents-part-1-mac/ But I couldn't run Splash with Tor. In Scrapy-settings.py I directed to polipo for…