I am trying to automate the scraping of a site with "infinite scroll" using Python and Playwright.
The issue is that Playwright doesn't yet include scroll functionality, let alone infinite auto-scroll functionality.
From what I found online and from my personal testing, I can automate an infinite or finite scroll using the page.evaluate() function and some JavaScript code.
For example, this works:
for i in range(20):
    page.evaluate('var div = document.getElementsByClassName("comment-container")[0];div.scrollTop = div.scrollHeight')
    page.wait_for_timeout(500)
The problem with this approach is that it only works either by specifying a fixed number of scrolls or by telling it to keep going forever with a while True loop.
I need to find a way to tell it to keep scrolling until the final content loads.
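To make the goal concrete, here is a minimal sketch of the kind of stop condition I have in mind: keep scrolling until the scroll height stops growing between two rounds. The helper name `scroll_until_stable` and its parameters `pause_ms` and `max_rounds` are hypothetical, and `page` is assumed to be a Playwright sync-API page (only `evaluate` and `wait_for_timeout` are used). A slow network could leave the height unchanged for one pause even though more content is coming, so this is only an approximation of "the final content has loaded".

```python
def scroll_until_stable(page, pause_ms=500, max_rounds=200):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scroll rounds performed. `max_rounds` is a
    safety cap (an assumption of this sketch) so the loop always ends.
    """
    # JavaScript snippets evaluated in the page; both use the same element
    # as my interval-based attempt (document.scrollingElement || document.body).
    get_height = "() => (document.scrollingElement || document.body).scrollHeight"
    scroll_down = (
        "() => { const el = document.scrollingElement || document.body;"
        " el.scrollTop = el.scrollHeight; }"
    )

    previous = page.evaluate(get_height)
    for rounds in range(1, max_rounds + 1):
        page.evaluate(scroll_down)
        # Give the site time to fetch and render the next batch.
        page.wait_for_timeout(pause_ms)
        current = page.evaluate(get_height)
        if current == previous:
            # Nothing new was appended: assume the bottom was reached.
            return rounds
        previous = current
    return max_rounds
```

Calling `scroll_until_stable(page)` after `page.goto(...)` would replace the fixed `range(20)` loop above. For a scrollable container such as "comment-container", the two JavaScript snippets would need to target that element instead.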
This is the JavaScript that I am currently trying in page.evaluate():
var intervalID = setInterval(function() {
    var scrollingElement = (document.scrollingElement || document.body);
    scrollingElement.scrollTop = scrollingElement.scrollHeight;
    console.log('fail');
}, 1000);
var anotherID = setInterval(function() {
    if ((window.innerHeight + window.scrollY) >= document.body.offsetHeight) {
        clearInterval(intervalID);
    }
}, 1000);
This does not work in either my Firefox browser or the Playwright Firefox browser. It returns immediately and doesn't execute the code at intervals.
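One likely reason for the immediate return is that `setInterval` itself returns a timer ID straight away, so page.evaluate() has nothing to wait for. page.evaluate() does wait when the evaluated function returns a Promise, so a variant worth trying is to wrap the interval in a Promise that resolves once the height stops changing. This is only a sketch: the names `AUTO_SCROLL_JS`, `auto_scroll`, and `interval_ms` are hypothetical, and the "height unchanged between two ticks" test is an assumed stop condition, not something the site guarantees.

```python
# JavaScript function evaluated in the page. It returns a Promise, so
# page.evaluate() blocks until resolve() is called.
AUTO_SCROLL_JS = """
(intervalMs) => new Promise((resolve) => {
    const el = document.scrollingElement || document.body;
    let lastHeight = 0;
    const timer = setInterval(() => {
        const height = el.scrollHeight;
        if (height === lastHeight) {
            // No new content since the previous tick: stop scrolling.
            clearInterval(timer);
            resolve(height);
            return;
        }
        lastHeight = height;
        el.scrollTop = height;
    }, intervalMs);
})
"""


def auto_scroll(page, interval_ms=1000):
    """Run the in-page auto-scroll and return the final scroll height.

    `page` is assumed to be a Playwright sync-API page; the second
    argument to evaluate() is passed to the JavaScript function.
    """
    return page.evaluate(AUTO_SCROLL_JS, interval_ms)
```

With this shape, `auto_scroll(page)` should only return after the interval has been cleared, instead of returning as soon as the timers are registered.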
I would be grateful if someone could tell me how I can, using Playwright, create an auto-scroll function that will detect and stop when it reaches the bottom of a dynamically loading webpage.