Playwright auto-scroll to bottom of infinite-scroll page

Question

I am trying to automate the scraping of a site with "infinite scroll" with Python and Playwright.

The issue is that Playwright doesn't include a scroll function yet, let alone an infinite auto-scroll function.

From what I found online and in my own testing, I can automate an infinite or finite scroll using the page.evaluate() function and some JavaScript code.

For example, this works:

for i in range(20):
    page.evaluate('var div = document.getElementsByClassName("comment-container")[0];div.scrollTop = div.scrollHeight')
    page.wait_for_timeout(500)

The problem with this approach is that it either runs for a fixed number of scrolls or, with a while True loop, keeps going forever.

I need to find a way to tell it to keep scrolling until the final content loads.
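One possible stop condition for the div-scrolling loop above: keep scrolling until the container's `scrollHeight` has stayed the same for several rounds in a row. This is a minimal sketch, assuming the scrollable element is the first `.comment-container` and that its height stops growing once the last batch has loaded. The helper name and the `patience` / `max_rounds` parameters are illustrative, not Playwright API; `page` is assumed to be a Playwright sync `Page`.

```python
# In-page step: jump the container to its current bottom and report its height.
SCROLL_CONTAINER_JS = """
sel => {
    const el = document.querySelector(sel);
    if (!el) return -1;
    el.scrollTop = el.scrollHeight;
    return el.scrollHeight;
}
"""


def scroll_container_to_end(page, selector=".comment-container",
                            pause_ms=500, patience=3, max_rounds=200):
    """Hypothetical helper: scroll `selector` until its height stops changing.

    Stops after `patience` consecutive rounds with an unchanged scrollHeight,
    or after `max_rounds` as a safety net. Returns the last height seen,
    or -1 if the selector matched nothing.
    """
    last_height = None
    unchanged = 0
    for _ in range(max_rounds):
        height = page.evaluate(SCROLL_CONTAINER_JS, selector)
        if height == -1:
            return -1
        page.wait_for_timeout(pause_ms)  # give the next batch time to load
        if height == last_height:
            unchanged += 1
            if unchanged >= patience:
                break
        else:
            unchanged = 0
            last_height = height
    return last_height
```

Requiring several unchanged rounds, rather than one, makes it less likely that a single slow network response is mistaken for the end of the feed.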

This is the JavaScript I am currently trying in page.evaluate():

var intervalID = setInterval(function() {
    var scrollingElement = (document.scrollingElement || document.body);
    scrollingElement.scrollTop = scrollingElement.scrollHeight;
    console.log('fail')
}, 1000);
var anotherID = setInterval(function() {
    if ((window.innerHeight + window.scrollY) >= document.body.offsetHeight) {
        clearInterval(intervalID);
    }
}, 1000);

This doesn't work in either my Firefox browser or the Playwright Firefox browser: it returns immediately and doesn't execute the code at intervals.
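A likely reason it returns immediately: `setInterval` only schedules callbacks, so the evaluated script finishes at once and `page.evaluate()` has nothing to wait for. Playwright's `page.evaluate()` does wait for a returned Promise to resolve, so one option is to run the whole loop inside an `async` function. A minimal sketch, assuming the page scrolls through `document.scrollingElement`; the helper name and the tick parameters are illustrative choices.

```python
# In-page loop: scroll, pause, and resolve once the height has been stable.
AUTO_SCROLL_JS = """
async ({ intervalMs, stableTicks, maxTicks }) => {
    const el = document.scrollingElement || document.body;
    let lastHeight = -1;
    let stable = 0;
    for (let tick = 0; tick < maxTicks; tick++) {
        el.scrollTop = el.scrollHeight;
        await new Promise(resolve => setTimeout(resolve, intervalMs));
        const height = el.scrollHeight;
        if (height === lastHeight) {
            stable += 1;
            if (stable >= stableTicks) break;
        } else {
            stable = 0;
            lastHeight = height;
        }
    }
    return el.scrollHeight;
}
"""


def auto_scroll(page, interval_ms=200, stable_ticks=10, max_ticks=1000):
    """Hypothetical helper: block until the page height stops growing.

    page.evaluate() waits for the Promise returned by the async function,
    so this only returns after the in-page loop has finished.
    """
    return page.evaluate(
        AUTO_SCROLL_JS,
        {"intervalMs": interval_ms, "stableTicks": stable_ticks, "maxTicks": max_ticks},
    )
```

With the defaults, the loop stops after about 2 seconds (10 ticks of 200 ms) without growth, or after 1000 ticks at most.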

I would be grateful if someone could tell me how I can, using Playwright, create an auto-scroll function that will detect and stop when it reaches the bottom of a dynamically loading webpage.

alex_bits · Answer 1 · 2022-08-17T19:11:07.963

So I found a working solution.

What I did was combine JavaScript with Python Playwright code.

I start a setInterval with a 200 ms timer that scrolls the page down via page.evaluate(), then follow it with a Python loop that checks every second whether the bottom edge of the viewport (window.innerHeight + window.scrollY) has moved. If it has, the scrolling continues; if it hasn't, the scroll is over.
This is what it looks like:

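# Start scrolling inside the page: every 200 ms, jump to the current bottom.
# Storing the ID on window keeps it reachable from a later evaluate() call.
page.evaluate(
    """
    window.intervalID = setInterval(function () {
        var scrollingElement = (document.scrollingElement || document.body);
        scrollingElement.scrollTop = scrollingElement.scrollHeight;
    }, 200);
    """
)
prev_height = None
while True:
    # Bottom edge of the viewport; it keeps moving while new content loads.
    curr_height = page.evaluate('(window.innerHeight + window.scrollY)')
    if prev_height is not None and prev_height == curr_height:
        # No movement for a whole second: stop the in-page scroller.
        page.evaluate('clearInterval(window.intervalID)')
        break
    prev_height = curr_height
    page.wait_for_timeout(1000)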

EDIT

See the answer below, which uses the newer mouse.wheel(x, y) method, for an up-to-date way to scroll with Playwright. Combining my answer with it reduces the need for JavaScript.

Nice! This can also be adapted to slowly scroll the page to the end with e.g. `scrollTop += 200` instead of just assigning `scrollHeight`. — AKX, Mar 22 '22 at 09:19
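A minimal sketch of the slow-scroll variant suggested in the comment above: move a fixed number of pixels per step instead of jumping to `scrollHeight`, and only stop once the bottom has been reached and no new content arrives during a longer pause. The helper name, step size, and timings are illustrative assumptions; `page` is assumed to be a Playwright sync `Page`.

```python
# In-page step: scroll a fixed distance and report whether we hit the bottom.
STEP_JS = """
step => {
    window.scrollBy(0, step);
    const el = document.scrollingElement || document.body;
    return {
        atBottom: window.innerHeight + window.scrollY >= el.scrollHeight - 1,
        height: el.scrollHeight,
    };
}
"""
HEIGHT_JS = "(document.scrollingElement || document.body).scrollHeight"


def slow_scroll_to_end(page, step=200, pause_ms=100, settle_ms=1500, max_steps=5000):
    """Hypothetical helper: scroll `step` px at a time until nothing new loads.

    Returns True if the bottom was reached and the height stayed put,
    False if max_steps ran out first.
    """
    for _ in range(max_steps):
        state = page.evaluate(STEP_JS, step)
        page.wait_for_timeout(pause_ms)
        if state["atBottom"]:
            # At the bottom: give lazy loading a longer window to append more.
            page.wait_for_timeout(settle_ms)
            if page.evaluate(HEIGHT_JS) == state["height"]:
                return True
    return False
```

Scrolling in small steps can matter on pages that only load items (or images) once they enter the viewport, which a single jump to the bottom may skip.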

score 9 · Accepted Answer · edited Aug 21 '22 at 19:48

Newer Playwright versions have a scroll function: mouse.wheel(delta_x, delta_y). The code below uses it to scroll through youtube.com, which has an "infinite scroll":

from playwright.sync_api import Playwright, sync_playwright


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()

    # Open new page
    page = context.new_page()

    page.goto('https://www.youtube.com/')

    # page.mouse.wheel(delta_x, delta_y): delta_x scrolls horizontally,
    # delta_y vertically (positive scrolls down, negative scrolls up)
    for _ in range(5):  # make the range as long as needed
        page.mouse.wheel(0, 15000)
        page.wait_for_timeout(2000)  # give new content time to load

    page.wait_for_timeout(15000)  # keep the window open to inspect the result
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)

This removes the need to mix JS with Python, and for that reason alone it is a better answer in my opinion, so I am accepting it. To make it a perfect answer to my question, one would need to correctly reindent the code and show a way to do an infinite scroll that stops when the scrolling is done. ;) — alex_bits, Aug 17 '22 at 19:06
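One way to get what the comment above asks for, scrolling with `mouse.wheel` and stopping on its own, is to read the document height after each wheel step and stop once it has not grown for a few rounds. A minimal sketch, assuming `page` is a Playwright sync `Page`; only one small JS expression is still needed to read the height. The helper name, `patience`, and `max_rounds` are illustrative.

```python
def wheel_to_end(page, delta_y=15000, pause_ms=2000, patience=3, max_rounds=200):
    """Hypothetical helper: scroll with page.mouse.wheel until the page stops growing.

    Stops after `patience` consecutive rounds without a height change,
    or after `max_rounds` as a safety net for feeds that never end.
    Returns the final document height.
    """
    height_js = "(document.scrollingElement || document.body).scrollHeight"
    last_height = page.evaluate(height_js)
    unchanged = 0
    for _ in range(max_rounds):
        page.mouse.wheel(0, delta_y)     # positive delta_y scrolls down
        page.wait_for_timeout(pause_ms)  # let the next batch load
        height = page.evaluate(height_js)
        if height == last_height:
            unchanged += 1
            if unchanged >= patience:
                return height
        else:
            unchanged = 0
            last_height = height
    return last_height
```

In the YouTube example it could replace the fixed `for _ in range(5)` loop with a single `wheel_to_end(page)` call.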

score 8 · Answer 3 · answered Sep 27 '22 at 04:48

The other solutions were a tad verbose and "overkill" for me, so here is what worked instead.

It's a two-liner that took me a few migraines to come up with :)

Note: you are going to have to put in your own selector. This is just an example...

    while not page.locator("span", has_text="End of results").is_visible():
        page.mouse.wheel(0, 100)
        # page.keyboard.press("PageDown") also works

Literally just keep scrolling until some unique element becomes visible. In my use case, a span containing the text "End of results" appeared once you scrolled to the bottom.

I trust you can adapt this logic to your own use case.
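The two-liner above loops forever if the end marker never shows up (a renamed label, a failed request). A minimal sketch of the same idea with an upper bound and a short pause between wheel steps; the helper name and parameters are illustrative, and `page` / `locator` are assumed to be a Playwright sync `Page` and `Locator`.

```python
def scroll_until_visible(page, locator, delta_y=100, pause_ms=50, max_scrolls=2000):
    """Hypothetical helper: wheel down until `locator` is visible.

    Returns True if the marker appeared, False if max_scrolls ran out first,
    so a missing or renamed marker cannot cause an endless loop.
    """
    for _ in range(max_scrolls):
        if locator.is_visible():
            return True
        page.mouse.wheel(0, delta_y)
        page.wait_for_timeout(pause_ms)  # let newly loaded items render
    return locator.is_visible()
```

Usage with the marker from the answer: `scroll_until_visible(page, page.locator("span", has_text="End of results"))`.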

score 2 · Answer 4 · answered Nov 29 '22 at 19:12


Playwright has the page.keyboard.down('End') command, which scrolls to the end of the page.

Poker Player

"End" will not scroll to the bottom of an infinite scroll page. – ViennaMike May 04 '23 at 15:06

getCritical · Answer 5 · 2023-06-01T22:56:35.657

I faced a similar issue, except that a specific element was scrollable rather than the page itself. What I found is that if you click the element in question so that it gets focus, page.mouse.wheel scrolls that specific element (in my case, a tbody).

async scrollIntoView(locator: Locator) {
    let i = 0;
    while (await locator.isHidden()) {
        // Click the scrollable container so mouse.wheel scrolls it
        await this.page.locator('your locator goes here').click();
        await this.page.mouse.wheel(0, 300);
        i++;
        if (await locator.isVisible()) { return; }
        else if (i >= 5) { return; }  // guard against an endless loop
    }
}

You can remove the counter guard on i; I only have it there to avoid an infinite loop.
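A Python sketch of the same goal (scroll one specific element until a target shows up) that avoids clicking altogether: `Locator.evaluate()` passes the matched element to the function, so the script can change that element's `scrollTop` directly, with no dependence on focus or mouse position. The helper name, `step`, and limits are illustrative; `container` and `target` are assumed to be Playwright sync `Locator`s, and this is a swapped-in technique, not the one the answer above uses.

```python
def scroll_element_until_visible(page, container, target, step=300,
                                 pause_ms=200, max_scrolls=50):
    """Hypothetical helper: scroll `container` itself until `target` is visible.

    container.evaluate() runs with the matched element as its first argument,
    so no click, focus, or mouse position is involved.
    Returns True once `target` is visible, False if max_scrolls ran out.
    """
    for _ in range(max_scrolls):
        if target.is_visible():
            return True
        # Move the element's own scroll position down by `step` pixels.
        container.evaluate("(el, step) => { el.scrollTop += step; }", step)
        page.wait_for_timeout(pause_ms)  # let rows load or render
    return target.is_visible()
```

Usage might look like `scroll_element_until_visible(page, page.locator("tbody"), page.locator("text=Some row"))`, where both selectors are placeholders.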

score -1 · Answer 6 · edited Jun 27 '23 at 19:05


This code works for me:

import time

# Note: $(document).height() only works if the page itself loads jQuery
prev_page_height = page.evaluate("$(document).height()")
print(f'Initial page height {prev_page_height}')

# scroll down
while True:
    page.evaluate("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)   # giving some time to load the page

    cur_page_height = page.evaluate("$(document).height()")
    print(f'Current page height {cur_page_height}')

    if cur_page_height > prev_page_height:
        prev_page_height = cur_page_height
    else:
        break  # the height stopped growing: bottom reached

edited by Aaron Meese · answered Jun 21 '23 at 11:30 by aust_anik

Don't think you need to resort to jQuery just to get the current document height. – jorisw Jul 04 '23 at 09:53
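Following the comment above, the jQuery call can be swapped for plain DOM properties. A minimal sketch, assuming `page` is a Playwright sync `Page`; taking the larger of the two `scrollHeight` values roughly matches what `$(document).height()` reports, though jQuery also considers a few other size properties. The helper name is illustrative.

```python
def page_height(page):
    """Hypothetical helper: full document height without relying on jQuery."""
    return page.evaluate(
        "Math.max(document.body.scrollHeight, document.documentElement.scrollHeight)"
    )
```

Both `page.evaluate("$(document).height()")` calls in the answer could then become `page_height(page)`, which also works on pages that never load jQuery.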

score -2 · Answer 7 · answered Sep 12 '22 at 22:04


This topic is old, but new to me. I have been using Playwright's wheel scroll, but for me it takes control of the mouse and focus.

So if I happen to be typing (which I usually am) when it scrolls, my beautiful words go into the void, never to be seen again.

I am going to try the JS solution posted above and see whether that gets me around the mouse/focus issue.

Cliff

I have to downvote cuz I don't think this is an answer – Hi computer Sep 16 '22 at 13:33

It is additional information about the function being discussed (Playwright mouse wheel), so no, I don't think you 'had' to vote it down. – Cliff Sep 19 '22 at 20:25
