
I'm using Puppeteer (PuppeteerSharp actually, but the API is the same) to take a screenshot of a web page from my application.

The problem is that the page performs several layout changes via JavaScript after it has loaded, so a few seconds pass before the "final" rendered version of the page appears.

At the moment I'm just waiting a "safe" number of seconds before taking the screenshot, but this is obviously not a good approach: a temporary performance slowdown on the machine can result in an incomplete rendering.

Since Puppeteer uses Chromium in the background, is there a way to intercept Chromium's layout/rendering events (like you can do in the DevTools console in Chrome)? Or, really, ANY other way to know when the page has stopped "changing" (visually, I mean)?

EDIT, some more info: The content is dynamic, so I don't know beforehand what it will draw and how. Basically, it's a framework that draws different charts/tables/images/etc. (not open-source, unfortunately). By testing with the "Performance" tool in the Chrome DevTools, however, I noticed that after the page has finished rendering, all activity in the timeline stops, so it would be great if I could access that information.

Unfortunately, the only way to do that in Puppeteer (that I can see) is the "Tracing" feature, but that doesn't operate in real time. Instead, it dumps the trace to a file, and the buffer is way too big to be of any use (the file is still 0 bytes after my page has already finished rendering; it only flushes to disk when I call "stopTracing"). What I would need is to access Puppeteer's tracing feature in real time, for example via events or an in-memory stream, but that doesn't seem to be supported by the API. Any way around this?

Mark Rotteveel
Master_T

2 Answers


You should use page.waitForSelector() to wait for the dynamic elements to finish rendering.

There is most likely an identifiable pattern in the generated content that you can target.

Keep in mind that you can use flexible CSS Selectors to match elements or attributes without knowing their exact values.

await page.goto( 'https://example.com/', { 'waitUntil' : 'networkidle0' } );

await Promise.all([
    page.waitForSelector( '[class^="chart-"]' ),    // Class begins with 'chart-'
    page.waitForSelector( '[name$="-image"]' ),     // Name ends with '-image'
    page.waitForSelector( 'table:nth-of-type(5)' )  // Fifth table
]);

This can be useful when waiting for a certain pattern to exist in the DOM.

If page.waitForSelector() is not powerful enough to meet your needs, you can use page.waitForXPath():

await page.waitForXPath( '//div[contains(text(), "complete")]' ); // Div contains 'complete'

Alternatively, you can plug the MutationObserver interface into page.evaluate() to watch for changes being made to the DOM tree. When the changes have stopped over a period of time, you can resume your program.
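A quiescence helper along these lines could work (a sketch, untested against your page; `waitForDomQuiescence` and the timing values are my own, not part of the Puppeteer API):

```javascript
// Sketch: resolve once the DOM has had no mutations for `idleMs`,
// giving up after `timeoutMs`. Not a Puppeteer built-in.
function waitForDomQuiescence(page, idleMs = 500, timeoutMs = 15000) {
  return page.evaluate((idleMs, timeoutMs) => new Promise(resolve => {
    let idleTimer = setTimeout(done, idleMs);      // fires if nothing changes
    const deadline = setTimeout(done, timeoutMs);  // hard upper bound
    const observer = new MutationObserver(() => {
      clearTimeout(idleTimer);
      idleTimer = setTimeout(done, idleMs);        // mutation: restart the idle window
    });
    observer.observe(document.documentElement, {
      childList: true, subtree: true, attributes: true, characterData: true,
    });
    function done() {
      observer.disconnect();
      clearTimeout(idleTimer);
      clearTimeout(deadline);
      resolve();
    }
  }), idleMs, timeoutMs);
}
```

Call it after `page.goto()` and before `page.screenshot()`, tuning `idleMs` to how bursty your page's updates are.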

Grant Miller
  • Thanks for the suggestion, the MutationObserver approach seems interesting, but let me ask you this: would it detect changes made to an HTML5 canvas? Or only to the standard DOM structure? – Master_T Sep 17 '18 at 14:03

After some trial and error, I settled for this solution:

string traceFile = IOHelper.GetTemporaryFile("txt");
long lastSize = 0;
int cyclesWithoutTraceActivity = 0;
int totalCycles = 0;
while (cyclesWithoutTraceActivity < 4 && totalCycles < 25)
{
    // Restart tracing into an empty file for each 500 ms window.
    File.Create(traceFile).Close();
    await page.Tracing.StartAsync(new TracingOptions()
    {
        Categories = new List<string>() { "devtools.timeline" },
        Path = traceFile,
    });

    // Task.Delay instead of Thread.Sleep: don't block the thread in an async method.
    await Task.Delay(500);

    await page.Tracing.StopAsync();

    // A significant change in trace size means the timeline was active.
    long curSize = new FileInfo(traceFile).Length;
    if (Math.Abs(lastSize - curSize) > 5)
    {
        logger.Debug("Trace activity detected, waiting...");
        cyclesWithoutTraceActivity = 0;
    }
    else
    {
        logger.Debug("No trace activity detected, increasing idle counter...");
        cyclesWithoutTraceActivity++;
    }
    lastSize = curSize;

    totalCycles++;
}
File.Delete(traceFile);
if (totalCycles == 25)
{
    logger.Warn("WARNING: page did not stabilize within the allotted time (25 polling cycles). Rendering page in current state, might be incomplete");
}

Basically, what I do here is: I run Chromium's tracing in 500 ms windows, and each time I compare the size of the current trace file with the size of the previous one. Any significant change in size is interpreted as activity on the timeline and resets the idle counter; if enough windows pass without significant changes, I assume the page has finished rendering. Note that the trace file always starts with some debugging info, even if the timeline itself has no activity to report. This is why I don't do an exact size comparison, but instead check whether the file lengths are more than 5 bytes apart: the initial debug info contains some counters and IDs that vary over time, so I allow for a little variance to account for this.
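For reference, the Chrome DevTools Protocol can also deliver trace data as events instead of a file: `Tracing.start` with `transferMode: 'ReportEvents'` makes the browser fire `Tracing.dataCollected` events, which avoids the temp-file polling entirely. A sketch using the JavaScript Puppeteer API (PuppeteerSharp exposes the same protocol through the page's CDP session; `waitForTraceIdle` and the timings are my own names, not library APIs):

```javascript
// Sketch: start real-time tracing, then resolve once no Tracing.dataCollected
// event has arrived for `idleMs` (bounded by `timeoutMs`).
async function waitForTraceIdle(client, idleMs = 2000, timeoutMs = 15000) {
  await client.send('Tracing.start', {
    traceConfig: { includedCategories: ['devtools.timeline'] },
    transferMode: 'ReportEvents',  // stream events instead of buffering to a file
  });
  await new Promise(resolve => {
    let idleTimer = setTimeout(done, idleMs);
    const deadline = setTimeout(done, timeoutMs);
    const onData = () => {         // timeline activity: restart the idle window
      clearTimeout(idleTimer);
      idleTimer = setTimeout(done, idleMs);
    };
    client.on('Tracing.dataCollected', onData);
    function done() {
      clearTimeout(idleTimer);
      clearTimeout(deadline);
      client.removeListener('Tracing.dataCollected', onData);
      resolve();
    }
  });
  await client.send('Tracing.end');
}
```

In JavaScript Puppeteer the session comes from `const client = await page.target().createCDPSession();`; whether this is more convenient than the file-size polling above depends on how chatty the `devtools.timeline` category is for your page.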

Master_T