
Right now I'm using Puppeteer (a Node.js library) to convert HTML into PDF documents. While this is "working", I'm porting it over to Puppeteer-Sharp (a C# library).

I've got everything working, but I'm somewhat concerned with running multiple browsers concurrently.

Say, for example, that I'm running the same code in two separate processes on the same machine:

// Magic function. Downloads Chrome to a specific directory for the process.
var browser = GetBrowser();
var page = await browser.NewPageAsync();
await page.GoToAsync("http://www.google.com");
var pdf = await page.PdfDataAsync(); // PDF as byte[]; PdfAsync(path) writes to a file instead

My question:

Is there a potential concurrency issue here?

My (limited) understanding is that the library sends instructions to Chrome over WebSockets, and I'm not sure whether the two browsers could "collide" with each other.

Essentially, I'm asking whether the PDF bytes I receive (via await page.PdfDataAsync();) could come from the "other" browser.

If it's any consolation, each process downloads and launches its browser from its own directory, so technically it's not the "same" instance of Chrome being launched twice (although in practice it is the same Chrome build).

  • You shouldn't have any issue, besides RAM usage. – hardkoded Aug 05 '20 at 13:24
  • Hi @KyleCrowley, did you resolve this issue? I am seeing a very similar issue. I am trying to crawl with multiple PuppeteerSharp browsers in parallel; however, it seems only one browser is actually running. – shaosh Dec 08 '20 at 17:39
  • @shaosh maybe my answer is helpful for you, too – Jan Dec 16 '20 at 17:05
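For what it's worth, each launched Browser keeps its own DevTools WebSocket connection to its own Chrome process, so a page's commands and the PDF bytes it returns travel only over that browser's connection. A minimal sketch, assuming PuppeteerSharp and a locally installed Chrome; `TwoBrowsersDemo`, `RunAsync` and the `CHROME_PATH` environment variable are hypothetical names used only for illustration. It launches two browsers in one process (the same isolation applies across two processes) and renders a PDF from each at the same time:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using PuppeteerSharp;

public static class TwoBrowsersDemo
{
    // Hypothetical env var; falls back to a placeholder path for a local Chrome/Chromium.
    private static readonly string ChromePath =
        Environment.GetEnvironmentVariable("CHROME_PATH") ?? "PathToChromeOrChromium.exe";

    // Each launch gets its own profile directory, mirroring "one directory per process".
    private static LaunchOptions IsolatedOptions(string profileDir) => new LaunchOptions
    {
        Headless = true,
        ExecutablePath = ChromePath,
        UserDataDir = profileDir
    };

    public static async Task<(string EndpointA, string EndpointB, byte[] PdfA, byte[] PdfB)> RunAsync(string url)
    {
        // Temporary profile directories (left on disk here, for brevity).
        var dirA = Path.Combine(Path.GetTempPath(), "pdf-browser-" + Guid.NewGuid().ToString("N"));
        var dirB = Path.Combine(Path.GetTempPath(), "pdf-browser-" + Guid.NewGuid().ToString("N"));

        await using var browserA = await Puppeteer.LaunchAsync(IsolatedOptions(dirA));
        await using var browserB = await Puppeteer.LaunchAsync(IsolatedOptions(dirB));

        // Each Browser talks to its own Chrome process over its own WebSocket endpoint,
        // so a page's commands and replies never go over the other browser's socket.
        var pageA = await browserA.NewPageAsync();
        var pageB = await browserB.NewPageAsync();

        // Drive both browsers concurrently.
        await Task.WhenAll(pageA.GoToAsync(url), pageB.GoToAsync(url));
        var pdfs = await Task.WhenAll(pageA.PdfDataAsync(), pageB.PdfDataAsync());

        return (browserA.WebSocketEndpoint, browserB.WebSocketEndpoint, pdfs[0], pdfs[1]);
    }
}
```

Running it should report two different WebSocket endpoints, one per Chrome process; the only shared resource is the machine itself (RAM and CPU), which matches the comment above.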

1 Answer


You don't need multiple browsers; you can use one browser with multiple tabs (or Pages, as Puppeteer calls them). Here's my sample code that solves the same problem you have (converting HTML to PDF). It creates one browser instance, which is shared by four concurrent tasks (could be more), each of which creates and disposes its own Page.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using PuppeteerSharp;

public class PuppeteerSharpSample
{
    public async Task CreatePdfBatch(IEnumerable<string> urlList)
    {
        // One shared browser; ExecutablePath is a placeholder for your local Chrome/Chromium/Edge.
        await using var browser = await Puppeteer.LaunchAsync(
                new LaunchOptions { Headless = true, ExecutablePath = "PathToChromeOrChromium.exe" })
            .ConfigureAwait(false);

        // At most 4 pages render at once; any exception propagates through this await.
        await urlList.ForEachAsync(4, url => PrintPdf(browser, url)).ConfigureAwait(false);
    }

    // Newer PuppeteerSharp releases expose this type as IBrowser.
    private async Task PrintPdf(Browser browser, string url)
    {
        // Each task gets its own Page (tab) and disposes it when done.
        await using var page = await browser.NewPageAsync().ConfigureAwait(false);

        await page.GoToAsync(url).ConfigureAwait(false);

        // The file name must be unique per URL, otherwise concurrent tasks overwrite each other.
        var fileName = $"{Guid.NewGuid():N}.pdf";
        await page.PdfAsync(fileName).ConfigureAwait(false);
    }
}

public static class HelperClass
{
    //taken from https://scatteredcode.net/parallel-foreach-async-in-c/
    // Splits source into dop partitions and processes them concurrently,
    // items within one partition one after another.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        async Task AwaitPartition(IEnumerator<T> partition)
        {
            using (partition)
            {
                while (partition.MoveNext())
                {
                    // A faulted body task rethrows here and stops this partition.
                    await body(partition.Current).ConfigureAwait(false);
                }
            }
        }

        return Task.WhenAll(
            Partitioner
                .Create(source)
                .GetPartitions(dop)
                .AsParallel()
                .Select(AwaitPartition));
    }
}
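If the partitioner-based helper feels heavyweight, a common alternative (swapped in here, not part of the original answer) is to throttle with `SemaphoreSlim`. A minimal sketch; `BoundedForEach` and `ForEachWithLimitAsync` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedForEach
{
    // Runs body for every item, with at most maxConcurrency invocations in flight.
    // If any body throws, awaiting the returned task rethrows the first exception.
    public static async Task ForEachWithLimitAsync<T>(
        IEnumerable<T> source, int maxConcurrency, Func<T, Task> body)
    {
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));
        using var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);

        var tasks = source.Select(async item =>
        {
            await gate.WaitAsync().ConfigureAwait(false); // wait for a free slot
            try
            {
                await body(item).ConfigureAwait(false);
            }
            finally
            {
                gate.Release(); // free the slot even if body failed
            }
        }).ToList(); // materialize so each item's task is started exactly once

        await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

With the shared browser from the sample, a call might look like `await BoundedForEach.ForEachWithLimitAsync(urlList, 4, url => PrintPdf(browser, url));`. The trade-off: this creates one task per URL up front and lets only four run `body` at a time, whereas the partitioner keeps exactly four long-lived workers. On .NET 6 or later, the built-in `Parallel.ForEachAsync` with `ParallelOptions.MaxDegreeOfParallelism` covers the same need.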

As a side note: you also don't need GetBrowser() if a Chromium-based browser (Chrome, Chromium or the new Edge) is already installed on your machine. You can simply point ExecutablePath at its .exe, as shown in the code above.
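If you'd rather not hard-code that path, a small hypothetical helper can probe a few likely install locations and pass the first one that exists to `ExecutablePath`. The candidate paths below are assumptions (typical default locations on 64-bit Windows) and will differ on other machines, install channels or operating systems:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BrowserLocator
{
    // Assumed default install locations on 64-bit Windows; adjust for your setup.
    public static readonly string[] DefaultCandidates =
    {
        @"C:\Program Files\Google\Chrome\Application\chrome.exe",
        @"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
        @"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe",
    };

    // Returns the first candidate path that exists on disk, or null if none does.
    public static string? FindExecutable(IEnumerable<string> candidates) =>
        candidates.FirstOrDefault(File.Exists);
}
```

Usage could be `ExecutablePath = BrowserLocator.FindExecutable(BrowserLocator.DefaultCandidates) ?? throw new FileNotFoundException("No installed Chrome/Edge found")`, which fails fast instead of letting the launch error out later.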

Jan