I'm trying to call Browser.NewPageAsync() from another static method, but when execution reaches that call, the calling method just exits.

    partial class Program
    {
        static Browser Browser;

        static async Task StartBrowser()
        {
            Browser = await Puppeteer.LaunchAsync
               (
                   new LaunchOptions
                   {
                       Headless = true,
                       ExecutablePath = "Chromium\\chrome.exe"
                   }
               );
            Console.WriteLine("Browser launched");
        }

        static void StartScraping(int threads)
        {
            for (int i = 0; i < threads; i++)
            {
                Task.Run(async () =>
                {
                    int ThreadNumber = i;
                    Console.WriteLine("Thread #" + ThreadNumber + " started");
                    Page p = await Browser.NewPageAsync(); //exits here
                    await p.GoToAsync("https://www.google.com");
                    Console.WriteLine("Content:\n" + await p.GetContentAsync());
                });
            }
        }

        static async Task MainAsync()
        {
            await StartBrowser();
            StartScraping(1);
        }

        static void Main(string[] args)
        {
            MainAsync().GetAwaiter().GetResult();
        }
    }

For example, if I call Browser.NewPageAsync() directly in MainAsync(), it works as expected.

Steg_Brind
  • In line 3 you declare a variable with the same name as the class, which works until you try to set that variable; then the compiler doesn't know whether you mean the Browser class or the Browser variable. To avoid such issues, please stick to the C# naming conventions and name your Browser variable with an underscore and lowercase, since it's private static. – Prophet Lamb Sep 17 '19 at 08:29
  • No exceptions were caught after adding try/catch to Task.Run() and renaming Browser to _Browser – Steg_Brind Sep 17 '19 at 08:38

2 Answers

I found a workaround: if each page is created in the same scope as its browser instance, the page is created as expected; otherwise the Task.Run() delegate hangs at NewPageAsync().

Bad behaviour:

Task[] Tasks = new Task[1];
Browser browser = await Puppeteer.LaunchAsync
(
    new LaunchOptions
    {
        Headless = true,
        ExecutablePath = "Chromium\\chrome.exe"
    }
);
for (int i = 0; i < Tasks.Length; i++)
{
    int ThreadNumber = i;
    Tasks[i] = Task.Run(async () =>
    {
       Page page = await browser.NewPageAsync(); // hangs here
    });
}

Task.WaitAll(Tasks);

As expected:

Task[] Tasks = new Task[1];
for (int i = 0; i < Tasks.Length; i++)
{
    int ThreadNumber = i;
    Tasks[i] = Task.Run(async () =>
    {
       Browser browser = await Puppeteer.LaunchAsync
       (
           new LaunchOptions
           {
                Headless = true,
               ExecutablePath = "Chromium\\chrome.exe"
           }
       );
       Page page = await browser.NewPageAsync(); //creates as expected
    });
}

Task.WaitAll(Tasks);

Anyway, this is not the best solution, because I have to launch a browser for every async task rather than sharing one browser across all of them. I hope someone can explain this behaviour. Thanks, everyone, for helping!

Steg_Brind

You are starting the tasks but not waiting for them to finish. You need to wait for all of them:

    // Requires: using System.Linq; (for Enumerable.Range, Select and ToArray)
    ...
    static void StartScraping(int threads)
    {
        Task.WaitAll(
            Enumerable.Range(0, threads)
            .Select(async ThreadNumber =>
            {
                try
                {
                    Console.WriteLine("Thread #" + ThreadNumber + " started");
                    Page p = await Browser.NewPageAsync(); //exits here
                    await p.GoToAsync("https://www.google.com");
                    Console.WriteLine("Content:\n" + await p.GetContentAsync());
                }
                catch (Exception e)
                {
                    Console.WriteLine("Thread #" + ThreadNumber + " failed. " + e);
                    throw;
                }
            }).ToArray());
    }

    static async Task MainAsync()
    {
        await StartBrowser();
        StartScraping(1);
    }
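To see the difference between fire-and-forget and awaiting without involving Puppeteer at all, here is a minimal sketch. `FakeScrapeAsync` and `StartScrapingAsync` are hypothetical names made up for illustration, and `Task.Delay` merely stands in for the real page calls. It assumes C# 7.1 or later for `async Task Main`, and it swaps the blocking `Task.WaitAll` for an awaited `Task.WhenAll`, which is not what the code above uses.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical stand-in for the real page work (NewPageAsync, GoToAsync, ...).
    public static async Task<string> FakeScrapeAsync(int workerNumber)
    {
        await Task.Delay(100); // simulates asynchronous I/O
        return "worker #" + workerNumber + " done";
    }

    // Starts one task per worker and returns a single task that
    // completes only when every worker has completed.
    public static Task<string[]> StartScrapingAsync(int workers)
    {
        return Task.WhenAll(
            Enumerable.Range(0, workers).Select(n => FakeScrapeAsync(n)));
    }

    public static async Task Main()
    {
        // Awaiting here keeps the process alive until all workers finish.
        string[] results = await StartScrapingAsync(3);
        foreach (string line in results)
        {
            Console.WriteLine(line);
        }
    }
}
```

Because `Main` awaits the combined task, the process cannot exit before the workers are done, and `Task.WhenAll` hands back the results in the same order as the input tasks. Whether this also cures a hang inside `NewPageAsync()` is a separate matter that this sketch does not address.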

Also, please check this Puppeteer issue: link. And make sure your Chromium version matches the one listed here: link

Renat
  • No exceptions were caught after adding try/catch to Task.Run() and renaming Browser to _Browser – Steg_Brind Sep 17 '19 at 08:38
  • @Steg_Brind, right, I've updated my answer with another suggestion – Renat Sep 17 '19 at 09:01
  • Got this error in `Select`: The type arguments for method 'Enumerable.Select(IEnumerable, Func)' cannot be inferred from the usage. Try specifying the type arguments explicitly. – Steg_Brind Sep 17 '19 at 09:25
  • @Steg_Brind, I forgot to make the Func `async`. Fixed one more time – Renat Sep 17 '19 at 09:38
  • I haven't actually used System.Linq before, so now I get this error at `Enumerable.Range...`: Cannot implicitly convert type 'System.Collections.Generic.IEnumerable' to 'System.Threading.Tasks.Task' – Steg_Brind Sep 17 '19 at 09:52
  • OK, I edited your answer, Renat; now it hangs on `Page p = await Browser.NewPageAsync();`. I guess it is a Puppeteer Sharp bug – Steg_Brind Sep 17 '19 at 10:11
  • Would the suggestion from this issue work: https://github.com/kblok/puppeteer-sharp/issues/1150 ? The Chromium version should match the one listed here: https://github.com/GoogleChrome/puppeteer/blob/v1.18.1/docs/api.md – Renat Sep 17 '19 at 10:24
  • I don't think I have a deployment issue, because if that were the case, `await Browser.NewPageAsync();` wouldn't work under any circumstances for me. It only fails in threads – Steg_Brind Sep 17 '19 at 12:22