I would like to crawl 3 million web pages in a day. Because web content varies (HTML, PDF, etc.), I need to use a browser automation tool such as Selenium or Playwright. I noticed that to use Selenium on Google Dataflow, one has to build a custom container.
- Is it a good choice to use Selenium inside a ParDo's DoFn? Can a single Selenium instance be shared across multiple DoFn instances?
- Does the same apply to Playwright? Should I build a custom image for it as well?
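
For scale, here is a rough back-of-the-envelope sketch of the throughput this target implies. The 5-second average time per page is purely my assumption (not a measurement), and the function names are only illustrative.

```python
import math

PAGES_PER_DAY = 3_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Assumption (not measured): average wall-clock time for a browser
# to load and render a single page.
ASSUMED_SECONDS_PER_PAGE = 5.0


def required_pages_per_second(pages: int, seconds: int) -> float:
    """Sustained crawl rate needed to finish `pages` within `seconds`."""
    return pages / seconds


def required_concurrent_browsers(pages_per_second: float, seconds_per_page: float) -> int:
    """Little's law: concurrency = arrival rate * time spent per item."""
    return math.ceil(pages_per_second * seconds_per_page)


rate = required_pages_per_second(PAGES_PER_DAY, SECONDS_PER_DAY)
browsers = required_concurrent_browsers(rate, ASSUMED_SECONDS_PER_PAGE)
print(f"{rate:.1f} pages/s, about {browsers} concurrent browser sessions")
```

Under that assumed page time, the target works out to roughly 35 pages per second sustained, so the number of browser sessions running in parallel matters more than how any one of them is set up.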