
I'm using crawlee with PlaywrightCrawler. I get a new URL to crawl after clicking a few elements on the starting page. I click those elements with page.getByRole().click(), which is what Playwright's codegen generated:

import { chromium } from "@playwright/test";

const browser = await chromium.launch({
  headless: true,
});
const context = await browser.newContext();
const page = await context.newPage();

for (let i = 0; i < brandSection.length; i++) {
  const [brandName, brandCount] = brandSection[i].split("\n");
  await page
    .getByRole("button", { name: `${brandName} ${brandCount}` })
    .click();
}

This works without crawlee, but when I try to use it inside a PlaywrightCrawler, it fails, saying that the page instance doesn't have a method called .getByRole().

import { createPlaywrightRouter, enqueueLinks, Dataset } from "crawlee";
import { PlaywrightCrawler } from "crawlee";
....
....
router.addDefaultHandler(async ({ page, request, enqueueLinks }) => {
  const prodGridSel = ".catalog-grid a";
  // the code that uses .getByRole() goes here
  await enqueueLinks({
    ...
    },
  });
...
...
const crawler = new PlaywrightCrawler({
  requestHandler: router,
});

I haven't used Playwright for testing, only for crawling with crawlee, so I'm guessing the getBy*() functions are only available when using "@playwright/test". I didn't find any information except this, which is related to Cypress and is probably caused by a faulty import.
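A minimal sketch for testing that guess from inside the handler. It assumes the crawlee project may be resolving a Playwright release older than 1.27, the version in which getByRole() was introduced. That cause is an assumption, not something confirmed here. Both helper names below (brandButtonName, clickBrandButton) are hypothetical and not part of crawlee or Playwright. The fallback matches on visible text through the hasText locator option, so it only approximates matching by accessible name.

```javascript
// Hypothetical helper: builds the button name used in the loop above
// from a "brand\ncount" entry of brandSection.
function brandButtonName(entry) {
  const [brandName, brandCount] = entry.split("\n");
  return `${brandName} ${brandCount}`;
}

// Hypothetical helper: clicks a button by name and reports which API was used.
// Uses page.getByRole() when the page object exposes it; otherwise falls back
// to a CSS locator filtered by visible text (an approximation, see above).
async function clickBrandButton(page, name) {
  if (typeof page.getByRole === "function") {
    await page.getByRole("button", { name }).click();
    return "getByRole";
  }
  await page.locator("button", { hasText: name }).first().click();
  return "locator";
}
```

Inside the default handler this could be called as `await clickBrandButton(page, brandButtonName(brandSection[i]))`. If it returns "locator", the page object really lacks getByRole(), which would point at the installed Playwright version rather than at the "@playwright/test" import.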

So, can I have a page instance inside crawlee that has these functions?

matrs