I am trying to scrape some data from Twitch. The problem is that the site uses infinite scroll, and I am only able to get data from the first page.
I have tried scrolling with the built-in infiniteScroll utility, but it only scrolls after navigating to a result page, not on the main page. This is how I have implemented it:
import { createPlaywrightRouter } from "crawlee";

export const router = createPlaywrightRouter();

// Listing page: wait for the first cards, scroll, then enqueue channel links.
router.addDefaultHandler(
  async ({ log, page, request, infiniteScroll, enqueueLinks }) => {
    log.debug(`Processing: ${request.url}`);
    await page.waitForSelector('[data-a-target="preview-card-image-link"]');
    await page.click("body");
    await infiniteScroll();
    // Use the enqueueLinks provided by the handler context and await it,
    // so the links are queued before the handler returns.
    await enqueueLinks({
      selector: ".ScTransformWrapper-sc-1wvuch4-1 a",
      label: "detail",
    });
  }
);

// Channel page: read the social links from the about panel.
router.addHandler("detail", async ({ request, page, log }) => {
  log.debug(`Extracting data: ${request.url}`);
  await page.waitForSelector('[id="live-channel-about-panel"]');
  const instagram = await page
    .locator('a[role="link"][href*="instagram"]')
    .getAttribute("href");
  const twitter = await page
    .locator('a[role="link"][href*="twitter"]')
    .getAttribute("href");
  const discord = await page
    .locator('a[role="link"][href*="discord"]')
    .getAttribute("href");
  const results = { instagram, twitter, discord };
  // log.debug expects a message string first; the data goes second.
  log.debug("Extracted links", results);
});
Link I am trying to scrape: text
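
One possible workaround, in case infiniteScroll is not reaching the element that actually scrolls, would be to scroll manually and stop once no new cards appear. The following is only a rough sketch under assumptions: scrollUntilStable is a hypothetical helper (not part of Crawlee), and the wheel distance, wait time, and round limits are guesses that would need tuning for the real page.

```typescript
// Hypothetical helper, not a Crawlee API: keeps scrolling until the number
// of loaded items stops growing for `maxIdleRounds` consecutive rounds.
interface ScrollOptions {
  maxRounds?: number; // hard cap on scroll attempts
  maxIdleRounds?: number; // rounds without new items before giving up
}

async function scrollUntilStable(
  countItems: () => Promise<number>,
  scrollOnce: () => Promise<void>,
  { maxRounds = 50, maxIdleRounds = 3 }: ScrollOptions = {},
): Promise<number> {
  let lastCount = await countItems();
  let idleRounds = 0;
  for (let round = 0; round < maxRounds && idleRounds < maxIdleRounds; round++) {
    await scrollOnce();
    const count = await countItems();
    if (count > lastCount) {
      // New items were loaded, so reset the idle counter.
      lastCount = count;
      idleRounds = 0;
    } else {
      idleRounds++;
    }
  }
  return lastCount;
}

// Possible use inside the default handler (values are assumptions):
//
// await scrollUntilStable(
//   () => page.locator('[data-a-target="preview-card-image-link"]').count(),
//   async () => {
//     await page.mouse.wheel(0, 2000); // scrolls whatever is under the cursor
//     await page.waitForTimeout(1000); // give new cards time to load
//   },
// );
```

Passing the counting and scrolling steps in as callbacks keeps the loop independent of Playwright, so the stopping logic can be checked without opening a browser.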