
I'm trying this sample to get the number of offers an NFT has on OpenSea:

import { test, expect } from '@playwright/test';

test('test', async ({ page }) => {
    await page.goto('https://opensea.io/assets/ethereum/0x63217dbb73e7a02c1d30f486e899ee66d0aa5e0b/6341');
    await page.waitForLoadState('networkidle');

    let selector = page.locator("[id='Body offers-panel'] li");
    const offers = await selector.count();

    console.log('Num of offers:', offers);
});

Then I run "npx playwright test", which always prints "Num of offers: 0".

But if I run it in --headed mode, it works perfectly and outputs "Num of offers: 5".

Can anyone explain/help me to understand it?

I tried using:

let selector = page.locator("[id='Body offers-panel'] li").waitFor();

I tried waiting until all requests are done:

await page.waitForLoadState('networkidle');

I also tried waiting for the selector:

let selector = page.locator("[id='Body offers-panel'] li").first().waitFor();

But none of these worked. I always get a count of 0 unless I run the test in --headed mode, no matter which NFT address I try.

I would like to solve this, or at least understand why it happens.

ggorlen
SrConejo

2 Answers


Some websites will not load the page if they detect a headless client, in order to prevent scraping and similar automation. My guess is that this is what's happening here.

See:
Are you headless?
Detect Headless

Sami

Headless mode makes it more obvious to servers that your script is a bot. You're being detected and blocked when running headlessly, but you bypass detection when running headfully.

Since you can't see anything, headless is a bit harder to debug than headful. Using console.log(await page.content()) and await page.screenshot({path: "test.png"}) are good strategies for figuring out why elements you expect to be on the page aren't.

In this case, adding

const text = ((await page.textContent("body")) ?? "") // textContent can resolve to null
  .replace(/ +/g, " ")
  .replace(/(\n ?)+/g, "\n")
  .trim();
console.log(text);

after goto to get the full text content of the page gives:

Access denied
Error code 1020
You do not have access to <Your URL>.The site owner may have set restrictions that prevent you from accessing the site.
Error details
Provide the site owner this information.
I got an error when visiting <Your URL>.
Error code: 1020
Ray ID: **************
Country: US
Data center: *****
IP: *****************
Timestamp: 2023-02-17 22:39:13 UTC
Click to copy
Was this page helpful?
Yes
No
Thank you for your feedback!
Performance & security by Cloudflare
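
A check along these lines could be folded into the test so that a block page fails loudly instead of silently giving a count of 0. This is only a minimal sketch: normalizeText and looksBlocked are hypothetical helper names (not Playwright APIs), and the marker strings are simply taken from the output above, so they may not match other sites or future versions of this block page.

```typescript
// Hypothetical helpers, not part of Playwright.

// Collapse runs of spaces and blank lines, the same way as the snippet above.
function normalizeText(raw: string): string {
  return raw
    .replace(/ +/g, " ")
    .replace(/(\n ?)+/g, "\n")
    .trim();
}

// Assumption: these markers come from the block page shown above.
const BLOCK_MARKERS = ["Access denied", "Error code 1020"];

// Returns true if the page text contains any of the known block markers.
function looksBlocked(bodyText: string): boolean {
  const text = normalizeText(bodyText);
  return BLOCK_MARKERS.some((marker) => text.includes(marker));
}
```

Inside a test, the body text from page.textContent("body") (falling back to an empty string if it resolves to null) could be passed to looksBlocked, followed by something like expect(looksBlocked(body)).toBe(false) before counting the offers.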

It's not a perfect guarantee, but adding a user agent header is an easy option that seems to be enough to avoid headless detection on this particular site at this point in time:

import {expect, test} from "@playwright/test"; // ^1.30.0

const url = "<Your URL>";
const userAgent =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";

test.describe("with user agent", () => {
  test.use({userAgent});

  test("is able to retrieve offers", async ({page}) => {
    await page.goto(url);
    const selector = page.locator('[id="Body offers-panel"] li');
    const offers = await selector.count();
    console.log("Num of offers:", offers); // => Num of offers: 11
  });
});
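
If every test in the project needs the same user agent, it could probably be set once in the config instead of calling test.use in each file. A minimal sketch, assuming a playwright.config.ts at the project root; the config is left as a plain object here to keep the fragment self-contained (typing it with PlaywrightTestConfig from @playwright/test is optional), and the user agent string is just the one from the example above, with no guarantee it keeps working.

```typescript
// playwright.config.ts
// Same illustrative user agent as in the example above (an assumption,
// not a value known to keep passing bot detection).
const userAgent =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";

const config = {
  use: {
    // Used for the browser contexts the test runner creates,
    // unless a file or describe block overrides it with test.use.
    userAgent,
  },
};

export default config;
```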
ggorlen