
I'm having a hard time navigating relative URLs with Puppeteer for a specific use case. Below you can see the basic setup and a pseudo-code example describing the problem.

Essentially, I want to change the URL the browser thinks it is currently at.

What I already tried:

  1. Rewriting the response body to resolve all relative URLs myself. This collides with some JavaScript-based links.
  2. Triggering a new page.goto(response.url) when the request URL doesn't match the response URL, and returning the response from the previous request. I can't seem to pass custom options, so I can't tell which request comes from the fake page.goto.

Can somebody lend me a helping hand? Thanks in advance.

Setup:

const browser = await puppeteer.launch({
    headless: false,
});

const [page] = await browser.pages();

await page.setRequestInterception(true);

// the handler must be async because it awaits the external fetch
page.on('request', async (request) => {
    const resourceType = request.resourceType();

    if (['document', 'xhr', 'script'].includes(resourceType)) {

        // fetching takes place on a different instance and handles redirects internally
        const response = await fetch(request);

        request.respond({
             body: response.body,
             status: response.statusCode, // respond() expects the key `status`
             url: response.url // no effect: respond() has no url option
        });
    } else {
        request.abort('aborted');
    }
});

Navigation:

await page.goto('https://start.de');

// redirects to https://redirect.de
await page.click('a'); 

// relative href '/demo.html' resolves to https://start.de/demo.html instead of https://redirect.de/demo.html
await page.click('a'); 
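
The wrong result follows from how relative hrefs are resolved: against the document's base URL, which is still the originally requested one because the redirect was handled outside the browser. A minimal sketch using the standard `URL` class (a global in Node.js), with the example hosts from above; `resolveHref` is a name made up for illustration:

```javascript
// Resolve a relative href against a base URL, the way links are resolved.
function resolveHref(href, baseUrl) {
  return new URL(href, baseUrl).href;
}

// The page still believes it is on start.de, so the link resolves there.
const seenByBrowser = resolveHref('/demo.html', 'https://start.de');
// What is wanted: resolution against the URL the fetch was redirected to.
const wanted = resolveHref('/demo.html', 'https://redirect.de');

console.log(seenByBrowser); // https://start.de/demo.html
console.log(wanted);        // https://redirect.de/demo.html
```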

Update 1

Solution: changing the browser's current location directly via window.location.

await page.goto('https://start.de');

// redirects to https://redirect.de internally
await page.click('a'); 

// changing current window location
await page.evaluate(() => {
    window.location.href = 'https://redirect.de';
});

// correctly resolves to https://redirect.de/demo.html instead of https://start.de/demo.html
await page.click('a');
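
To do this only when a hidden redirect actually happened, the request URL can be compared with the final response URL. A rough sketch; `normalizeUrl` and `needsLocationSync` are hypothetical helpers, not part of Puppeteer, and `finalUrl` is assumed to be recorded by the interception handler:

```javascript
// Normalize a URL for comparison: parse it and drop the fragment.
function normalizeUrl(url) {
  const parsed = new URL(url);
  parsed.hash = '';
  return parsed.href;
}

// True when the response came from a different URL than the one requested,
// i.e. the external fetch followed a redirect the browser never saw.
function needsLocationSync(requestUrl, responseUrl) {
  return normalizeUrl(requestUrl) !== normalizeUrl(responseUrl);
}

// Possible usage (assumes `page` and a recorded `finalUrl`):
// if (needsLocationSync(page.url(), finalUrl)) {
//   await page.evaluate((url) => { window.location.href = url; }, finalUrl);
// }
```

Assigning `window.location.href` starts a navigation, so it may be necessary to wait for it (for example with `page.waitForNavigation()`) before the next click.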
joe.hart
  • When you say "change response URL," are you trying to redirect to a different URL, or are you simply trying to [replace the state](https://developer.mozilla.org/en-US/docs/Web/API/History_API#The_replaceState()_method) to trick the browser? Also, can you add the source of your `fetch` function? – Grant Miller Jul 30 '18 at 22:04
  • I was trying to replace the state. Unfortunately, replaceState() doesn't work here because it only works within the same origin. But I could change the location directly. Thank you @GrantMiller for pointing me in the right direction. – joe.hart Jul 31 '18 at 08:16

1 Answer

When a request matches the one whose body you want to edit, take its URL and make the call yourself using the "node-fetch" or "request" modules. Once you receive the body, edit it and send it as the response to the original request.

For example:

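// "request" alone is callback-only; "request-promise" wraps it, returns a
// promise (.then) and supports the resolveWithFullResponse option used below
const requestModule = require("request-promise");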
const cheerio = require("cheerio");

page.on("request", async (request) => {
  // Match the url that you want
  const isMatched = /page-12/.test(request.url());

  if (isMatched) {
    // Make a new call
    requestModule({
      url: request.url(),
      resolveWithFullResponse: true,
    })
      .then((response) => {
        const { body, headers, statusCode, statusMessage } = response;
        const contentType = headers["content-type"];

        // Edit body using cheerio module
        const $ = cheerio.load(body);
        $("a").each(function () {
          $(this).attr("href", "/fake_pathname");
        });

        // Send response
        request.respond({
          ok: statusMessage === "OK",
          status: statusCode,
          contentType,
          body: $.html(),
        });
      })
      .catch(() => request.continue());
  } else request.continue();
});
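
As far as I know, `request.respond()` reads `status`, `headers`, `contentType` and `body`, so a key such as `ok` is simply ignored. The mapping from the fetched response to those options can be pulled into a small function; `toRespondOptions` is a hypothetical helper, and the `{ statusCode, headers, body }` shape is assumed from the snippet above:

```javascript
// Hypothetical helper: map a fetched response ({ statusCode, headers, body })
// to an options object for request.respond(). An edited body can be passed
// as the second argument; otherwise the original body is used.
function toRespondOptions(response, body = response.body) {
  const headers = response.headers || {};
  return {
    status: response.statusCode,
    contentType: headers['content-type'],
    body,
  };
}

// Possible usage inside the .then() callback:
// request.respond(toRespondOptions(response, $.html()));
```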
Naycho334
  • I'm getting `requestModule(...).then is not a function` – The Onin Dec 22 '20 at 17:58
  • @NinoŠkopac did you install the "request" module? – Naycho334 Dec 24 '20 at 11:15
  • Of course. My use case was changing the URL for an AJAX call, which I did by directly editing `request._url` in the request event listener and not even invoking `request.continue()` - that was the only way it worked in the most recent Puppeteer version, otherwise, I kept getting "Request has already been handled" error, possibly due to 3rd party code. – The Onin Dec 24 '20 at 22:14