Situation:
I want to mirror an old website. The website is at https://example.com/website/. It uses absolute links to http://www.example.com/website/.
Problem:
For whatever reason, wget cannot reach https://www.example.com (the www. subdomain); the connection just times out. I have no idea why, since it works fine in the browser. curl cannot connect either.
Possible solutions:
- Have wget rewrite the links before following them while it's still crawling.
- Make wget work with the www. subdomain.
To try to make www. work, I already set an Accept header and a Firefox user agent: --header="Accept: text/html" --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"
but that did not work.
So I somehow need to rewrite the links on that website while crawling.
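
To make the kind of rewrite I mean concrete, here is a minimal sketch using sed. The function name rewrite_links is made up for illustration, and the sketch assumes every unwanted link starts with http://www.example.com/website/ or https://www.example.com/website/. It only rewrites text passed through it; it does not hook into wget's crawler.

```shell
# Hypothetical helper (name is an assumption): read HTML on stdin and rewrite
# absolute links that point at the unreachable www host so that they point at
# the reachable host instead.
rewrite_links() {
  sed -E 's#https?://www\.example\.com/website/#https://example.com/website/#g'
}

# Possible use, not run here: fetch one page, rewrite its links, and hand the
# result to a second wget that treats stdin as HTML. This covers one level of
# links only; relative links would additionally need --base.
# wget -qO- https://example.com/website/ | rewrite_links | wget --force-html --input-file=-
```

This would have to be repeated for every level of the site, which is why I am hoping wget can do the rewrite itself during a recursive crawl.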