I need to mirror a website and deploy the copy under a different domain name. The mirroring procedure must be fully automatic, so that I can update the copy on a regular schedule with cron.
The mirror must not be a live mirror, but a static copy, i.e. a snapshot of the site at a specific point in time, so I think wget might fit.
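For the scheduling part I'd simply run the script from cron; a hypothetical crontab entry (the script path is a placeholder for wherever the script below ends up installed):

```
# Hypothetical crontab entry: refresh the mirror every night at 03:00.
# /usr/local/bin/mirror-site.sh is a placeholder path, not my actual setup.
0 3 * * * /usr/local/bin/mirror-site.sh
```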
As of now, I've come up with the following script to get a copy of the original site:
#!/bin/bash
DOMAIN="example.com"
cd /srv/mirrors

# Download into a scratch directory first, so the current copy
# stays in place while the new snapshot is being fetched.
TMPDIR=$(mktemp -p . -d)
cd "${TMPDIR}"
wget -m -p -E --tries=10 --convert-links --retry-connrefused "${DOMAIN}"

# Swap the new snapshot in, keeping the previous one as oldcopy.
cd ..
rm -rf oldcopy
mv "${DOMAIN}" oldcopy
mv "${TMPDIR}/${DOMAIN}" "${DOMAIN}"
rmdir "${TMPDIR}"
The resulting copy is then served by Nginx under the new domain name, with a simple configuration for a local static site, and it seems to work.
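The Nginx side is nothing special; roughly the following (the server name and paths here are placeholders, not my real configuration):

```nginx
# Sketch of the static-site server block; mirror.example.net and the
# root path stand in for the real new domain and mirror location.
server {
    listen 80;
    server_name mirror.example.net;
    root /srv/mirrors/example.com;
    index index.html;

    location / {
        # wget -E saves pages with an .html extension, so try that too.
        try_files $uri $uri/ $uri.html =404;
    }
}
```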
The problem is that the origin server produces web pages with absolute links, even when the links point to internal resources. For example, a page at https://example.com/page1 contains
<link rel="stylesheet" href="https://example.com/style.css">
<script src="https://example.com/ui.js"/>
and so on (it's WordPress), and there is no way I can change that behavior. wget then does not convert those links for local browsing, because they are absolute (or at least, I think that's the cause).
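A fallback I'm considering is to post-process the copy myself after wget finishes. This is only a rough sketch, not part of my current script: the rewrite_links helper and the domain names are hypothetical, and it assumes GNU sed (in-place -i with no suffix argument):

```shell
#!/bin/bash
# Sketch only: rewrite absolute links from the old domain to the new one
# in every HTML/CSS file under a directory. rewrite_links is a hypothetical
# helper; GNU sed is assumed for "-i" without a backup suffix.
rewrite_links() {
  local dir="$1" old="$2" new="$3"
  # Match both http:// and https:// forms of the old domain.
  find "$dir" -type f \( -name '*.html' -o -name '*.css' \) -print0 |
    xargs -0 -r sed -i "s|https\?://${old}|https://${new}|g"
}

# Example (placeholder domains):
#   rewrite_links "${DOMAIN}" "example.com" "mirror.example.net"
```

But that feels fragile compared to letting wget handle the link rewriting itself.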
EDIT: the real domain name is assodigitale.it, though I need a script that works regardless of the particular domain, because I will need it for a few other domains too.
Can I make wget convert those links to the new domain name?