I'm running into an odd edge case when I download a website with wget and have it convert all links within a certain domain to relative links. The command I use is:
wget -q -r -nH -H -D{domain-name} -l 5 -p -E -k -e robots=off {url}
It downloads all the required pages and resources just fine. It then converts the links in the downloaded pages to relative paths. During this process, it also performs some path encoding so that the website works from the download folder.
The issue that I'm facing is:
Say I have a link to a stylesheet in the original index.html like so -
<link rel="stylesheet" href="/templates/source/booga booga/foobar.css" type="text/css" />
wget downloads the correct CSS file from the server and encodes the link to it in index.html like so -
<link rel="stylesheet" href="/templates/source/booga%20booga/foobar.css" type="text/css" />
All good so far. All browsers find the stylesheet just fine.
Now, the stylesheet foobar.css contains a section like so -
.foo-bar-button {
    font-size: 12px;
    padding: 10px 20px 10px 30px;
    background: url(/templates/professional/1/main/en/gfx/booga%20booga/foo-bar.png) left 55% no-repeat;
    display: block;
    width: 90px;
}
wget downloads the background image, and proceeds to convert and/or encode the path to it. It ends up with this -
.foo-bar-button {
    font-size: 12px;
    padding: 10px 20px 10px 30px;
    background: url(/templates/professional/1/main/en/gfx/booga booga/foo-bar.png) left 55% no-repeat;
    display: block;
    width: 90px;
}
The path to the background image in the downloaded/converted CSS file now contains a literal, unencoded space where the original had %20. IE still finds the image and the site works. Chrome, Firefox and Opera do not load the image, so the button is invisible.
I have multiple cases where an unencoded space inside url() in a stylesheet leads to broken styling.
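The only stopgap I can think of is a post-processing pass over the downloaded stylesheets that puts the %20 back. Here is a rough sketch in Python, assuming the CSS files are UTF-8 and that spaces are the only characters that need re-encoding. The function names are my own and have nothing to do with wget itself.

```python
import re
from pathlib import Path

# Matches url(...) with optional single or double quotes around the path.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")


def encode_spaces_in_css_urls(css_text: str) -> str:
    """Replace literal spaces inside url(...) values with %20."""
    def fix(match):
        quote, path = match.group(1), match.group(2)
        return "url({0}{1}{0})".format(quote, path.replace(" ", "%20"))
    return URL_RE.sub(fix, css_text)


def fix_tree(root: str) -> None:
    """Rewrite every .css file under root in place (assumes UTF-8 files)."""
    for css_file in Path(root).rglob("*.css"):
        original = css_file.read_text(encoding="utf-8")
        fixed = encode_spaces_in_css_urls(original)
        if fixed != original:
            css_file.write_text(fixed, encoding="utf-8")
```

I would run fix_tree on the download folder after wget finishes. It only touches text inside url(...), so the rest of the stylesheet is left alone, but I would much rather have wget produce correct output in the first place.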
Any help would be appreciated.