I'm encountering a rather special case when I try to download a website and attempt to convert all links within a certain domain to relative links. The wget command I use is:

wget -q -r -nH -H -D{domain-name} -l 5 -p -E -k -e robots=off {url}

It downloads all the required pages and resources just fine. It then converts all the links in the source pages to relative paths. During this process, it even performs some path encoding so that the website works smoothly from the download folder.

The issue that I'm facing is:

Say I have a link to a stylesheet in the original index.html like so:

<link rel="stylesheet" href="/templates/source/booga booga/foobar.css" type="text/css" />

wget downloads the correct CSS file from the server and encodes the link to it in index.html like so:

<link rel="stylesheet" href="/templates/source/booga&#32;booga/foobar.css" type="text/css" />

All good so far. All browsers find the stylesheet just fine.

Now, the stylesheet foobar.css contains a section like so:

.foo-bar-button {
font-size: 12px;
padding: 10px 20px 10px 30px;
background: url(/templates/professional/1/main/en/gfx/booga%20booga/foo-bar.png) left 55%  no-repeat;
display: block;
width: 90px;
}

wget downloads the background image, and proceeds to convert and/or encode the path to it. It ends up with this:

.foo-bar-button {
font-size: 12px;
padding: 10px 20px 10px 30px;
background: url(/templates/professional/1/main/en/gfx/booga booga/foo-bar.png) left 55%  no-repeat;
display: block;
width: 90px;
}

The path to the background image in the downloaded/converted CSS file now contains an unencoded space. IE can find the image just fine and the site works. Chrome, Firefox and Opera cannot handle it, and the button is invisible.

I have multiple cases where the whitespace in the url() in the stylesheet leads to incorrect styling.

Any help would be appreciated.

2 Answers

find . -name '*.css' -exec sed -i -e 's/\(url([^)]*\) /\1%20/g' {} \;

:). Each run replaces only one space per url(...), so if URLs can contain up to X spaces, repeat this X times.
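To avoid guessing X, the same substitution can be looped inside sed until no space is left within any url(...). This is only a sketch: the function name `encode_css_url_spaces` is made up for illustration, and it works as a filter (stdin to stdout) rather than editing in place, so the original file stays untouched until you are happy with the result.

```shell
# Hypothetical helper: percent-encode every space inside url(...).
# Reads CSS on stdin and writes the result to stdout.
encode_css_url_spaces() {
    # ':a' defines a label; the s command encodes one space inside a url(...);
    # 'ta' jumps back to the label as long as a substitution was made.
    sed -e ':a' -e 's/\(url([^)]*\) /\1%20/' -e 'ta'
}
```

Usage would be something like `encode_css_url_spaces < foobar.css > foobar.fixed.css`. Note that it also encodes padding spaces such as `url( foo.png )`, so check the output if your stylesheets are written that way.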

sourcejedi
  • I'm hoping I can tweak wget options to get what I want without post-processing. Also it gets worse as I'm working in a Windows environment with wget.exe. – Ramesh Muraleedharan Nov 09 '12 at 00:51
  • You seem to be shying away from calling this a bug :p. I think you're out of luck, unless it's been fixed in a more recent version of wget. (You could fix the other problem by using Cygwin; which might also give you a more recent version). – sourcejedi Nov 10 '12 at 10:37
  • You're right of course. I will explore post processing options, both Cygwin and otherwise. – Ramesh Muraleedharan Nov 12 '12 at 18:09

I know this is an old question but I found it while searching for the same problem.

I propose an alternative solution: putting the URL in quotes, which makes the space legal inside url():

`sed -i -re "s/url\(([^)]+)\)/url('\1')/g" file.css`

should do the trick.
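If some stylesheets already contain quoted URLs, a slightly more defensive variant can skip those so they are not quoted twice. This is a sketch under that assumption; `quote_css_urls` is a hypothetical name, and it filters stdin to stdout instead of editing in place.

```shell
# Hypothetical helper: wrap unquoted url(...) values in single quotes.
# URLs that already start with ' or " are left alone.
quote_css_urls() {
    # [^)'\"]+ matches a URL body containing no quote and no closing paren,
    # so url('x') and url("x") do not match and stay unchanged.
    sed -E "s/url\(([^)'\"]+)\)/url('\1')/g"
}
```

For example, `quote_css_urls < foobar.css > foobar.quoted.css`.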

H725