I have a problem with wget. I need to download an entire site, including the images and other files linked from the main pages. I'm using these options:
wget --load-cookies /tmp/cookie.txt -r -l 1 -k -p -nc 'https://www.example.com/mainpage.do'
(-l 1 is just for testing; I may need to go to level 3 or even 4.)
The problem is that I can't figure out how to skip the 'random' GET parameter that gets appended after some recursion cycles, so the final result in my /tmp folder looks like this:
/tmp/www.example.com/mainpage.do
/tmp/www.example.com/mainpage.do?cx=0.0340590343408
/tmp/www.example.com/mainpage.do?cx=0.0348934786475
/tmp/www.example.com/mainpage.do?cx=0.0032878284787
/tmp/www.example.com/mainpage.do?cx=0.0266389459023
/tmp/www.example.com/mainpage.do?cx=0.0103290334732
/tmp/www.example.com/mainpage.do?cx=0.0890345378478
Since the page is always the same, I don't need to download it again. I tried the -nc option, but it doesn't work. I also tried -R (reject), but it only matches file extensions, not URL parameters.
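One thing I haven't been able to verify on my version: newer wget releases (1.14 and later) have a --reject-regex option that is matched against the complete URL, query string included, unlike -R. If it's available, the command would look something like this (the '[?&]cx=' pattern is my guess at matching the random parameter shown above):

```shell
# Sketch assuming wget >= 1.14, which added --reject-regex.
# The regex is tested against the full URL, so the
# mainpage.do?cx=0.03... variants should be rejected while the
# bare mainpage.do is still downloaded.
wget --load-cookies /tmp/cookie.txt -r -l 1 -k -p -nc \
     --reject-regex '[?&]cx=' \
     'https://www.example.com/mainpage.do'
```

`wget --version` should tell you whether the option is supported.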
I looked through the wget manual extensively but can't find a way to do this. Using wget is not mandatory; if you know how to do this some other way, suggestions are welcome.