I have a directory with > 1000 .html files, and I would like to check all of them for bad links, preferably from the console. Can you recommend any tool for such a task?
4 Answers
4
You can use wget, e.g.:

    wget -r --spider -o output.log http://somedomain.com

At the bottom of the output.log file, wget will indicate whether it has found broken links. You can parse that with awk/grep.
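To apply this to a local directory of .html files, a rough sketch might look like the following (serving the directory over HTTP with python3 -m http.server, the port, and the awk filter are my assumptions, not part of this answer; the exact summary wording depends on the wget version and locale):

    # serve the directory of .html files over HTTP (any static file server works)
    cd /path/to/html/files
    python3 -m http.server 8000 &

    # spider the whole site without downloading, logging everything to output.log
    wget -r --spider -o output.log http://localhost:8000/

    # print the broken-link summary that wget appends at the bottom of the log
    awk '/broken link/ {found=1} found' output.log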

ghostdog74
- An alternative **wget** command line to check for broken links can be found in [this answer](http://stackoverflow.com/a/15029100/1497596). Also note that a comment that I left on that answer provides a link to **wget for Windows**. – DavidRR Sep 16 '14 at 20:39
- As long as you are careful to set the user agent and accept headers (to avoid bogus error codes from bot detectors), this should work. – Tim Post Mar 15 '10 at 11:41
- It looks OK, but it's definitely not intended for such large projects - it doesn't have any way to just list broken links, and the output for my project is *really* big. – Mar 15 '10 at 13:25
0
You can extract links from HTML files using the Lynx text browser. Bash scripting around this should not be difficult.
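A rough sketch of such a script (assuming lynx and curl are available; the -listonly option and the curl check are my additions to this answer, and only absolute http(s) links are tested here):

    # dump the link list from every HTML file and test each URL with curl
    for f in *.html; do
      lynx -dump -listonly "$f" |
      grep -Eo 'https?://[^[:space:]]+' |
      sort -u |
      while read -r url; do
        code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url")
        case "$code" in
          2*|3*) ;;                     # reachable
          *) echo "$f: $code $url" ;;   # broken or unreachable
        esac
      done
    done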

mouviciel
- Lynx can do it, but it doesn't really support it. wget is much better suited for the purpose. – reinierpost Mar 15 '10 at 11:18
0
Try the webgrep command line tools or, if you're comfortable with Perl, the HTML::TagReader module by the same author.

gareth_bowles