
I have a directory with > 1000 .html files and would like to check all of them for broken links, preferably from the console. Can you recommend a tool for such a task?

Hubert Kario

4 Answers


You can use wget, e.g.:

wget -r --spider  -o output.log http://somedomain.com

At the bottom of output.log, wget will indicate whether it found any broken links. You can parse that using awk/grep.
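
For example, something along these lines pulls the relevant parts back out of the log. The exact wording and layout of wget's messages vary a little between versions, so treat the patterns as a starting point and adjust them to what your output.log actually contains:

# print wget's broken-link summary, which appears at the bottom of the log
sed -n '/Found .* broken link/,$p' output.log

# or list just the URLs whose HTTP request came back with a 4xx/5xx status
grep -B 4 'awaiting response... [45]' output.log \
    | grep -Eo 'https?://[^ ]+' \
    | sort -u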

ghostdog74
  • An alternative **wget** command line to check for broken links can be found in [this answer](http://stackoverflow.com/a/15029100/1497596). Also note that a comment that I left on that answer provides a link to **wget for Windows**. – DavidRR Sep 16 '14 at 20:39

I'd use checklink (a W3C project)
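
If checklink is installed (it is the command-line script of the W3C link checker, available from CPAN as W3C-LinkChecker), a run could look roughly like this. The option names below are taken from its documentation, so confirm them against checklink --help for your version:

# recurse through the site, stay quiet unless something is wrong,
# and report only broken links rather than every redirect
checklink --quiet --summary --broken --recursive http://somedomain.com/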

Quentin
  • As long as you are careful to set the user agent and accept headers (to avoid bogus error codes from bot detectors) this should work. – Tim Post Mar 15 '10 at 11:41
  • It looks OK, but it's definitely not intended for such large projects - it doesn't have any way to just list broken links, and the output for my project is *really* big. –  Mar 15 '10 at 13:25

You can extract links from HTML files using the Lynx text browser. Bash scripting around this should not be difficult.
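
A rough sketch of that approach, using curl (not mentioned above, but a common choice) to probe each link; file names, patterns, and the curl options are illustrative rather than a finished tool:

#!/bin/sh
# For every .html file: have lynx dump its list of links, then probe each
# URL with a HEAD request and report the ones that do not return success.
for f in *.html; do
    lynx -dump -listonly "$f" \
        | grep -Eo 'https?://[^ ]+' \
        | sort -u \
        | while read -r url; do
            # --head sends a HEAD request; some servers reject HEAD,
            # so drop it to fall back to a normal GET
            if ! curl -s -o /dev/null --head --fail "$url"; then
                echo "BROKEN: $url (in $f)"
            fi
        done
done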

mouviciel

Try the webgrep command line tools or, if you're comfortable with Perl, the HTML::TagReader module by the same author.

gareth_bowles