45

What techniques or tools are recommended for finding broken links on a website?

I have access to the logfiles, so could conceivably parse these looking for 404 errors, but would like something automated which will follow (or attempt to follow) all links on a site.
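For the log-parsing half, something along these lines works on an Apache common/combined format access log (swap in your own log path; field 9 is the status code and field 7 the requested path, so adjust for other formats):

awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn

but that only surfaces broken links somebody has already hit, which is why I'd prefer a crawler.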

Ian Nelson
  • There's also [HTTrack](http://www.httrack.com/) which can do the job pretty well. – David d C e Freitas May 26 '14 at 00:30
  • If you are interested in finding dead links, including checking whether the fragment identifier is live, then consider https://github.com/gajus/deadlink. – Gajus Nov 02 '14 at 13:03
  • A better option is to ask for a survey of available software. Such a list, while it will date quickly due to turnover in software, will continue to be useful. This, if done in an even-handed, objective manner, avoids the spam and opinion issue enough to leave a useful answer. – Sherwood Botsford Feb 08 '15 at 23:46
  • I built this, https://lnkchk.com; I use it all the time, but then again, I am biased lol – Dan Jul 26 '17 at 12:04
  • Best way is to create a small bot that runs over your entire site, and records the outcome. I did this to test my sites before deployment and it works really well. – Nick Berardi Sep 15 '08 at 18:41
  • Another option would be [brokenlinkfinder.com](https://brokenlinkfinder.com) – eicksl May 18 '20 at 03:54
  • If you're using WordPress, then there is a [great plugin](https://wpslimseo.com/products/slim-seo-link-manager/) that reports all links' statuses. – Anh Tran Jul 03 '23 at 08:21

10 Answers

36

For Chrome there is the Hexometer extension.

See LinkChecker for Firefox.

For Mac OS there is Integrity, a tool which can check URLs for broken links.

For Windows there is Xenu's Link Sleuth.

Community
jrudolph
31

Just found a wget script that does what you are asking for.

wget --spider  -o wget.log  -e robots=off --wait 1 -r -p http://www.example.com

Credit for this goes to this page.
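To pull just the failures out of wget.log afterwards, something along these lines usually does it (this assumes wget's default English messages; you may need to tune the -B context for your version):

grep -B 3 '404 Not Found' wget.log | grep '^--'

Recent versions of wget also print a "Found N broken links." summary, listing the offending URLs, at the end of a recursive spider run, so the tail of the log is worth a look too.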

wjbrown
  • A 32-bit version of **wget** for Windows can be found on SourceForge [here](http://gnuwin32.sourceforge.net/packages/wget.htm). *(Links for other GNU binaries for Windows can be found [here](http://gnuwin32.sourceforge.net/packages.html))*. The **man page** for **wget** can be found [here](https://www.gnu.org/software/wget/manual/wget.html). – DavidRR Sep 16 '14 at 20:29
  • The trouble with this method is that interpreting the log is not the easiest. You can grep for `404` and for `broken link`, but it's not clear on which page the broken link was found. – Flimm May 01 '15 at 08:37
  • Great one-liner! In the end, the log file was quite easy to interpret with an adequate tool (`Console.app` on macOS, for instance). – meduz Oct 17 '21 at 15:28
11

I like the W3C Link Checker.

Paul Reiners
  • Me too. If you tick `Check linked documents recursively` and leave the `recursion depth` field empty, it seems to recurse infinitely on the specified domain. – mb21 May 29 '13 at 09:14
6

See the LinkChecker tool:

LinkChecker is a free, GPL licensed website validator. LinkChecker checks links in web documents or full websites.
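The command-line client is the quickest way to try it. A minimal run looks something like this (option names from memory, so double-check with `linkchecker --help`; by default only links within the site are followed, and `--check-extern` also verifies external ones):

linkchecker http://www.example.com/
linkchecker --check-extern http://www.example.com/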

ymln
3

Either use a tool that parses your log files and gives you a 'broken links' report (e.g. Analog or Google Webmaster Tools), or run a tool that spiders your web site and reports broken links (e.g. W3C Link Checker).

Peter Hilton
1

In a .NET application you can set IIS to pass all requests to ASP.NET and then in your global error handler you can catch and log 404 errors. This is something you'd do in addition to spidering your site to check for internal missing links. Doing this can help find broken links from OTHER sites and you can then fix them with 301 redirects to the correct page.

To help test your site internally, there's also the Microsoft SEO Toolkit.

Of course the best technique is to avoid the problem at compile time! In ASP.NET you can get close to this by requiring that all links be generated from static methods on each page so there's only ever one location where any given URL is generated. e.g. http://www.codeproject.com/KB/aspnet/StronglyTypedPages.aspx

If you want a complete C# crawler, there's one here: http://blog.abodit.com/2010/03/a-simple-web-crawler-in-c-using-htmlagilitypack/

Ian Mercer
1

Our commercial product DeepTrawl does this and can be used on both Windows and Mac.

Disclosure: I'm the lead developer behind DeepTrawl.

Jonathan
0

Your best bet is to knock together your own spider in your scripting language of choice; it could be done recursively along the lines of:

// Recursively check for broken links, logging all failures centrally.
// $visited guards against infinite recursion when pages link to each other.
$visited = array();

function check_links($page)
{
    global $visited;
    if (isset($visited[$page])) return;
    $visited[$page] = true;

    // file_get_contents() returns false for 4xx/5xx responses by default
    $html = @file_get_contents($page);
    if (!$html)
    {
        // Log page to failures log
        error_log("Broken link: $page\n", 3, "failures.log");
    }
    else
    {
        // Find all html, img, etc links on page
        foreach (find_links_on_page($html) as $link)
        {
            check_links($link);
        }
    }
}

// Naive extraction: absolute <a href> links only; other tags and relative URLs need more work
function find_links_on_page($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor)
    {
        $href = $anchor->getAttribute('href');
        if (strpos($href, 'http') === 0) $links[] = $href;
    }
    return $links;
}

Once your site has gotten a certain level of attention from Google, their Webmaster Tools are invaluable in showing broken links that users may come across, but this is quite reactive: the dead links may be around for several weeks before Google indexes them and logs the 404 in your webmaster panel.

Writing your own script like the one above will show you all possible broken links, without having to wait for Google (Webmaster Tools) or your users (404s in the access logs) to stumble across them.

ConroyP
  • I wouldn't recommend this approach at all unless you've got a LOT of free time. There are so many different ways a link can be embedded in a page that it takes ages to write an accurate parser (e.g. JavaScript/AJAX, CSS, as well as the standard a href, link, script and iframe tags), plus you need to take into account any 'base' tag specified and all the different ways of doing the same thing. Writing the find_links_on_page() function would be several man-days of work, and it's pointless given that there are so many good (free and/or open source) tools around. – NickG Oct 16 '12 at 12:03
0

There's a Windows app called CheckWeb. It's no longer developed, but it works well, and the code is open (C++, I believe).

You just give it a URL, and it will crawl your site (and external links if you choose), reporting any errors, image/page "weight", etc.

http://www.algonet.se/~hubbabub/how-to/checkweben.html

scunliffe
0

LinkTiger seems like a very polished (though non-free) service to do this. I'm not using it myself; I just wanted to add it because it was not yet mentioned.

akauppi