45

What techniques or tools are recommended for finding broken links on a website?

I have access to the logfiles, so could conceivably parse these looking for 404 errors, but would like something automated which will follow (or attempt to follow) all links on a site.
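For the log-parsing half, something along these lines works on an Apache common/combined format access log (swap in your own log path; field 9 is the status code and field 7 the requested path, so adjust for other formats):

awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn

but that only surfaces broken links somebody has already hit, which is why I'd prefer a crawler.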

Ian Nelson
  • There's also [HTTrack](http://www.httrack.com/) which can do the job pretty well. – David d C e Freitas May 26 '14 at 00:30
  • If you are interested in finding dead links, including checking whether the fragment identifier is live, then consider https://github.com/gajus/deadlink. – Gajus Nov 02 '14 at 13:03
  • A better option is to ask for a survey of available software. Such a list, while it will date quickly due to turnover in software, will continue to be useful. This, if done in an even-handed, objective manner, avoids the spam and opinion issue enough to leave a useful answer. – Sherwood Botsford Feb 08 '15 at 23:46
  • I built this, https://lnkchk.com; I use it all the time, but then again, I am biased lol – Dan Jul 26 '17 at 12:04
  • Best way is to create a small bot that runs over your entire site, and records the outcome. I did this to test my sites before deployment and it works really well. – Nick Berardi Sep 15 '08 at 18:41
  • Another option would be [brokenlinkfinder.com](https://brokenlinkfinder.com) – eicksl May 18 '20 at 03:54
  • If you're using WordPress, then there is a [great plugin](https://wpslimseo.com/products/slim-seo-link-manager/) that reports all links' statuses. – Anh Tran Jul 03 '23 at 08:21

10 Answers

36

For Chrome there is the Hexometer extension.

See LinkChecker for Firefox.

For Mac OS there is Integrity, a tool which can check URLs for broken links.

For Windows there is Xenu's Link Sleuth.

Community
jrudolph
31

Just found a wget script that does what you are asking for.

wget --spider  -o wget.log  -e robots=off --wait 1 -r -p http://www.example.com

Credit for this goes to this page.
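To pull just the failures out of wget.log afterwards, something along these lines usually does it (this assumes wget's default English messages; you may need to tune the -B context for your version):

grep -B 3 '404 Not Found' wget.log | grep '^--'

Recent versions of wget also print a "Found N broken links." summary, listing the offending URLs, at the end of a recursive spider run, so the tail of the log is worth a look too.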

wjbrown
  • A 32-bit version of **wget** for Windows can be found on SourceForge [here](http://gnuwin32.sourceforge.net/packages/wget.htm). *(Links for other GNU binaries for Windows can be found [here](http://gnuwin32.sourceforge.net/packages.html))*. The **man page** for **wget** can be found [here](https://www.gnu.org/software/wget/manual/wget.html). – DavidRR Sep 16 '14 at 20:29
  • The trouble with this method is that interpreting the log is not the easiest. You can grep for `404` and for `broken link`, but it's not clear on which page the broken link was found. – Flimm May 01 '15 at 08:37
  • Great one-liner! In the end, the log file was quite easy to interpret with an adequate tool (`Console.app` on macOS, for instance). – meduz Oct 17 '21 at 15:28
11

I like the W3C Link Checker.

Paul Reiners
  • Me too. If you tick `Check linked documents recursively` and leave the `recursion depth` field empty, it seems to recurse infinitely on the specified domain. – mb21 May 29 '13 at 09:14
6

See the LinkChecker tool:

LinkChecker is a free, GPL licensed website validator. LinkChecker checks links in web documents or full websites.
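The command-line client is the quickest way to try it. A minimal run looks something like this (option names from memory, so double-check with `linkchecker --help`; by default only links within the site are followed, and `--check-extern` also verifies external ones):

linkchecker http://www.example.com/
linkchecker --check-extern http://www.example.com/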

ymln
3

Either use a tool that parses your log files and gives you a 'broken links' report (e.g. Analog or Google Webmaster Tools), or run a tool that spiders your web site and reports broken links (e.g. W3C Link Checker).

Peter Hilton
1

In a .NET application you can set IIS to pass all requests to ASP.NET and then in your global error handler you can catch and log 404 errors. This is something you'd do in addition to spidering your site to check for internal missing links. Doing this can help find broken links from OTHER sites and you can then fix them with 301 redirects to the correct page.

To help test your site internally, there's also the Microsoft SEO Toolkit.

Of course the best technique is to avoid the problem at compile time! In ASP.NET you can get close to this by requiring that all links be generated from static methods on each page so there's only ever one location where any given URL is generated. e.g. http://www.codeproject.com/KB/aspnet/StronglyTypedPages.aspx

If you want a complete C# crawler, there's one here: http://blog.abodit.com/2010/03/a-simple-web-crawler-in-c-using-htmlagilitypack/

Ian Mercer
1

Our commercial product DeepTrawl does this and can be used on both Windows and Mac.

Disclosure: I'm the lead developer behind DeepTrawl.

Jonathan
0

Your best bet is to knock together your own spider in your scripting language of choice; it could be done recursively along the lines of:

// Recursively check for broken links, logging all failures centrally.
// $visited guards against infinite recursion when pages link to each other.
$visited = array();

function check_links($page)
{
    global $visited;
    if (isset($visited[$page])) return;
    $visited[$page] = true;

    // file_get_contents() returns false for 4xx/5xx responses by default
    $html = @file_get_contents($page);
    if (!$html)
    {
        // Log page to failures log
        error_log("Broken link: $page\n", 3, "failures.log");
    }
    else
    {
        // Find all html, img, etc links on page
        foreach (find_links_on_page($html) as $link)
        {
            check_links($link);
        }
    }
}

// Naive extraction: absolute <a href> links only; other tags and relative URLs need more work
function find_links_on_page($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor)
    {
        $href = $anchor->getAttribute('href');
        if (strpos($href, 'http') === 0) $links[] = $href;
    }
    return $links;
}

Once your site has gotten a certain level of attention from Google, their Webmaster Tools are invaluable in showing broken links that users may come across, but this is quite reactive: the dead links may be around for several weeks before Google indexes them and logs the 404 in your webmaster panel.

Writing your own script like the one above will show you all possible broken links, without having to wait for Google (Webmaster Tools) or your users (404s in the access logs) to stumble across them.

ConroyP
  • I wouldn't recommend this approach at all unless you've got a LOT of free time. There are so many different ways a link can be embedded in a page that it takes ages to write an accurate parser (e.g. JavaScript/AJAX, CSS, as well as the standard a href, link, script and iframe tags), plus you need to take into account any 'base' tag specified and all the different ways of doing the same thing. Writing the find_links_on_page() function would be several man-days of work, and it's pointless given that there are so many good (free and/or open source) tools around. – NickG Oct 16 '12 at 12:03
0

There's a Windows app called CheckWeb. It's no longer developed, but it works well, and the code is open (C++, I believe).

You just give it a URL, and it will crawl your site (and external links if you choose), reporting any errors, image/page "weight", etc.

http://www.algonet.se/~hubbabub/how-to/checkweben.html

scunliffe
0

LinkTiger seems like a very polished (though non-free) service to do this. I'm not using it myself; I just wanted to add it because it was not yet mentioned.

akauppi