
Using Analog 6 for web stats, and I'm surprised to see over a million 404's over 54 days. Am I looking at this correctly? Is this an unusual ratio of 404's to "200 OK" page views? I don't see any 404's in the list of actual URLs; where would a list of the broken URLs be? The site is a combination of html, WordPress and asp pages on unix/apache, if that matters.

Requests       Status Codes
 6548392       200 OK
     807       206 Partial content
 1830136       301 Document moved permanently
   61795       302 Document found elsewhere
 3091342       304 Not modified since last retrieval
    3042       400 Bad request
   49012       403 Access forbidden
 1043694       404 Document not found
    2936       500 Internal server error
     411       503 Service temporarily unavailable

General stats:

Successful requests:                   9,640,541 
Average successful requests per day:     183,490 
Successful requests for pages:         1,620,543
Failed requests:                       1,099,095 (20,066) 
masegaloeh
markratledge

3 Answers


The list of broken URLs will be in the actual log files. Right now it looks like roughly 15% of the requests to your site are returning 404, which does seem unusually high.

If I were to guess, I would bet that your page template includes a link to a broken image, JavaScript, or CSS file.

A quick grep of the log files will probably reveal most of the details.
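For example, assuming Apache's combined log format (where the status code is the 9th whitespace-separated field), something along these lines surfaces the most-requested missing URLs; the log path here is a guess and will vary by distro and vhost configuration:

```shell
# Count the most frequently requested URLs that returned 404.
# Assumes Apache combined log format; adjust the log path for your system.
awk '$9 == 404 {print $7}' /var/log/apache2/access.log \
  | sort | uniq -c | sort -rn | head -20
```

If one or two URLs dominate the counts, that points straight at a broken asset reference in a shared template.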

Zoredache
  • Starting to make sense. The top failed referrer is actually a style sheet that does exist. But the URL that shows as failed uses the IP of the server, not the domain, if that matters. I can open the text file of the server logs; what would I grep for? Thanks. – markratledge Jul 28 '10 at 22:27
  • +1 - it'll almost certainly be a broken link embedded in the HTML or CSS. It could also be a missing `favicon.ico` or `robots.txt` – Mark Henderson Jul 28 '10 at 22:35
  • Robots.txt and favicon are fine, and all style sheet links are fine. – markratledge Jul 29 '10 at 02:58

I agree that is a rather high number of 404s, but it might be automated bots trying to exploit known holes in software.

Granted it's not quite the same scale, but I see tens of thousands of 404s a month on our web server, and analysing the URLs, it just looks like some bot trying known SQL injections against hundreds of different products (none of which we have installed).

It's a mammoth initial task, but once you exclude the exploit URLs from your preferred way of finding genuine 404s, the picture gets much more accurate.
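A rough sketch of that filtering, again assuming Apache's combined log format; the exclusion patterns below are only illustrative examples of common scanner noise, so build your own list from what actually shows up in your logs:

```shell
# Count 404 URLs after dropping requests that match known scanner/exploit
# patterns (patterns here are illustrative; extend as you find more noise).
awk '$9 == 404 {print $7}' /var/log/apache2/access.log \
  | grep -viE 'phpmyadmin|wp-login|\.php\?|cmd=|union.select' \
  | sort | uniq -c | sort -rn | head -20
```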

Ben Pilbrow

If you can't get access to the raw logs as already suggested, consider running a crawl over your site to find broken links. See the W3C Link Checker, specifying "Check linked documents recursively" and whatever recursion depth makes sense for your site.
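A crawl can also be run locally with wget's spider mode, which follows links without downloading files. This is a sketch assuming GNU wget is installed; replace example.com with your own domain and tune the depth:

```shell
# Recursively "spider" the site without saving files, logging the results.
# wget exits non-zero when it hits broken links, so || true lets the
# script continue on to the report step.
wget --spider -r -l 2 -o /tmp/spider.log https://example.com/ || true
# Each 404 response in the log follows the URL that produced it:
grep -B 2 '404 Not Found' /tmp/spider.log || echo 'No 404s found in crawl log'
```

Note this only finds links that are actually reachable from your own pages; 404s caused by stale external links or bot-crafted URLs will never show up in a crawl.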

medina
  • Ran that checker, and it took half an hour, but found no bad links. – markratledge Jul 29 '10 at 03:02
  • That gives credence to some of the other theories of hits on externally-crafted URLs which are not defined on your site. Time to dig out the raw logs! – medina Jul 29 '10 at 03:11