I want to collect and analyze 404 data to address any real issues, in an ASP.NET MVC site (with ELMAH). The chief requirement is to store this information in a more specialized and dense but still queryable format, including the referring site/URL.

I can currently review 404's in ELMAH. However, I do not want ELMAH collecting all my 404's (at least not in the default format), because the error logs grow too quickly. Typically only about 1% of an ELMAH 404 log is relevant; the rest is noise, such as exception details about mundane vulnerability scans. Finding real errors then becomes very difficult, or even impossible if I have to truncate my ELMAH table weekly.

Also, even after collecting all that data, ELMAH does not offer dedicated fields for the target URL and referer URL, which are the critical values to query or aggregate on when managing 404's.

If there's a package (e.g. via NuGet) that is able to store to SQL, includes a presentation layer, can sort by most common errors or errors with actual referring sources, and even permits marking them seen/addressed so they do not show in future reports, that would be an ideal solution. Any solution providing a portion of that would be a great start.

In lieu of a recommendation, I will probably add a custom handler to ELMAH and log to SQL through my own data layer.

However, I'd prefer a packaged solution, and it need not leverage ELMAH. If ELMAH is not part of the solution, I can manually add a filter to it (see "Elmah reporting unwanted 404 errors" and "ELMAH - Filtering 404 Errors").
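
A minimal sketch of the kind of dense, queryable store described above, not a packaged solution. It keeps one row per distinct target/referrer pair with a hit counter instead of one full exception record per request. Python's standard `sqlite3` module stands in for the SQL store purely for illustration, and the table name, column names, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per (target URL, referrer) pair with a counter,
# rather than a full error record for every 404 request.
SCHEMA = """
CREATE TABLE IF NOT EXISTS not_found (
    target_url TEXT    NOT NULL,
    referrer   TEXT    NOT NULL DEFAULT '',
    hits       INTEGER NOT NULL DEFAULT 0,
    last_seen  TEXT    NOT NULL,
    addressed  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (target_url, referrer)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_404(conn, target_url, referrer, seen_at):
    """Bump the counter for an existing pair, or insert it on first sight."""
    referrer = referrer or ""  # a missing referrer is stored as an empty string
    cur = conn.execute(
        "UPDATE not_found SET hits = hits + 1, last_seen = ? "
        "WHERE target_url = ? AND referrer = ?",
        (seen_at, target_url, referrer),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO not_found (target_url, referrer, hits, last_seen) "
            "VALUES (?, ?, 1, ?)",
            (target_url, referrer, seen_at),
        )
    conn.commit()


def mark_addressed(conn, target_url):
    """Hide a target from future reports once it has been reviewed."""
    conn.execute(
        "UPDATE not_found SET addressed = 1 WHERE target_url = ?", (target_url,)
    )
    conn.commit()


def report(conn, referred_only=False):
    """Unaddressed targets as (url, total hits, distinct referrers), busiest first."""
    where = "addressed = 0"
    if referred_only:
        where += " AND referrer <> ''"
    rows = conn.execute(
        "SELECT target_url, SUM(hits) AS total, SUM(referrer <> '') AS referrers "
        "FROM not_found WHERE " + where +
        " GROUP BY target_url ORDER BY total DESC, target_url"
    )
    return rows.fetchall()
```

In the real site the same statements would presumably be issued from the application's own data layer wherever 404's are detected. The `referred_only` flag corresponds to the "errors with actual referring sources" report, and `mark_addressed` to the seen/addressed requirement.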

shannon
  • Uh, comment please if you downvote/closevote - how is this off-topic? With ELMAH installed, 404's become the application responsibility. If you think there's a simple way to move this question outside the programming domain, wouldn't a note also be courteous? Lately people seem to think that if there isn't code visible, it's a bad Q. – shannon Feb 26 '14 at 19:57

1 Answer

I'm one of the developers behind https://elmah.io. elmah.io offers some of the features you are looking for. You can search for errors by different key properties. Also the filter part can be implemented using our Rules option, where you can ignore errors from specific user agents and so on.

We are also creating an ErrorLog implementation for ELMAH that makes it possible to store errors in Elasticsearch: https://github.com/elmahio/Elmah.Io.ElasticSearch. You could then search and aggregate all of your 404's using an Elasticsearch UI such as Kibana.

ThomasArdal
  • This is a great answer. I love the elmah.io idea, and also your reference to Elasticsearch. Unfortunately, with hundreds of 404's per day, the $50-$100/mo. price for elmah.io is a commitment I'm not prepared to build into this application. – shannon Feb 26 '14 at 20:49
  • I totally understand. Unless you can filter out most of the errors using ignore rules, you should look at hosting it yourself. Why do you have all those 404's anyway? Can't you do something to fix that? – ThomasArdal Feb 26 '14 at 20:53
  • Most of the 404's are vulnerability scans. A lot of them are also invalid image and style links from member-posted documents, which is an artifact of a legacy conversion process. There are a bunch that look like actual bad referrals, but it's really hard to tell with so much noise. – shannon Feb 26 '14 at 20:58
  • Sometimes it's really hard to pin down 404's of course. In a lot of cases I'll look at a link and think it's something someone would reasonably expect to find at our site based on our market, but then I'll find the URL on a vulnerability matrix. – shannon Feb 26 '14 at 21:08
  • But can't you identify those errors by user agent, message, or something similar? In that case, just add ignore rules for those errors. Btw, elmah.io is still in beta and all packages are free. – ThomasArdal Feb 26 '14 at 21:08
  • For the vulnerability scans there's really no good way to filter those that I'm aware of without an expensive firewall service subscription. Obviously hackers try to emulate reasonable requests, including user agent. – shannon Feb 26 '14 at 21:11
  • Also, Thomas, just FYI as you are building pricing: another source of many of our 404's is the alternate domains we have purchased for our brand, which seem to have acquired a bunch of invalid but harmless marketing links from all over. – shannon Mar 06 '14 at 20:22