
Can anyone recommend a good tool to parse and analyze Nginx access logs that will group the URLs based on the regexes in my Django urls.py files (or a config file generated from them)? It does not need to be real-time.

One of my primary concerns is looking at "request time" for various pages whose URLs may contain slugs or UUIDs and may contain complex query strings.
For example: www.example.com/event/detail/my_event_uuid/?something=1234&somethingelse=abc

My core concern is that I be able to view aggregate statistics for all event detail pages. As an added bonus I would like to be able to see all event detail pages where "somethingelse" is in the querystring.

Other considerations: lightweight, open source, and no database tables added to the Django project if possible.

AgDude

1 Answer


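You can use a shell pipeline to list the event UUID from every access log entry whose query string contains somethingelse. This assumes the default combined log format, where the request URI is the seventh whitespace-separated field:

grep -E '[?&]somethingelse=' /var/log/nginx/access.log | awk '{print $7}' | awk -F/ '{print $4}'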

Or just install the popular AWStats for general access log analysis; it provides more than you might expect.
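
If you would rather group by URL pattern yourself, the filter-then-extract idea can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not a finished tool: it assumes a custom log_format that is the combined format with $request_time appended as the last field (the default combined format does not log it), and the names URL_GROUPS, classify and aggregate are hypothetical. The two patterns are hand-copied stand-ins for whatever is in your urls.py, with a leading slash added because Django patterns of this style omit it.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# ASSUMPTION: combined log format with $request_time appended at the end.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) [^"]*" '  # request line
    r'(?P<status>\d{3}) \S+ '                    # status and body bytes
    r'"[^"]*" "[^"]*" '                          # referer and user agent
    r'(?P<request_time>\d+(?:\.\d+)?)\s*$'       # request time in seconds
)

# HYPOTHETICAL patterns, copied by hand from a urls.py; first match wins.
URL_GROUPS = [
    ("event_detail", re.compile(r"^/event/detail/(?P<uuid>[^/]+)/$")),
    ("event_list", re.compile(r"^/event/$")),
]


def classify(path):
    """Return the name of the first URL group whose regex matches the path."""
    for name, pattern in URL_GROUPS:
        if pattern.match(path):
            return name
    return "unmatched"


def aggregate(lines, required_param=None):
    """Sum request times per URL group, optionally keeping only requests
    whose query string contains required_param."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0, "max": 0.0})
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        parts = urlsplit(match.group("uri"))
        query = parse_qs(parts.query, keep_blank_values=True)
        if required_param is not None and required_param not in query:
            continue
        seconds = float(match.group("request_time"))
        stats = totals[classify(parts.path)]
        stats["count"] += 1
        stats["total"] += seconds
        stats["max"] = max(stats["max"], seconds)
    return dict(totals)
```

Passing an open file works because the function only iterates over lines, for example aggregate(open('/var/log/nginx/access.log'), required_param='somethingelse'). The average for a group is its total divided by its count. Since it only reads the log file, it adds no database tables to the Django project.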

shawnzhu
  • I looked at AWStats and didn't see in the docs how to define dynamic URLs. In my example above, the "my_event_uuid" part of the URL is dynamic. Also note that the dynamic part of the URL may not be the last part, but it will be defined using a regex. If that is possible with AWStats, can you please explain the configuration? – AgDude Oct 04 '13 at 14:01
  • The simplest way is to give AWStats only the log entries that contain those dynamic URLs. Personally, I find it easier to filter the access log entries and then extract what you want than to configure another log analysis tool. – shawnzhu Oct 04 '13 at 14:07
  • Thanks for your suggestion. However, that isn't practical for monitoring a production server with hundreds of URLs. – AgDude Oct 04 '13 at 14:12
  • A cron job would be good enough at a scale of hundreds of URLs. Note that this is an asynchronous operation, and you said you don't need real-time feedback. Alternatively, I would suggest using a custom Django middleware to filter URIs instead of reading the low-level access log file. – shawnzhu Oct 04 '13 at 14:20