
I developed a statistics system in Python for researching user behavior on an online web service. It mostly relies on reading and analyzing logs from the production servers. Currently I share the log folders internally over SMB so that the routine analytics program can read them, but I have two questions about this data-access method:

  1. Is there any other way to access the logs besides SMB, or a different strategy altogether?
  2. I suspect that a large number of reads may block the production server's hard disk and interfere with normal log writing. Is there any way to solve this?

I wish I could give some real numbers, but I don't have any yet. Can anyone give me some guidance on doing this more gracefully?

Question by Jason Xu (edited by Dmitriy)

2 Answers


If you are open to using a third-party log aggregation tool, you have a couple of options:

In addition, if you are logging to syslog, many of the commonly used syslog daemons (e.g. syslog-ng) can be configured to forward logs from various applications to one or more of these aggregators. Logging to syslog from a Python application is trivial: there is a syslog module in the standard library (Unix only).
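A minimal sketch of the Python side, under stated assumptions: the syslog module is Unix-only, so this uses the standard library's logging.handlers.SysLogHandler instead, which can send either to the local daemon's socket (/dev/log on most Linux systems) or over UDP to a remote syslog-ng or aggregator. The program tag "statsapp", the LOG_LOCAL0 facility and the default ("localhost", 514) address are assumptions to adjust for your own setup.

```python
import logging
import logging.handlers


def make_syslog_logger(address=("localhost", 514), name="statsapp"):
    """Return a logger whose records are sent to a syslog daemon.

    `address` is either a (host, port) tuple for UDP syslog (514 is the
    conventional port) or a Unix socket path such as "/dev/log".
    `name` is a hypothetical program tag, used as the syslog tag.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,  # assumed facility
    )
    # Each datagram looks like "<priority>statsapp: message", the usual
    # "tag: message" layout that syslog daemons parse
    handler.setFormatter(logging.Formatter(name + ": %(message)s"))
    logger.addHandler(handler)
    return logger
```

For example, make_syslog_logger("/dev/log").info("user_login uid=42") hands the line to the local daemon, which can then forward it to a collector, so the analytics job reads from the collector rather than from the production server's disk.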

Answer by Ngure Nyaga (edited by Valor)
  • Thanks for the information. The system is already in production, so switching would cost me more. Besides, I have reasons to implement our own stats system; my question right now is how to improve the data-access part. :) – Jason Xu Oct 09 '12 at 10:56

Well, if you have an HTTP server in between (IHS, OHS, and I guess Apache too), you can expose your physical repositories via a URL: each of your files then gets its own URL as well, and you can download them quite easily with code like this:

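import os
import shutil
from urllib.request import urlopen  # Python 3; on Python 2 use urllib2.urlopen

# Hypothetical URL where the HTTP server publishes one of the log files
url = "http://logserver.example/logs/2012-12-12.log"

# Stream the remote log into a local file with the same name,
# copying it in chunks rather than reading it all into memory
with urlopen(url) as response, open(os.path.basename(url), "wb") as local_file:
    shutil.copyfileobj(response, local_file)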
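If no HTTP server is running on the log host yet, the same idea can be sketched with nothing but the Python standard library (3.7 or later). This is an assumption-laden illustration, not part of the answer above: log_dir, host and port are placeholders, and http.server only performs basic security checks (its documentation advises against using it in production), so treat it as a quick option for an internal network rather than a replacement for Apache, IHS or OHS.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_logs(log_dir, host="0.0.0.0", port=8000):
    """Start a background HTTP server that publishes `log_dir`.

    SimpleHTTPRequestHandler only implements GET and HEAD, so clients can
    list and download files but cannot upload or modify them.
    `log_dir`, `host` and `port` are placeholders for your environment.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=log_dir)
    server = ThreadingHTTPServer((host, port), handler)
    # Serve requests in a daemon thread so the caller is not blocked
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() and server.server_close() to stop
```

Each file in log_dir then becomes reachable as http://<host>:<port>/<file name>, which is exactly the kind of URL the download code above expects.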
Answer by Emmanuel
  • Thanks for the comment. A web server just for exposing the logs would be a bit too heavyweight. With Samba, I can do something like "for line in open(r'\\192.168.100.100\log\2012-12-12.log', buffering=64 * 1024 * 1024): ..." and process the file block by block. What I'm looking for now is a higher-performance, more general access method that also works on Linux and, if possible, minimizes the impact on the production server's disk. – Jason Xu Oct 09 '12 at 10:44
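The block-by-block reading described in the comment can be sketched as a generator that reads fixed-size chunks (64 MB by default, matching the comment) and also reports a byte offset after each line, so that a routine run could resume where the previous one stopped instead of re-reading whole files from the production disk. This is an assumption about how the job might be structured, not the poster's actual code; persisting the offset between runs is left to the caller.

```python
def read_new_lines(path, offset=0, chunk_size=64 * 1024 * 1024):
    """Yield (line, next_offset) for each complete line after byte `offset`.

    The file is read in fixed-size chunks so memory use stays bounded.
    `next_offset` is the byte position just past that line; saving the last
    one lets the next run seek straight to unread data. A trailing fragment
    without a newline is treated as still being written and is not yielded.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        pending = b""  # bytes read but not yet ending in a newline
        pos = offset   # file position where `pending` starts
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            pending += chunk
            # Everything before the last newline is complete; keep the rest
            *complete, pending = pending.split(b"\n")
            for raw in complete:
                pos += len(raw) + 1  # +1 for the newline itself
                yield raw.decode("utf-8", errors="replace").rstrip("\r"), pos
```

For example, iterating over read_new_lines("/mnt/prodlogs/2012-12-12.log", offset=last_offset), where the path is a hypothetical CIFS mount of the share on Linux and last_offset is whatever the previous run saved, processes only newly appended lines; the same call accepts a UNC path on Windows.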