Grabbing log files from production server

Question

I developed a statistics system in Python for researching user behavior on an online web service. It relies mostly on reading and analyzing logs from the production server. Currently I share the log folders internally over SMB so the routine analytics program can read them, but I have two questions about this way of accessing the data:

Is there any way to access the logs other than SMB, or a different strategy altogether?
I suspect that heavy reading may tie up the production server's hard disk and interfere with normal log writing. Is there a way to avoid this?

I had hoped to come up with some real numbers, but I don't have any yet. Can anyone give me some guidance on doing this more gracefully?

Can you rotate the logs in the writer, and then have the reader only pull from the archive? – Jonathan Vanasco, Oct 09 '12 at 15:16
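The rotation idea in the comment above could look roughly like this on the reader side. This is only a sketch under assumptions: the active file name `app.log`, the rotation suffix scheme (`app.log.1`, `app.log.2`, ...) and the helper name `rotated_logs` are all hypothetical and not taken from the question.

```python
import glob
import os


def rotated_logs(log_dir, active_name="app.log"):
    """Return rotated log files in log_dir, oldest first.

    The active file (active_name, a hypothetical name) is never returned,
    so the reader does not compete with the writer for that file.
    """
    # Only names with a rotation suffix match, e.g. app.log.1, app.log.2
    pattern = os.path.join(log_dir, active_name + ".*")
    return sorted(glob.glob(pattern), key=os.path.getmtime)
```

The analytics job would then loop over `rotated_logs(...)` and never open the file that is currently being written.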

score 1 · Answer 1 · edited Oct 09 '12 at 14:35


If you are open to using a third party log aggregation tool, you have a couple of options:

In addition, if you are logging to syslog, many of the commonly used syslog daemons (e.g. syslog-ng) can be configured to forward logs from various applications to one or more of these aggregators. It is trivial to log to syslog from a Python application: there is a syslog module in the standard library.
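As a minimal sketch of logging to syslog from Python with the standard library's syslog module (Unix only): the ident "webstats", the key=value message layout and the helper names are assumptions made for illustration, and forwarding to an aggregator would still have to be configured in the daemon itself.

```python
import syslog


def format_event(user, action):
    """Build one key=value log line (the field names are illustrative)."""
    return "user=%s action=%s" % (user, action)


def log_event(user, action):
    """Send one event to the local syslog daemon at INFO priority."""
    syslog.syslog(syslog.LOG_INFO, format_event(user, action))


if __name__ == "__main__":
    # "webstats" is a hypothetical ident; LOG_LOCAL0 is one of the
    # facilities reserved for local use.
    syslog.openlog("webstats", syslog.LOG_PID, syslog.LOG_LOCAL0)
    log_event("alice", "login")
    syslog.closelog()
```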

edited Oct 09 '12 at 14:35

Valor


answered Oct 09 '12 at 10:27

Ngure Nyaga


Thanks for the information. The system is already in production, so switching may cost me more. I also have reasons to implement our own stats system, so my question for now is how to improve the data access part. :) – Jason Xu Oct 09 '12 at 10:56

score 0 · Answer 2 · answered Oct 09 '12 at 10:17


Well, if you have an HTTP server in between (IHS, OHS, and I guess Apache too), you can expose your physical repositories via a URL. Each of your files then gets its own URL, and with code like this you can download them quite easily:

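import os
import shutil
import urllib2  # Python 2; on Python 3 use urllib.request instead

# Hypothetical URL of one exposed log file; substitute your own
url = 'http://logserver.example/logs/2012-12-12.log'

# Stream the remote file into a local file of the same name,
# without loading the whole log into memory
remote_file = urllib2.urlopen(url)
with open(os.path.basename(url), 'wb') as local_file:
    shutil.copyfileobj(remote_file, local_file)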

answered Oct 09 '12 at 10:17

Emmanuel


Thanks for the comment. A web server for exposing the logs would be a bit too heavyweight. With Samba, I can do "for line in open('\\192.168.100.100\log\2012-12-12.log',block=64MB):..." and process the file block by block. What I'm looking for now is a higher-performance, more general access method that works on Linux and, if possible, minimizes the impact on the production hard disk. – Jason Xu Oct 09 '12 at 10:44
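Python's built-in open() has no block parameter, so the block-wise read described in the comment above needs a small helper. One possible sketch uses the size hint of readlines(); the function name `read_in_blocks` and the 64 MB default are illustrative assumptions, and the path can be a local file or a mounted share.

```python
def read_in_blocks(path, block_bytes=64 * 1024 * 1024):
    """Yield lists of complete lines, each list roughly block_bytes in size."""
    with open(path, "r") as log_file:
        while True:
            # readlines(hint) stops once the lines read so far reach
            # about `hint` in total size, so no line is ever split.
            lines = log_file.readlines(block_bytes)
            if not lines:
                break
            yield lines
```

Each yielded list can be processed and discarded before the next one is read, so memory use stays bounded by roughly one block.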
