Parsing multiple log files with python script - How to?

Question

I'm using a Python script to import log files into Piwik, and I can successfully parse one log file at a time, but how do I do it for all of the log files in a directory?

From the readme the usage of the script is:

import_logs.py [options] log_file [ log_file [...] ]

So if I had log files u_ex120101.log to u_ex120701.log, how could I run it once to do all of those files? I'm sure the answer is staring me in the face, but I know basically nothing about Python.

Thanks.

jdi · Answer 1 · 2012-07-04T03:10:42.300

1

What about just calling the script with a shell wildcard?

cd logs/
import_logs.py u_*.log

*Note: This does not work on Windows, though. The Windows shell (cmd.exe) will not expand the wildcard; the receiving program (i.e., import_logs.py) must do it.

Solution for Windows: Use Cygwin, PowerShell, or another *nix-like shell replacement.

edited Jul 04 '12 at 03:10

answered Jul 04 '12 at 02:34


I should probably add I'm using windows cmd.exe for this. Will that work there? – K Groll Jul 04 '12 at 02:45
@acowley: Did you try it? Windows uses the `*` wildcard as well. – jdi Jul 04 '12 at 02:50
Yeah I did. I think it's trying to parse the import_logs.py file too – K Groll Jul 04 '12 at 02:53
Sorry, I had it wrong. Using it like this: C:\xampp\htdocs\piwik\misc\log-analytics>python import_logs.py --idsite=1 --url=http://localhost/piwik/ u_*.log doesn't work. It says the u_*.log file was not found. Thanks for your help, by the way. – K Groll Jul 04 '12 at 02:59
Never mind. Windows shell sucks and doesn't expand your wildcards like *nix. The receiving program is responsible for expanding the raw arg. – jdi Jul 04 '12 at 03:08
Thanks anyway. Agreed, it sucks. Can't change that at work though, unfortunately! – K Groll Jul 04 '12 at 03:49

Jeff Tratner · Answer 2 · 2012-07-04T08:39:02.577

1

If you have a bunch of log files in a directory and you only want a range of them, another option is to write a small Python script that takes a range and a base name and calls import_logs.py for each one (or, if you want to get particularly fancy, you could import import_logs directly).

You can run any shell command with Popen in Python. So if you wanted to run import_logs.py log_base_str01123.txt, you could just run the following:

from subprocess import Popen, PIPE
# Run the command through the shell and print whatever it writes to stdout
print(Popen("import_logs.py log_base_str01123.txt", stdout=PIPE, shell=True).stdout.read())

and if you wanted to do that for a bunch of strings:

from subprocess import Popen, PIPE
import os
base_prefix = "u_ex"
base_suffix = ".log"
logs = ["my", "list", "of", "log#s"]
for log in logs:
    # Build the file name, e.g. u_ex120101.log
    log_path = "{prefix}{log_name}{suffix}".format(
        prefix=base_prefix, log_name=log, suffix=base_suffix)
    # Only import log files that actually exist
    if os.path.exists(log_path):
        command = "import_logs.py {0}".format(log_path)
        print(Popen(command, stdout=PIPE, shell=True).stdout.read())

This is a more general-purpose solution and gives you more fine-grained control over which files are imported.

If you want to go through a list of consecutive values, you can just use:

logs = map(str, range(start_number, end_number + 1))
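For log names that encode a date, such as u_ex120101.log, a plain numeric range also produces values that are not real dates (120132, for example). Below is a minimal sketch, assuming the six digits are a two-digit year, month, and day; that is a guess from the file names in the question, not something the readme confirms. The helper name dated_log_names is hypothetical, and it returns full file names rather than just the middle part.

```python
from datetime import date, timedelta

def dated_log_names(start, end, prefix="u_ex", suffix=".log"):
    """Return one file name per calendar day from start to end, inclusive."""
    names = []
    current = start
    while current <= end:
        # strftime("%y%m%d") gives e.g. "120101" for 2012-01-01
        names.append(prefix + current.strftime("%y%m%d") + suffix)
        current += timedelta(days=1)
    return names

# Full file names for every day from 2012-01-01 through 2012-07-01
logs = dated_log_names(date(2012, 1, 1), date(2012, 7, 1))
```

Because these are already complete file names, they can be checked with os.path.exists and passed to import_logs.py without adding the prefix and suffix again.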

edited Jul 04 '12 at 08:39

answered Jul 04 '12 at 03:00


Thanks, this answer is out of my league right now. I've just generated a list of all the file names and added them to the script separated by a space. Ugly but working. – K Groll Jul 04 '12 at 04:33
@acowley understandable. Frankly though, that's probably the easiest way to do it if you don't have a lot of programming experience and you don't need to do this often. You could get fancier and adapt the script to make logs = all the logs modified in the past week. – Jeff Tratner Jul 04 '12 at 08:37
the [ log_file [...] ] part of the usage example doesn't indicate another way of adding multiple logs? I don't understand what that means. Thanks again. – K Groll Jul 05 '12 at 04:43
@acowley , you could use either way. – Jeff Tratner Jul 05 '12 at 19:27

score 1 · Answer 3 · answered Jul 23 '12 at 18:34

You can use the glob module in Python. The glob.glob() function takes a string containing a wildcard and returns a list of the matching files and folders.

Example:

import glob

# assume file_argument is a variable containing wildcard
file_argument = '/var/log/*.log'

for log_file in glob.glob(file_argument):
    do_stuff(log_file)

This will cause Python to perform the wildcard expansion for you.
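Since the usage line accepts several log files in one call, the expanded list can also be handed to import_logs.py in a single run. Here is a rough sketch of such a wrapper, assuming it is saved next to import_logs.py and started from the directory that holds the logs. The function name build_command is hypothetical, the two options are the ones quoted in the comments above, and sys.executable assumes the wrapper runs under a Python version that import_logs.py supports.

```python
import glob
import subprocess
import sys

def build_command(pattern, options=()):
    """Expand pattern and build the argument list for one import_logs.py run."""
    log_files = sorted(glob.glob(pattern))
    if not log_files:
        return None
    # sys.executable is the Python interpreter running this wrapper
    return [sys.executable, "import_logs.py"] + list(options) + log_files

if __name__ == "__main__":
    command = build_command(
        "u_ex*.log", ["--idsite=1", "--url=http://localhost/piwik/"])
    if command is None:
        print("No log files matched the pattern")
    else:
        # Passing a list avoids the shell, so cmd.exe never sees the wildcard
        subprocess.call(command)
```

This works the same way under cmd.exe, because Python expands the wildcard instead of the shell.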

score 1 · Answer 4 · answered Oct 02 '12 at 07:48

I'm using Windows Server 2012, I have no experience with Python, and I have 4 years' worth of log files, each about 20 MB to 40 MB in size.

I just wanted to share that I used a free utility called Merge Logs to solve this problem. Using copy *.log merged.txt or type *.log > merged.txt took a very, very long time, whereas this utility did the job I needed in a few minutes.

Here's the download: http://www.allscoop.com/dotnet-software/log-file-merge.php
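For anyone who would rather stay in Python, the same kind of merge can be sketched with the standard library. This is only an illustration, not a description of how the Merge Logs utility works, and its speed relative to copy or the utility has not been measured. The function name merge_logs is hypothetical.

```python
import glob
import shutil

def merge_logs(pattern, merged_path):
    """Concatenate every file matching pattern into merged_path, in name order."""
    # The list is built before the output file is opened
    sources = sorted(glob.glob(pattern))
    with open(merged_path, "wb") as merged:
        for source in sources:
            with open(source, "rb") as log_file:
                # copyfileobj streams in chunks, so large logs are not loaded whole
                shutil.copyfileobj(log_file, merged)
    return sources
```

A call such as merge_logs("*.log", "merged.txt") mirrors the copy command above. Keep the output name outside the pattern so a leftover merged file from an earlier run is not read as input.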

Thanks @philwilks, that looks like a handy tool. – K Groll Oct 03 '12 at 00:14
