
I'm using a Python script to import log files into Piwik, and I can successfully parse one log file at a time, but how do I do it for all of the log files in a directory?

From the readme the usage of the script is:

import_logs.py [options] log_file [ log_file [...] ]

So if I had log files u_ex120101.log to u_ex120701.log, how could I run it once to do all of those files? I'm sure the answer is staring me in the face, but I know basically nothing about Python.

Thanks.

K Groll

4 Answers


What about just calling the script with a shell wildcard?

cd logs/
import_logs.py u_*.log

Note: this does not work on Windows, though. The Windows shell does not expand wildcards; the receiving program (here, import_logs.py) has to do it.

Solution for Windows: use Cygwin or another *nix-like shell replacement, or PowerShell, where you can expand the pattern yourself with Get-ChildItem and pass the results on.
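If switching shells isn't possible, the expansion can also be done in plain Python. A minimal sketch, assuming import_logs.py sits in the current directory: a small wrapper expands any wildcard arguments with the glob module and then calls import_logs.py once with all the matching names. `expand_wildcards`, `build_command` and `main` are hypothetical helper names, not part of Piwik.

```python
import glob
import subprocess
import sys


def expand_wildcards(args, glob_func=glob.glob):
    """Expand *, ? and [...] patterns roughly the way a Unix shell would.

    Arguments without wildcard characters, and patterns that match
    nothing, are passed through unchanged. glob_func can be swapped
    out (e.g. in tests) so that no real files are needed.
    """
    expanded = []
    for arg in args:
        if any(ch in arg for ch in "*?["):
            matches = sorted(glob_func(arg))
            expanded.extend(matches if matches else [arg])
        else:
            expanded.append(arg)
    return expanded


def build_command(args, script="import_logs.py"):
    """Run the script with the current Python interpreter and expanded args."""
    return [sys.executable, script] + expand_wildcards(args)


def main():
    # Forward everything after this wrapper's own name to import_logs.py.
    return subprocess.call(build_command(sys.argv[1:]))
```

To use it from cmd.exe, save it next to import_logs.py as, say, run_import_logs.py (a hypothetical name), add `if __name__ == "__main__": sys.exit(main())` at the bottom, and run `python run_import_logs.py --idsite=1 --url=http://localhost/piwik/ u_*.log`.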

jdi
  • I should probably add that I'm using Windows cmd.exe for this. Will that work there? – K Groll Jul 04 '12 at 02:45
  • @acowley: Did you try it? Windows uses the `*` wildcard as well. – jdi Jul 04 '12 at 02:50
  • Yeah I did. I think it's trying to parse the import_logs.py file too – K Groll Jul 04 '12 at 02:53
  • Sorry, I had it wrong. When I run it like this: C:\xampp\htdocs\piwik\misc\log-analytics>python import_logs.py --idsite=1 --url=http://localhost/piwik/ u_*.log it doesn't work. It says the file u_*.log was not found. Thanks for your help, by the way. – K Groll Jul 04 '12 at 02:59
  • Never mind. The Windows shell sucks and doesn't expand your wildcards like *nix shells do. The receiving program is responsible for expanding the raw argument. – jdi Jul 04 '12 at 03:08
  • Thanks anyway. Agreed, it sucks. I can't change that at work, though, unfortunately! – K Groll Jul 04 '12 at 03:49

If you have a bunch of log files in a directory and you only want a range of them, another option is to write a small Python script that takes a range and a base name and simply calls import_logs.py for each one (or, if you want to get particularly fancy, you could import import_logs directly).

You can run any shell command from Python with Popen. So if you wanted to run import_logs.py log_base_str01123.txt, you could just run the following:

from subprocess import Popen, PIPE
print(Popen("import_logs.py log_base_str01123.txt", stdout=PIPE, shell=True,
            universal_newlines=True).stdout.read())
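On Python 3.7 or later, `subprocess.run` is a simpler alternative to reading `Popen(...).stdout` by hand (a swapped-in technique, not the one this answer uses). A minimal sketch; `run_and_capture` is a hypothetical helper name. Passing the command as a list means no shell is involved, and starting the script with `sys.executable` avoids relying on Windows' .py file association.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command given as a list of arguments and return what it printed.

    Because no shell is involved, file names containing spaces need no
    extra quoting.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    return result.stdout


# Hypothetical call: start import_logs.py with the interpreter that is
# running this script.
# print(run_and_capture([sys.executable, "import_logs.py", "log_base_str01123.txt"]))
```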

And if you wanted to do that for a bunch of file names:

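from subprocess import Popen, PIPE
import os

base_prefix = "u_ex"
base_suffix = ".log"
logs = ["my", "list", "of", "log#s"]  # the part of each file name that varies
for log in logs:
    filename = "{prefix}{log_name}{suffix}".format(
        prefix=base_prefix, log_name=log, suffix=base_suffix)
    # skip names that don't correspond to an existing file
    if os.path.exists(filename):
        command = "import_logs.py " + filename
        print(Popen(command, stdout=PIPE, shell=True,
                    universal_newlines=True).stdout.read())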

This is a more general-purpose solution and gives you finer-grained control.

If you want to go through a list of consecutive values, you can just use:

logs = map(str, range(start_number, end_number + 1))
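The file names in the question (u_ex120101.log to u_ex120701.log) look like daily logs whose digits are a YYMMDD date. If that assumption holds, a plain numeric range also produces names such as u_ex120132 that can never exist; stepping through calendar days avoids that. A minimal sketch; `daily_log_names` and `existing_logs` are hypothetical helper names.

```python
import datetime
import os


def daily_log_names(start, end, prefix="u_ex", suffix=".log"):
    """Return one file name per day from start to end, inclusive,
    e.g. u_ex120101.log for 2012-01-01 (YYMMDD after the prefix)."""
    names = []
    day = start
    while day <= end:
        names.append(prefix + day.strftime("%y%m%d") + suffix)
        day += datetime.timedelta(days=1)
    return names


def existing_logs(names, directory="."):
    """Keep only the names that exist as files in directory."""
    paths = (os.path.join(directory, name) for name in names)
    return [path for path in paths if os.path.isfile(path)]


# Hypothetical usage: every existing file from the question's range,
# ready to be passed to a single import_logs.py call.
# logs = existing_logs(daily_log_names(datetime.date(2012, 1, 1),
#                                      datetime.date(2012, 7, 1)))
```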
Jeff Tratner
  • Thanks, this answer is out of my league right now. I've just generated a list of all the file names and added them to the script, separated by spaces. Ugly but working. – K Groll Jul 04 '12 at 04:33
  • @acowley understandable. Frankly though, that's probably the easiest way to do it if you don't have a lot of programming experience and you don't need to do this often. You could get fancier and adapt the script to make logs = all the logs modified in the past week. – Jeff Tratner Jul 04 '12 at 08:37
  • Doesn't the [ log_file [...] ] part of the usage example indicate another way of adding multiple logs? I don't understand what that means. Thanks again. – K Groll Jul 05 '12 at 04:43
  • @acowley, you could use either way. – Jeff Tratner Jul 05 '12 at 19:27
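On the question in the comments: in a usage line like `import_logs.py [options] log_file [ log_file [...] ]`, square brackets conventionally mark optional parts and `...` means "may be repeated", so the script takes one or more log file names after its options. The argparse parser below is a hypothetical stand-in that mimics that usage line to show the idea; it is an illustration, not the real import_logs.py source.

```python
import argparse

# Hypothetical parser mimicking the usage line
#   import_logs.py [options] log_file [ log_file [...] ]
parser = argparse.ArgumentParser(prog="import_logs.py")
parser.add_argument("--idsite", help="example option")
parser.add_argument("log_file", nargs="+", help="one or more log files")

args = parser.parse_args(["--idsite=1", "u_ex120101.log", "u_ex120102.log"])
print(args.log_file)  # ['u_ex120101.log', 'u_ex120102.log']
```

Typing the names by hand, expanding a wildcard, or generating a list all end up producing the same list of arguments.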

You can use the glob module in Python. The glob.glob() function takes a string containing a wildcard pattern and returns a list of the matching files and folders.

Example:

import glob

# file_argument is a string containing a wildcard pattern
file_argument = '/var/log/*.log'

for log_file in glob.glob(file_argument):
    print(log_file)  # replace with whatever you need to do with each file

This will cause Python to perform the wildcard expansion for you.
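If you only want part of a directory (as in the question's u_ex120101.log to u_ex120701.log), glob can be combined with a simple name comparison. A minimal sketch, assuming the names carry a zero-padded date so that alphabetical order is also chronological order; `in_name_range` and `logs_in_range` are hypothetical names.

```python
import glob
import os


def in_name_range(paths, first, last):
    """Keep paths whose file name sorts between first and last (inclusive).

    Assumes zero-padded date stamps such as u_ex120101.log, so that
    string order matches date order.
    """
    return sorted(p for p in paths if first <= os.path.basename(p) <= last)


def logs_in_range(pattern, first, last):
    """Expand a wildcard with glob, then narrow it to a name range."""
    return in_name_range(glob.glob(pattern), first, last)
```

For example, `logs_in_range("u_ex*.log", "u_ex120101.log", "u_ex120701.log")` returns the matching paths in order, ready to be passed together to import_logs.py.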

ryucl0ud

I'm using Windows Server 2012, I have no experience with Python, and I have four years' worth of log files, each about 20-40 MB in size.

I just wanted to share that I solved this problem with a free utility called Merge Logs. Using copy *.log merged.txt or type *.log > merged.txt took a very long time, whereas this utility did the job in a few minutes.

Here's the download: http://www.allscoop.com/dotnet-software/log-file-merge.php
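If installing a separate tool isn't an option, the same merge can be sketched in plain Python with glob and shutil.copyfileobj, which copies raw bytes in large chunks. No speed comparison with Merge Logs or copy/type is implied, and `merge_logs` is a hypothetical helper name. Header lines at the top of each input (IIS W3C logs start with `#`-prefixed directives) are copied as-is, so check that the importer accepts them in the middle of a file.

```python
import glob
import os
import shutil


def merge_logs(pattern, output_path):
    """Concatenate every file matching pattern into output_path, in sorted
    name order, and return the list of files that were merged.

    The output file is skipped if it happens to match the pattern.
    """
    output_abs = os.path.abspath(output_path)
    inputs = [p for p in sorted(glob.glob(pattern))
              if os.path.abspath(p) != output_abs]
    with open(output_path, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out, 1024 * 1024)  # 1 MiB chunks
    return inputs
```

For example, `merge_logs("u_ex*.log", "merged.txt")` would produce a single file to hand to import_logs.py.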

philwilks