I have a directory (/home/myuser/logs) that contains the following log files for the last 5 days:

applogs_20130402.txt
applogs_20130401.txt
applogs_20130331.txt
applogs_20130330.txt

Each line of every "applog" has the same structure, just different data:

<timestamp> | <fruit> | <color> | <cost>

So for example, applogs_20130402.txt might look like:

23:41:25 | apple | red | 53
23:41:26 | kiwi | brown | 12
23:41:29 | banana | yellow | 1023
... (etc., every line is pipe delimited like this)

I want to create one "master log" that combines all the log entries (structured, pipe-delimited lines) from all 5 log files into a single file where all timestamps are chronologically ordered. Further, I need the date reflected in the timestamps as well.

So, for instance, if applogs_20130402.txt and applogs_20130401.txt were the only 2 applogs in the directory, and they both looked like this respectively:

applogs_20130402.txt:
=====================
23:41:25 | apple | red | 53
23:41:26 | kiwi | brown | 12
23:41:29 | banana | yellow | 1023

applogs_20130401.txt:
=====================
23:40:33 | blueberry | blue | 4
23:41:28 | apple | green | 81
23:45:49 | plumb | purple | 284

Then, I would want a masterlog.txt file that looks like:

2013-04-01 23:40:33 | blueberry | blue | 4
2013-04-01 23:41:28 | apple | green | 81
2013-04-01 23:45:49 | plumb | purple | 284
2013-04-02 23:41:25 | apple | red | 53
2013-04-02 23:41:26 | kiwi | brown | 12
2013-04-02 23:41:29 | banana | yellow | 1023

I'm on Ubuntu and have access to Bash, Python and Perl, and have no preference which solution is used. Ordinarily I would try a "best attempt" and post it, but I've never dealt with aggregating data like this on Linux. Obviously, the logs are thousands of lines in size, unlike my example above. So doing everything manually isn't an option ;-) Thanks in advance!

CSᵠ
IAmYourFaja
  • Seems easy enough with some command line programs, but what have you tried? I didn't vote down but I would say it's because you don't have your efforts included. – squiguy Apr 03 '13 at 18:50
  • It doesn't really matter if you're on Windows or Linux. I wasn't the downvoter, but I think a little more effort besides describing the problem wouldn't hurt. – simbabque Apr 03 '13 at 18:50

2 Answers

You can use Perl from the command line together with sort like this:

perl -n -e 'printf "%d-%02d-%02d %s", $ARGV =~ m/_(\d{4})(\d\d)(\d\d)/, $_;' *.txt | sort -n

Calling perl with -n wraps a while (<>) { } loop around your program, which in this case is the code given to -e. Inside that loop, we printf the current line ($_), and in front of it we put the date from the file name, which is stored in $ARGV. A regex grabs the year, month and day; m// returns these captures as a list because printf's arguments are evaluated in list context.

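To this program, we pass all txt files in the folder. The result is piped to the command line tool sort, which sorts the lines numerically using the -n flag. Since the prefixed date and time are zero-padded, a plain sort without -n should give the same chronological order.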
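Since the question also allows Python, here is a rough sketch of the same idea (take the date from the file name, prefix every line with it, then sort). The function name build_master_log and the applogs_*.txt glob pattern are my own assumptions for illustration, not anything from the original logs setup:

```python
import re
from pathlib import Path

# Matches the date embedded in names such as applogs_20130402.txt
DATE_IN_NAME = re.compile(r"_(\d{4})(\d{2})(\d{2})")


def build_master_log(log_dir, pattern="applogs_*.txt"):
    """Return every log line prefixed with its file's date, sorted chronologically.

    The function name and the default pattern are hypothetical; adjust as needed.
    """
    entries = []
    for path in Path(log_dir).glob(pattern):
        match = DATE_IN_NAME.search(path.name)
        if match is None:
            continue  # skip files whose names carry no date
        date = "-".join(match.groups())  # e.g. "2013-04-02"
        with path.open() as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line:  # ignore empty lines
                    entries.append(f"{date} {line}")
    # Zero-padded "YYYY-MM-DD HH:MM:SS" prefixes sort chronologically as plain strings.
    entries.sort()
    return entries
```

The returned list can then be written out with something like Path("masterlog.txt").write_text("\n".join(lines) + "\n"). Note that this reads everything into memory, which should be fine for logs of a few thousand lines.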

simbabque
  • Thanks @simbabque (+1) - when I type this and hit enter (inside a terminal) I see a new line with a ">" character and nothing happens. Any ideas? Thanks again! – IAmYourFaja Apr 03 '13 at 19:20
  • Since the fully qualified date and time are created, I think you could use the sort without the `-n` flag as they are sortable alphabetically. And I think you are missing a single quote after $_; and before *.txt. – Chris Charley Apr 03 '13 at 19:28
  • Thanks again, but still the same. It's almost like it's opening a "session" or something, or waiting for some kind of input from me... – IAmYourFaja Apr 03 '13 at 19:30
  • There was a `'` missing. Sorry. I had to type this because my VM didn't let me copy. – simbabque Apr 03 '13 at 19:34
  • @ChrisCharley thanks for pointing that out. It's not opening a session, it's simply thinking you are going to enter more, because the closing quote was missing. – simbabque Apr 03 '13 at 19:34

Just for the sake of completeness, here is a (g)awk one-liner to accomplish the same:

gawk '{ printf "%s %s\n", gensub(/.+_([0-9]{4})([0-9]{2})([0-9]{2}).+/, "\\1-\\2-\\3", 1, FILENAME), $0 }' applogs_* | sort
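
The plain sort at the end is enough because the generated prefix is zero-padded and ordered from the largest unit (year) to the smallest (second), so text order and time order coincide. A small Python check of that claim, using a few timestamps from the question (the variable names are just for illustration):

```python
from datetime import datetime

stamps = [
    "2013-04-02 23:41:25",
    "2013-04-01 23:45:49",
    "2013-04-01 23:40:33",
    "2013-04-02 23:41:26",
]

# Order the strings as plain text, the way a non-numeric sort would.
by_text = sorted(stamps)

# Order them by the actual point in time they represent.
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"))

print(by_text == by_time)
```

This would not hold for formats such as 4/2/2013 or non-padded hours, which is why the date is rebuilt as YYYY-MM-DD before sorting.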
Adrian Frühwirth