
I have a large amount of long irregular logs that look like this:

###<date> errortext <errorcode-xxxxx> 
errortext 
errortext 
errortext 
errortext
###<date> errortext <errorcode-yyyy>
errortext 
errortext 
###<date> errortext <errorcode-zzzzzzz>
errortext 
errortext 
errortext 
errortext 
errortext 
errortext 
errortext 

etc

The sections vary in length, and I need to find all errors that share an error code using grep, awk, sed, or a similar tool.

I need to split these documents by error code, printing all errors of one code into one document.

I tried to extract a whole error code segment with a command like:

sed -n '/#</{:start /###/!{N;b start};/<errorcode-024332>/p}' file

The problem is that this prints only the line containing "errorcode-024332", not the whole section up to the start of the next segment (marked by the delimiter "###" in this case).

How do I achieve this?

Flowdorio
    https://stackoverflow.com/questions/38972736/how-to-select-lines-between-two-patterns might help, for ex: `awk '/errorcode-024332/{f=1; print; next} /^###/{f=0} f' file` will get you `errorcode-024332` section – Sundeep Feb 21 '17 at 14:48

2 Answers


Your problem happens because both #< and ### match the "header" line, so the loop condition is false on that line: you print only the header and never loop. You also appended lines to the pattern space with N rather than consuming them one by one, so the header would have stayed in the pattern space and ### would always have matched anyway.

Assuming you want to display the "header" and "errortext" lines of the "errorcode-024332" section, here's how I would do it:

sed -n '/#<.*<errorcode-024332>/{:start p;n;/###/!{b start}}'
  1. when we match the header line corresponding to our error code
  2. we print it
  3. we get the next line
  4. if the next line doesn't contain ###, we go back to step 2.

A quick test with your sample data:

$ echo "###<date> errortext <errorcode-xxxxx>
errortext
errortext
[...]
errortext
errortext " | sed -n '/#<.*<errorcode-yyyy>/{:start p;n;/###/!{b start}}'

###<date> errortext <errorcode-yyyy>
errortext
errortext
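
One case worth checking with the loop above: if two sections with the same error code sit directly next to each other, the `n` that ends the loop has already consumed the second header, so that header is never tested against the address and its section is skipped. Below is a minimal sketch of a flag-based awk variant (a swapped-in technique, not the sed loop itself) that re-evaluates the flag on every header. The function name `extract_code`, the file name `sample.log`, and the sample contents are assumptions made for illustration.

```shell
# Hypothetical sample log; the codes and texts are placeholders.
tmp=$(mktemp -d)
cat > "$tmp/sample.log" <<'EOF'
###<date> errortext <errorcode-xxxxx>
errortext 1
###<date> errortext <errorcode-yyyy>
errortext 2
###<date> errortext <errorcode-yyyy>
errortext 3
###<date> errortext <errorcode-zzzzzzz>
errortext 4
EOF

# extract_code CODE FILE: print every section whose header contains <CODE>.
extract_code() {
    awk -v code="$1" '
        # On each header line, decide whether the section that follows is wanted.
        /^###/ { keep = (index($0, "<" code ">") > 0) }
        # Print the line while the flag is set (header included).
        keep
    ' "$2"
}

extract_code errorcode-yyyy "$tmp/sample.log"
```

Redirecting the output (for example `extract_code errorcode-yyyy "$tmp/sample.log" > yyyy.txt`) collects one code into one file.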
Aaron

You can use awk, like this:

awk -F'[<>-]' '/^#/{f=$(NF-1)}{print >> f; close(f)}' file.log

Let me explain it as a multiline version:

# With this set of field delimiters, the error code is simply
# the second-to-last field of the header line
BEGIN { FS="[<>-]"}

# On lines which start with a '#'
/^#/ {
    # We set the output (f)ilename to the error code
    f=$(NF-1)
}

# On all lines ...
{
    # ... append current line to (f)ilename
    print >> f;

    # Make sure to close the file to avoid running out of
    # file descriptors in case there are many different error
    # codes. If you are not concerned about that, you may
    # comment out this line.
    close(f)
}
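
As a rough usage sketch of the script above, the following runs it in a scratch directory on a small hypothetical log. Two things here are assumptions rather than part of the original command: the `errorcode-` prefix and `.txt` suffix on the output file names, and the `f != ""` guard, which skips any lines that appear before the first header instead of printing to an empty file name.

```shell
# Work in a scratch directory so the per-code files do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample log; the codes and texts are placeholders.
cat > file.log <<'EOF'
###<date> errortext <errorcode-xxxxx>
errortext a
###<date> errortext <errorcode-yyyy>
errortext b
###<date> errortext <errorcode-xxxxx>
errortext c
EOF

awk -F'[<>-]' '
    # Header line: derive the output file name from the second-to-last field.
    /^###/ { f = "errorcode-" $(NF-1) ".txt" }
    # Append the current line to that file, once a header has been seen.
    f != "" { print >> f; close(f) }
' file.log

# Show what was written for each code.
for out in errorcode-*.txt; do
    echo "== $out"
    cat "$out"
done
```

Because the script appends with `>>`, running it twice on the same input duplicates the sections, so the output files should be removed between runs.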
hek2mgl