I recently asked for help to parse out Java error stacks from a group of log files and got a very nice solution at the link below (using awk).

Pull out Java error stacks from log files

I marked that question as answered, but after some debugging and studying I found a few potential issues. Since they stem from my limited understanding of awk and regular expressions rather than from my initial question, I thought it better to ask a new question.

Here is the solution:

BEGIN{ OFS="," }
/[[:space:]]+*<Error / {
    split("",n2v)
    while ( match($0,/[^[:space:]]+="[^"]+/) ) {
        name = value = substr($0,RSTART,RLENGTH)
        sub(/=.*/,"",name)
        sub(/^[^=]+="/,"",value)
        $0 = substr($0,RSTART+RLENGTH)
        n2v[name] = value
        print name value
    }
    code = n2v["ErrorCode"]
    desc[code] = n2v["ErrorDescription"]
    count[code]++
    if (!seen[code,FILENAME]++) {
        fnames[code] = (code in fnames ? fnames[code] ", " : "") FILENAME
    }
}
END {
    print "Count", "ErrorCode", "ErrorDescription", "Files"
    for (code in desc) {
        print count[code], code, desc[code], fnames[code]
    }
}

One issue I am having with it is that not all ErrorDescriptions are being captured. For example, this error description appears in the output of this script:

ErrorDescription="Database Error."

But this error description does not appear in the results (description copied from actual log file):

ErrorDescription="Operation not allowed for reason code &quot;7&quot; on table &quot;SCHEMA.TABLE&quot;.. SQLCODE=-668, SQLSTATE=57016, DRIVER=4.13.127"

Nor does this one:

ErrorDescription="Cannot Find Person For Given Order."

It seems that most error descriptions are not being returned by this script but do exist in the log file. I don't see why some error descriptions would appear and some not. Does anyone have any ideas?

EDIT 1:

Here is a sample of the XML I am parsing:

    <Errors>
        <Error ErrorCode="ERR_0139"
            ErrorDescription="Cannot Find Person For Given Order." ErrorMoreInfo="">
    ...
    ...
</Error>
    </Errors>
Matt
  • Can you supply a sample of your error logs? Parsing XML with regex can be a nightmare. – Amen Jlili Feb 17 '15 at 13:32
  • Please see the edits - the relevant XML is posted. – Matt Feb 17 '15 at 13:45
  • My guess is that this is due to the fact that, in your example, `ErrorDescription=...` is split onto a different line than the initial `<Error`. – twalberg Feb 17 '15 at 15:22
  • @twalberg is correct. It's very important when posting questions to post input that TRULY represents all flavors of your real input, especially in general where the line breaks can occur. Unless your XML conforms very strictly to some specific, restrictive format, you need an XML parser. – Ed Morton Feb 20 '15 at 23:49
  • @twalberg, that does seem to be the problem. My apologies - I am new to all of this and I didn't think there would be a difference so I didn't post every example I have (these logs total in the GB in size). If using an XML parser is a better approach, do you have a better solution than above or is there a way to fix the above solution to look on the next line for ErrorDescription? I unfortunately do not have control of how the error is written to the log. – Matt Feb 23 '15 at 13:37
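
The "look on the next line" idea from the comments can be sketched in awk by appending continuation lines to `$0` until the start tag is closed. This is only a minimal sketch under two assumptions: the `<Error` start tag always ends with `>` on the same or a later line, and no attribute value contains a literal `>`. The function name `join_error_tags` is hypothetical, and `[ \t]` is used instead of `[[:space:]]` because some older awk implementations do not support bracket classes.

```shell
# join_error_tags is a hypothetical helper name, not part of the original script.
# It prints every <Error ...> start tag as one line, even if the log split it.
join_error_tags() {
    awk '
    /<Error([ \t]|$)/ {
        # Append following lines until the start tag is closed by ">".
        # Assumes no attribute value contains a literal ">".
        while ($0 !~ />/ && (getline line) > 0) {
            sub(/^[ \t]+/, "", line)   # drop the continuation indent
            $0 = $0 " " line
        }
        sub(/^[ \t]+/, "")             # drop the indent of the first line
        print                          # the attribute loop would go here
    }' "$@"
}
```

In the script from the question, the same `while` loop could probably sit at the top of the `<Error` block, before the attribute loop, so that `FILENAME` and the counting logic stay as they are.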

2 Answers

This regex would only match the error description.

ErrorDescription="(.+?)"

It uses a capturing group to remember your error description.

Demo here. (Tested against a combination of your edit and your previous question error log.)
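
Note that `(.+?)` relies on a lazy quantifier and a capture group, which are features of Perl-style regex engines. POSIX awk regular expressions have no lazy quantifiers, and `match()` only reports `RSTART` and `RLENGTH`. A rough awk equivalent, as a sketch only (the function name `error_descriptions` is hypothetical), replaces the lazy match with `[^"]+` and cuts the value out with `substr()`:

```shell
# error_descriptions is a hypothetical helper name.
# It prints the value of the first ErrorDescription attribute on each line.
error_descriptions() {
    awk '
    BEGIN { skip = length("ErrorDescription=\"") }   # characters before the value
    match($0, /ErrorDescription="[^"]+"/) {
        # RLENGTH covers the prefix, the value and the closing quote
        print substr($0, RSTART + skip, RLENGTH - skip - 1)
    }' "$@"
}
```

This only works when the attribute name and its whole value sit on one line. Entities such as `&quot;` are left as they appear in the log.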

Amen Jlili

The pattern in the script will not match your data:

/[[:space:]]+*<Error / {

Details:

  • The "+" tells it to match at least one space.
  • The space after "Error" tells it to match another space - but your data has no space before the "=".
  • The "<" is unnecessary (but not part of the problem).

This would be a better pattern:

/^[[:space:]]*ErrorDescription[[:space:]]*=[[:space:]]*".*"/
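
As a hedged illustration of how a line-anchored pattern like this might be used, the sketch below remembers the most recent `ErrorCode` and pairs it with the description found on a later line. The function name `pair_code_and_description` is hypothetical, `[ \t]` stands in for `[[:space:]]` for portability to older awks, and the sketch assumes the layout from the sample, with `ErrorDescription` starting its own line.

```shell
# pair_code_and_description is a hypothetical helper name.
# It prints "code,description" for each ErrorDescription that starts a line.
pair_code_and_description() {
    awk '
    # Remember the most recent ErrorCode value (ErrorCode=" is 11 characters).
    match($0, /ErrorCode="[^"]+"/) {
        code = substr($0, RSTART + 11, RLENGTH - 12)
    }
    # A line that starts with ErrorDescription, as in the pattern above.
    /^[ \t]*ErrorDescription[ \t]*=[ \t]*".*"/ {
        match($0, /"[^"]*"/)            # first quoted string on the line
        print code "," substr($0, RSTART + 1, RLENGTH - 2)
    }' "$@"
}
```

A description that shares a line with `<Error` is not matched by the anchored pattern, so this approach covers only the split-line case.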
Thomas Dickey