I have a Java application that, whenever it hits an error, writes an error stack like the one below to its log.
<Errors>
  <Error ErrorCode="Code" ErrorDescription="Description" ErrorInfo="" ErrorId="ID">
    <Attribute Name="ErrorCode" Value="Code"/>
    <Attribute Name="ErrorDescription" Value="Description"/>
    <Attribute Name="Key" Value="Key"/>
    <Attribute Name="Number" Value="Number"/>
    <Attribute Name="ErrorId" Value="ID"/>
    <Attribute Name="UserId" Value="User"/>
    <Attribute Name="ProgId" Value="Prog"/>
    <Stack>typical Java stack</Stack>
  </Error>
  <Error>
    Similar info to the above
  </Error>
</Errors>
I wrote a Java log parser that goes through the log files and gathers information about these errors. It works, but it is slow and inefficient, especially for log files in the hundreds of megabytes. It basically uses string manipulation to find the start and end tags and tallies up the errors between them.
Is there a way (via Unix grep, Python, or Java) to efficiently extract the errors and count how many times each one occurs? The log file as a whole is not XML, so I cannot use an XML parser or XPath. Another problem is that the end of an error sometimes rolls over into the next file, so the current file might not contain the entire stack shown above.
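One idea I have been considering for the rollover problem is to read the rolled files as one continuous stream, so a block that starts in one file and ends in the next is seen whole. This is only a minimal sketch, assuming the files are supplied in roll order and are UTF-8 encoded; `RolledLogs`, `openAsOne`, and `errorBlocks` are hypothetical names, not part of my parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RolledLogs {
    /** Presents several inputs as one continuous stream, in the order given (assumed roll order). */
    static BufferedReader openAsOne(List<InputStream> parts) {
        InputStream joined = new SequenceInputStream(Collections.enumeration(parts));
        // UTF-8 is an assumption; use whatever encoding the logs are really written in.
        return new BufferedReader(new InputStreamReader(joined, StandardCharsets.UTF_8));
    }

    /** Collects every <Errors>...</Errors> block, including one split across two inputs. */
    static List<String> errorBlocks(BufferedReader br) throws IOException {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a block
        String line;
        while ((line = br.readLine()) != null) {
            if (current == null && line.contains("<Errors>")) {
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line.trim()).append(' ');
                if (line.contains("</Errors>")) {
                    blocks.add(current.toString().trim());
                    current = null;
                }
            }
        }
        if (current != null) {
            // The last input ended mid-block: keep what was read instead of dropping it.
            blocks.add(current.toString().trim());
        }
        return blocks;
    }
}
```

The downside is that a block can no longer be attributed to a single file, so this would only be suitable when per-file reporting is not needed for rolled-over errors.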
EDIT 1:
Here is what I currently have (relevant portions only to save space).
//Parse files
for (File f : allFiles) {
    System.out.println("Parsing: " + f.getAbsolutePath());
    // try-with-resources closes the reader once the file has been parsed
    try (BufferedReader br = new BufferedReader(new FileReader(f))) {
        String line = "";
        String fullErrorStack = "";
        while ((line = br.readLine()) != null) {
            if (line.contains("<Errors>")) {
                // Accumulate every line up to the closing </Errors> tag
                fullErrorStack = line;
                while (!line.contains("</Errors>")) {
                    line = br.readLine();
                    if (line == null) {
                        //End of file but end of error stack is in another file.
                        fullErrorStack = fullErrorStack + "</Stack></Error></Errors> ";
                        break;
                    }
                    fullErrorStack = fullErrorStack + line.trim() + " ";
                }
                // Pull the code, description and stack out of the accumulated text
                String errorCode = fullErrorStack.substring(fullErrorStack.indexOf("ErrorCode=\"") + "ErrorCode=\"".length(), fullErrorStack.indexOf("\" ", fullErrorStack.indexOf("ErrorCode=\"")));
                String errorDescription = fullErrorStack.substring(fullErrorStack.indexOf("ErrorDescription=\"") + "ErrorDescription=\"".length(), fullErrorStack.indexOf("\" ", fullErrorStack.indexOf("ErrorDescription=\"")));
                String errorStack = fullErrorStack.substring(fullErrorStack.indexOf("<Stack>") + "<Stack>".length(), fullErrorStack.indexOf("</Stack>", fullErrorStack.indexOf("<Stack>")));
                apiErrors.add(f.getAbsolutePath() + splitter + errorCode + ": " + errorDescription + splitter + errorStack.trim());
                fullErrorStack = "";
            }
        }
    }
}
// Count how often each distinct error string occurs
Set<String> uniqueApiErrors = new HashSet<String>(apiErrors);
for (String uniqueApiError : uniqueApiErrors) {
    apiErrorsUnique.add(uniqueApiError + splitter + Collections.frequency(apiErrors, uniqueApiError));
}
Collections.sort(apiErrorsUnique);
EDIT 2:
Sorry for forgetting to mention the desired output. Something like the below would be ideal.
Count, ErrorCode, ErrorDescription, List of files it occurs in (if possible)
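To make that output format concrete, here is a minimal single-pass sketch in Java. It assumes that ErrorCode and ErrorDescription always sit on the opening <Error ...> line, in the order shown in the sample; `ErrorTally`, `Tally`, `scan`, and `report` are hypothetical names used only for illustration. It reads only the opening tag, so an error whose stack is cut off at the end of a file is still counted, and it keeps running totals in a map rather than calling Collections.frequency on the full list.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ErrorTally {
    // Assumption: both attributes are on the same line as "<Error", in this order.
    private static final Pattern ERROR_TAG = Pattern.compile(
            "<Error\\s+ErrorCode=\"([^\"]*)\"\\s+ErrorDescription=\"([^\"]*)\"");

    /** Occurrence count plus the files an error was seen in. */
    static final class Tally {
        int count;
        final Set<String> files = new TreeSet<>();
    }

    /** Streams one source line by line and updates the running tallies. */
    static void scan(String fileName, Reader source, Map<String, Tally> tallies) throws IOException {
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = ERROR_TAG.matcher(line);
                while (m.find()) {
                    String key = m.group(1) + ", " + m.group(2);
                    Tally t = tallies.computeIfAbsent(key, k -> new Tally());
                    t.count++;
                    t.files.add(fileName);
                }
            }
        }
    }

    /** Formats one line per error: Count, ErrorCode, ErrorDescription, [files]. */
    static String report(Map<String, Tally> tallies) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Tally> e : tallies.entrySet()) {
            sb.append(e.getValue().count).append(", ")
              .append(e.getKey()).append(", ")
              .append(e.getValue().files).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, Tally> tallies = new TreeMap<>();
        for (String path : args) {
            scan(path, new FileReader(path), tallies);
        }
        System.out.print(report(tallies));
    }
}
```

Run with the log file paths as arguments. The stack text is ignored here, so two errors with the same code and description but different stacks are counted together, which may or may not be what is wanted.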