I'm trying to do the following. Suppose I have this ERROR line in a log file:

Aug  9 12:44:39 hostnameABC gnome-terminal-[12581]: Theme parsing error: gtk.css:6765:28: Missing opening bracket in color definition

I need to end up with a Python dictionary that looks like this:

gnome-terminal-[12581] = {ERROR: 1}

If such a dictionary already exists in scope, then ERROR += 1. Finally, print the dictionary name and its key and value.

Is this even possible?

rajkris
  • You want to count occurrences of ``gnome-terminal-[12581]`` in the log file? – sushanth Aug 09 '20 at 08:48
  • What makes an error an error? – Jan Aug 09 '20 at 08:50
  • Please elaborate on "scope" – rajkris Aug 09 '20 at 08:59
  • An error line is identified by the presence of "error" in it. That is what makes it an error. A line could contain "info" instead. – user3826395 Aug 09 '20 at 09:02
  • An error line is identified by the presence of "error" in it. That is what makes it an error. A line could contain "info" instead. As for scope: if there is another error line for gnome-terminal-[12581], we need to check whether it has already been initialized as a dictionary and add 1 to the error key. Or achieve this in any other way. – user3826395 Aug 09 '20 at 09:17
  • Finally, I need to produce a CSV file covering all the different kinds of errors, like: gnome-terminal-[12581], ERROR, 2 – user3826395 Aug 09 '20 at 09:18

3 Answers

Try this:

import re

# As I don't have access to the original file, you can
# uncomment the code below to get the lines from the file

#with open('filename.txt') as file:
#     lines = file.readlines()


# Now, assuming  these are the lines from the log file
lines = [
'Aug 9 12:44:39 hostnameABC gnome-terminal-[12581]: Theme parsing error: gtk.css:6765:28: Missing opening bracket in color definition',
'Aug 9 12:44:39 hostnameABC gnome-terminal-[12581]: Theme parsing error: gtk.css:6765:28: Missing opening bracket in color definition',
'Aug 9 12:44:39 hostnameABC gnome-terminal-[1581]: Theme parsing error: gtk.css:6765:28: Missing opening bracket in color definition'
]

er_regex = re.compile(r'gnome-terminal-\[\d+\]')

def er_count():
    # Map each matched source, e.g. 'gnome-terminal-[12581]', to its count
    count = {}
    er_ins = er_regex.findall(' '.join(lines))
    for er in er_ins:
        count.setdefault(er, 0)
        count[er] += 1
    return count

print(er_count())

You get a dictionary with a count for each error :)
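
The snippet above counts every matching line, whether or not it is an error. A minimal variation, assuming (as the asker says in the comments) that a line is an error when it contains the word "error", could filter line by line and use `collections.Counter`. The sample lines and the names `source_regex` and `count_errors` are made up for illustration:

```python
import re
from collections import Counter

# Hypothetical sample lines; in practice these would be read from the log file.
lines = [
    'Aug  9 12:44:39 hostnameABC gnome-terminal-[12581]: Theme parsing error: gtk.css:6765:28: Missing opening bracket in color definition',
    'Aug  9 12:44:40 hostnameABC gnome-terminal-[12581]: Theme parsing error: gtk.css:6765:28: Missing opening bracket in color definition',
    'Aug  9 12:44:41 hostnameABC gnome-terminal-[1581]: Info only',
]

source_regex = re.compile(r'gnome-terminal-\[\d+\]')

def count_errors(log_lines):
    """Count error lines per source, skipping lines without the word 'error'."""
    counts = Counter()
    for line in log_lines:
        # Assumption: "error" anywhere in the line (any case) marks an error line.
        if 'error' not in line.lower():
            continue
        match = source_regex.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts

error_counts = count_errors(lines)
print(dict(error_counts))
# {'gnome-terminal-[12581]': 2}
```

A `Counter` starts every missing key at zero, so there is no need to check whether an entry already exists before adding 1.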

pitamer
  • Thanks for the code contribution. Are you able to change the name of the count dictionary to gnome-terminal-[12581]? It is a requirement that the dictionary be named after its error type, in this case "gnome-terminal-[12581]". Also remember, if it occurs again in the log file, we need to check whether the dictionary already exists and then add 1 to the error count. – user3826395 Aug 09 '20 at 09:23
  • In general, dynamically assigning variable names is [almost never a good idea](https://nedbatchelder.com/blog/201112/keep_data_out_of_your_variable_names.html). The best approach would be to have a single dictionary variable, and in it, have keys with the names of errors, with an appropriate value for each one. As for checking if the dictionary already exists, I'm not sure I understand what you mean... – pitamer Aug 09 '20 at 09:32

You cannot use hyphens and square brackets in your variable name. You can get around this by using a higher level dictionary, so that your names with hyphens and square brackets are keys in that dictionary instead. Your solution could look something like this:

import re

log = '''Aug  9 12:44:39 hostnameABC gnome-terminal-[12581]: Theme parsing error: gtk.css:6765:28: Missing opening bracket in color definition
Aug  9 12:44:39 hostnameABC gnome-terminal-[12581]: Theme parsing error: gtk.css:6765:28: Missing opening bracket in color definition
Aug  9 12:44:39 hostnameABC gnome-terminal-[12581]: Info only'''

data = {}

matches = re.findall(r'(gnome-terminal-\[\d+\])(?=.*error)', log)
for match in matches:
    # Create the inner dict the first time a name is seen, then increment it
    data.setdefault(match, {'ERROR': 0})['ERROR'] += 1

print(data)
# {'gnome-terminal-[12581]': {'ERROR': 2}}

If you avoid invalid characters in the names, you can use the same approach as above, and just leave out the top level dictionary.
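
To get from a nested dictionary like this to the CSV rows the asker mentions in the comments (`gnome-terminal-[12581], ERROR, 2`), something along these lines might work. It is only a sketch: `tag_regex` is a guess at the log format (a token ending in `[pid]` followed by a colon), any line without the word "error" is treated as INFO, and the rows are written to an in-memory buffer rather than a real file.

```python
import csv
import io
import re
from collections import defaultdict

log = '''Aug  9 12:44:39 hostnameABC gnome-terminal-[12581]: Theme parsing error: gtk.css:6765:28: Missing opening bracket in color definition
Aug  9 12:44:39 hostnameABC gnome-terminal-[12581]: Theme parsing error: gtk.css:6765:28: Missing opening bracket in color definition
Aug  9 12:44:39 hostnameABC gnome-terminal-[12581]: Info only'''

# Assumption: the source name is the token ending in "[pid]" just before a colon.
tag_regex = re.compile(r'(\S+\[\d+\]):')

# Outer key: source name; inner key: level ('ERROR' or 'INFO'); value: count.
data = defaultdict(lambda: defaultdict(int))

for line in log.splitlines():
    match = tag_regex.search(line)
    if not match:
        continue
    # Assumption: lines containing "error" are errors, everything else is info.
    level = 'ERROR' if 'error' in line.lower() else 'INFO'
    data[match.group(1)][level] += 1

# Write one CSV row per (source, level) pair; swap the buffer for open(...) to write a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
for tag in sorted(data):
    for level in sorted(data[tag]):
        writer.writerow([tag, level, data[tag][level]])

csv_text = buffer.getvalue()
print(csv_text)
```

With the three sample lines this produces one row for `ERROR` with a count of 2 and one row for `INFO` with a count of 1, both under `gnome-terminal-[12581]`.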

rikusv
  • The dictionary name needs to come from the log line only, in this case gnome-terminal-[12581] – user3826395 Aug 09 '20 at 09:23
  • Your variable name cannot contain hyphens. You could do something like this: `error_dicts['gnome-terminal-'] = {}` Should the dictionary be e.g. 'gnome-terminal-' and the key '12581'? Or do you mean the dict must actually be 'gnome-terminal-[12581]'...? – rikusv Aug 09 '20 at 09:28
  • OK, for simplicity, let's say gnome_terminal instead of gnome-terminal-[12581] in the log file :), provided underscores are allowed in variable names – user3826395 Aug 09 '20 at 09:39
  • I've updated my answer to get close to what you need, avoiding the hyphens in the dict names by having a top-level dict `data`. – rikusv Aug 09 '20 at 09:39
  • The square brackets are also an issue, because they are reserved for indexing into the dictionary. – rikusv Aug 09 '20 at 09:45

Thank you folks, especially pitamer and rikus, for all your help. I was able to use your code and ideas to build a solution for myself.

    import csv
    import re

    error_pattern = r'ticky: ERROR ([\w\s\']*) \((.+)\)'
    info_pattern = r'ticky: INFO.* \((.+)\)'
    user_stat = {}  # username -> [INFO count, ERROR count]

    with open('syslog.log', 'r') as logs:
        for line in logs:
            # Search once per pattern and reuse the match object
            results = re.search(error_pattern, line)
            if results:
                user_stat.setdefault(results.group(2), [0, 0])[1] += 1
            results = re.search(info_pattern, line)
            if results:
                user_stat.setdefault(results.group(1), [0, 0])[0] += 1

    user_sorted = sorted(user_stat.items())

    # newline='' stops the csv module from emitting blank rows on Windows
    with open('user_statistics.csv', 'w', newline='') as output:
        csvfiler = csv.writer(output)
        csvfiler.writerow(['Username', 'INFO', 'ERROR'])
        for username, (info_count, error_count) in user_sorted:
            csvfiler.writerow([username, info_count, error_count])
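
For anyone who wants to try the counting step without a `syslog.log` file, here is a small sketch that runs the same two patterns over a few sample lines. The lines are made up to fit the patterns above; the real log format may differ.

```python
import re

error_pattern = r"ticky: ERROR ([\w\s']*) \((.+)\)"
info_pattern = r'ticky: INFO.* \((.+)\)'

# Hypothetical log lines, invented for illustration only.
sample_lines = [
    'Jan 31 00:09:39 ubuntu.local ticky: INFO Created ticket [#4217] (mdouglas)',
    'Jan 31 00:16:25 ubuntu.local ticky: ERROR The ticket was modified while updating (breee)',
    'Jan 31 00:21:30 ubuntu.local ticky: ERROR Permission denied while closing ticket (mdouglas)',
]

user_stat = {}  # username -> [INFO count, ERROR count]
for line in sample_lines:
    error_match = re.search(error_pattern, line)
    if error_match:
        # setdefault returns the existing list, or inserts [0, 0] first
        user_stat.setdefault(error_match.group(2), [0, 0])[1] += 1
    info_match = re.search(info_pattern, line)
    if info_match:
        user_stat.setdefault(info_match.group(1), [0, 0])[0] += 1

print(sorted(user_stat.items()))
# [('breee', [0, 1]), ('mdouglas', [1, 1])]
```

The `setdefault(...)[index] += 1` idiom is what answers the original "does the dictionary already exist?" question: the entry is created on first sight and updated in place afterwards.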