
How do I process a log file (in my case nginx access.log) in reverse order?

Background: I am developing a log file analyser script, and I can't work out how to process huge log files starting from the end, so that I can pick out the time frames I need beginning with the newest dates.

elhombre
  • Can you define “huge”? A few megabytes is trivial, a few gigabytes is not (though probably if it's reached those sizes you'd better try splitting them) – spectras Jun 05 '16 at 08:14
  • what about the following idea - save the last known position using `f.tell()`, next time use `seek()` so that you'll jump to the last known/seen position – MaxU - stand with Ukraine Jun 05 '16 at 08:20
  • you may want to check the following recipes: http://code.activestate.com/recipes/276149/, http://code.activestate.com/recipes/120686/ – MaxU - stand with Ukraine Jun 05 '16 at 08:54
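
The `f.tell()` / `seek()` idea from the comments above can be sketched roughly as follows. This is only an illustration, not code from the commenter: the helper name `read_new_lines` and its `last_position` parameter are hypothetical, and it assumes the log is only ever appended to (or truncated on rotation) and is UTF-8 encoded.

```python
def read_new_lines(filepath, last_position=0):
    """Return (lines, new_position) for complete lines added after last_position.

    Hypothetical helper: the caller stores new_position between runs and
    passes it back in as last_position on the next call.
    """
    with open(filepath, "rb") as f:
        f.seek(0, 2)                      # jump to the end to learn the size
        if f.tell() < last_position:      # file shrank: assume it was rotated
            last_position = 0
        f.seek(last_position)             # resume where the previous run stopped
        data = f.read()
    # Only consume up to the last newline, so a half-written final line
    # is picked up whole on the next call.
    end = data.rfind(b"\n") + 1
    lines = [raw.decode("utf-8", errors="replace")
             for raw in data[:end].splitlines()]
    return lines, last_position + end
```

On the first run you would call `read_new_lines(path)`, save the returned position somewhere (a small state file, for example), and pass it back on the next run so only the newly appended lines are read.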

1 Answer


One way is to jump to the end of the file with seek() and then scan backwards from there in chunks. Example:

def Tail(filepath, nol=10, read_size=1024):
  """
  This function returns the last nol lines of a file as a single string.
  Args:
    filepath: path to file
    nol: number of lines to return
    read_size:  data is read in chunks of this size (optional, default=1024)
  Raises:
    IOError if file cannot be processed.
  """
  # Binary mode: Python 3 text-mode files cannot seek relative to the end,
  # and the 'U' flag was removed in Python 3.11.
  with open(filepath, 'rb') as f:    # closed automatically, even on return
    f.seek(0, 2)
    file_size = f.tell()
    offset = read_size
    while True:
      if file_size < offset:
        offset = file_size
      f.seek(-offset, 2)
      read_str = f.read(offset).decode('utf-8', errors='replace')
      read_str = read_str.replace('\r\n', '\n')
      # Remove newline at the end
      if read_str.endswith('\n'):
        read_str = read_str[:-1]
      lines = read_str.split('\n')
      if offset == file_size:   # Reached the beginning: every line is complete
        return "\n".join(lines[-nol:])
      if len(lines) > nol:  # lines[0] may be cut off, so require one extra
        return "\n".join(lines[-nol:])
      offset += read_size

Then use it as:

Tail('/etc/httpd/logs/access.log', 100)

This would give you the last 100 lines of your access.log file.

Code adapted from: http://www.manugarg.com/2007/04/real-tailing-in-python.html
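
If you need to walk the whole log newest-first rather than grab a fixed number of lines, the same seek-from-the-end technique can be turned into a generator. The following is a minimal sketch, not taken from the linked article: `reverse_lines` is a hypothetical name, and it assumes the file is UTF-8 with `\n` or `\r\n` line endings.

```python
import os

def reverse_lines(filepath, read_size=8192, encoding="utf-8"):
    """Yield the lines of a file from last to first, without line endings."""
    with open(filepath, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_size = position = f.tell()
        tail = b""          # start of a line whose beginning is in an earlier chunk
        first_chunk = True
        while position > 0:
            step = min(read_size, position)
            position -= step
            f.seek(position)
            chunk = f.read(step) + tail
            if first_chunk:
                # Ignore the newline that terminates the final line.
                if chunk.endswith(b"\n"):
                    chunk = chunk[:-1]
                first_chunk = False
            lines = chunk.split(b"\n")
            tail = lines.pop(0)   # may be incomplete; finish it with the next chunk
            for line in reversed(lines):
                yield line.rstrip(b"\r").decode(encoding, errors="replace")
        if file_size > 0:
            # Whatever is left over is the first line of the file.
            yield tail.rstrip(b"\r").decode(encoding, errors="replace")
```

Because it is a generator, only about one chunk is held in memory at a time, and you can stop as soon as you reach entries older than the time frame you care about, e.g. by looping over `reverse_lines('/etc/httpd/logs/access.log')` and calling `break` once the timestamp is too old.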

Ben Ebsworth