
While reading a text log file (postfix logs), I need to check for certain patterns in order to partition/split each line and store the values of its attributes. Hence I use the regular expression re.search method as follows:

                if re.search(' ms\n$', line):
                    line2 += line.partition('...')[2].split('\n')[0]
                    break

Is this code equivalent to:

                if ' ms\n' in line:
                    line2 += line.partition('...')[2].split('\n')[0]
                    break

Will the latter code improve the speed of execution of the Python code? And how can the processing of the file be improved when there are many such pattern searches to be made, with the extracted values updated in a Postgres table every day? Each day we store around 300,000 records, read from a text file of roughly 1,474,175,681 bytes (about 1.4 GB). This currently takes 100% CPU and is still not fast. How can I optimise the code? Most of it just checks for certain keywords in each line and processes the line much like the code above.
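For reference, a trimmed-down sketch of the kind of per-line processing involved (the keywords and field names here are placeholders, not the real ones):

    import re

    record, records = {}, []
    with open('maillog.txt') as f:                  # example file name
        for line in f:
            if re.search(' ms\n$', line):           # one of many keyword checks
                record['delay'] = line.partition('...')[2].split('\n')[0]
            elif re.search('status=', line):        # hypothetical second check
                record['status'] = line.partition('status=')[2].split()[0]
            if len(record) == 2:                    # placeholder "record complete" rule
                records.append(record)
                record = {}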

user956424

2 Answers


You should use line.endswith(' ms\n'), which behaves the same as re.search(' ms\n$', line) for lines read from a file (each carries at most one trailing '\n') and is much faster, because it avoids the regular-expression machinery entirely.
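If you want to measure this yourself, a quick timeit comparison (the sample line is invented) could look like this:

    import re
    import timeit

    line = 'Feb  1 12:00:00 host postfix/smtpd[123]: ... delay=42 ms\n'
    pattern = re.compile(' ms\n$')

    # re.search caches compiled patterns, but still pays lookup + call overhead
    print(timeit.timeit(lambda: re.search(' ms\n$', line)))
    # precompiling removes the cache lookup
    print(timeit.timeit(lambda: pattern.search(line)))
    # the plain string method is the cheapest of the three
    print(timeit.timeit(lambda: line.endswith(' ms\n')))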

Armin Rigo

The fastest code is code you do not execute. Rethink the problem's workflow, and avoid pulling text out of a database only to process it in Python.
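In that spirit, much of the per-record database overhead in the daily load described in the question can be removed by batching. A minimal sketch, assuming psycopg2 and a hypothetical log_stats(msgid, delay_ms) table, that replaces row-by-row INSERTs with a single COPY:

    import io
    import psycopg2

    def bulk_load(rows, dsn='dbname=logs'):    # dsn is an example
        # Serialise the already-parsed rows into one tab-separated buffer
        buf = io.StringIO()
        for msgid, delay_ms in rows:
            buf.write('%s\t%s\n' % (msgid, delay_ms))
        buf.seek(0)
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            # One COPY statement instead of ~300,000 INSERTs per day
            cur.copy_expert('COPY log_stats (msgid, delay_ms) FROM STDIN', buf)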

mattip