
While reading a text log file (postfix logs), I need to check for certain patterns in order to partition/split each line and store the values of its attributes. Hence I use the regular expression re.search method as follows:

                if re.search(' ms\n$', line):
                    line2 += line.partition('...')[2].split('\n')[0]
                    break

Is this code equivalent to:

                if ' ms\n' in line:
                    line2 += line.partition('...')[2].split('\n')[0]
                    break

Will the latter code improve the speed of execution of the Python code? And how can the processing of the file be improved when there are many such pattern searches to be made, with the extracted values updated in a Postgres table every day? Each day we store around 300,000 records, read from a text file of roughly 1,474,175,681 bytes (about 1.4 GB). This currently takes 100% CPU and is still not fast. How can I optimise the code? Most of it just checks for certain keywords in each line and processes the line much like the code above.
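For reference, a trimmed-down sketch of the kind of per-line processing involved (the keywords and field names here are placeholders, not the real ones):

    import re

    record, records = {}, []
    with open('maillog.txt') as f:                  # example file name
        for line in f:
            if re.search(' ms\n$', line):           # one of many keyword checks
                record['delay'] = line.partition('...')[2].split('\n')[0]
            elif re.search('status=', line):        # hypothetical second check
                record['status'] = line.partition('status=')[2].split()[0]
            if len(record) == 2:                    # placeholder "record complete" rule
                records.append(record)
                record = {}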

user956424

2 Answers


You should use line.endswith(' ms\n'), which behaves the same as re.search(' ms\n$', line) for lines read from a file (each carries at most one trailing '\n') and is much faster, because it avoids the regular-expression machinery entirely.
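If you want to measure this yourself, a quick timeit comparison (the sample line is invented) could look like this:

    import re
    import timeit

    line = 'Feb  1 12:00:00 host postfix/smtpd[123]: ... delay=42 ms\n'
    pattern = re.compile(' ms\n$')

    # re.search caches compiled patterns, but still pays lookup + call overhead
    print(timeit.timeit(lambda: re.search(' ms\n$', line)))
    # precompiling removes the cache lookup
    print(timeit.timeit(lambda: pattern.search(line)))
    # the plain string method is the cheapest of the three
    print(timeit.timeit(lambda: line.endswith(' ms\n')))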

Armin Rigo

The fastest code is code you do not execute. Rethink the problem's workflow, and avoid pulling text out of a database only to process it in Python.
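In that spirit, much of the per-record database overhead in the daily load described in the question can be removed by batching. A minimal sketch, assuming psycopg2 and a hypothetical log_stats(msgid, delay_ms) table, that replaces row-by-row INSERTs with a single COPY:

    import io
    import psycopg2

    def bulk_load(rows, dsn='dbname=logs'):    # dsn is an example
        # Serialise the already-parsed rows into one tab-separated buffer
        buf = io.StringIO()
        for msgid, delay_ms in rows:
            buf.write('%s\t%s\n' % (msgid, delay_ms))
        buf.seek(0)
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            # One COPY statement instead of ~300,000 INSERTs per day
            cur.copy_expert('COPY log_stats (msgid, delay_ms) FROM STDIN', buf)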

mattip