I have three versions of a simple text parser that validates whether a login is correctly formed:

import re
import string

rgx = re.compile(r"^[a-zA-Z][a-zA-Z0-9.-]{0,18}[a-zA-Z0-9]$")

def rchecker(login):
    return bool(rgx.match(login))

max_len = 20

def occhecker(login):
    length_counter = max_len
    for c in login:
        o = ord(c)
        # the first character must be a letter
        if length_counter == max_len:
            if not (o > 96 and o < 123) and \
                    not (o > 64 and o < 91): return False
        if length_counter == 0: return False
        # not a digit
        # not an uppercase letter
        # not a lowercase letter
        # not a minus or dot
        if not (o > 47 and o < 58) and \
                not (o > 96 and o < 123) and \
                not (o > 64 and o < 91) and \
                o != 45 and o != 46: return False
        length_counter -= 1
    # the last character must be a letter or a digit
    if length_counter < max_len:
        o = ord(c)
        if not (o > 47 and o < 58) and \
                not (o > 96 and o < 123) and \
                not (o > 64 and o < 91): return False
        else: return True
    else: return False

correct_end = string.ascii_letters + string.digits
correct_symbols = correct_end + "-."

def cchecker(login):
    length_counter = max_len
    for c in login:
        if length_counter == max_len and c not in string.ascii_letters:
            return False
        if length_counter == 0:
            return False
        if c not in correct_symbols:
            return False
        length_counter -= 1
    if length_counter < max_len and c in correct_end:
        return True
    else:
        return False

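The regex encodes the rules compactly: an ASCII letter first, then up to 18 letters, digits, dots or hyphens, and a letter or digit last, so 2 to 20 characters in total. One way to check that a loop-based version enforces exactly the same rules is to run it side by side with the regex on hand-picked edge cases. Below is a minimal sketch comparing rchecker with cchecker (Python 3); the disagreements helper and the sample logins are hypothetical, chosen only to probe the boundaries.

```python
import re
import string

max_len = 20
rgx = re.compile(r"^[a-zA-Z][a-zA-Z0-9.-]{0,18}[a-zA-Z0-9]$")
correct_end = string.ascii_letters + string.digits
correct_symbols = correct_end + "-."


def rchecker(login):
    return bool(rgx.match(login))


def cchecker(login):
    length_counter = max_len
    for c in login:
        if length_counter == max_len and c not in string.ascii_letters:
            return False
        if length_counter == 0:
            return False
        if c not in correct_symbols:
            return False
        length_counter -= 1
    return length_counter < max_len and c in correct_end


def disagreements(logins):
    """Hypothetical helper: logins on which the two checkers give different answers."""
    return [s for s in logins if rchecker(s) != cchecker(s)]


# Hypothetical edge cases: valid, too short, bad first/last char, length limits,
# empty, non-ASCII, and a trailing newline.
samples = ["john.doe", "a1", "a", "1abc", "abc-", "a" * 20, "a" * 21,
           "", "иван", "ab\n"]
print(disagreements(samples))
```

On these samples the script prints ['a', 'ab\n']. The pattern needs at least two characters while the loop (and likewise occhecker) accepts a one-letter login, and $ also matches just before a trailing newline, so the regex accepts "ab\n"; rgx.fullmatch(login) or ending the pattern with \Z would reject it.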
All three functions do the same job: they check a few rules for a login. The rules should be clear from the regex. I benchmarked these functions with cProfile on 280000 logins and got results I can't explain:

With regex:

560001 function calls in 1.202 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
280000    0.680    0.000    1.202    0.000  logineffcheck.py:10(rchecker)
     1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
280000    0.522    0.000    0.522    0.000  {method 'match' of '_sre.SRE_Pattern' objects}

With ord:

3450737 function calls in 8.599 seconds

Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
 280000    5.802    0.000    8.599    0.000  logineffcheck.py:14(occhecker)
      1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
3170736    2.797    0.000    2.797    0.000  {ord}

With the in operator:

280001 function calls in 1.709 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
280000    1.709    0.000    1.709    0.000  logineffcheck.py:52(cchecker)
     1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}

I generated 100k correctly formed logins, 60k logins containing Cyrillic letters, 60k logins of length 24 (the limit is 20) and 60k empty logins, 280k in total. How can the regex be so much faster than a simple loop with ord?
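cProfile records every function call it sees, including each of the 3170736 ord calls above, and the Python documentation itself points to timeit rather than the profilers for benchmarking. Below is a minimal sketch (Python 3) that times the three checkers with timeit instead, on a synthetic dataset shaped like the one described: 100k valid logins, 60k starting with Cyrillic letters, 60k of length 24 and 60k empty. The login strings, the helper names make_logins and bench, and the condensed rewrite of occhecker are assumptions for illustration, not the original benchmark code.

```python
import re
import string
import timeit

MAX_LEN = 20
rgx = re.compile(r"^[a-zA-Z][a-zA-Z0-9.-]{0,18}[a-zA-Z0-9]$")
correct_end = string.ascii_letters + string.digits
correct_symbols = correct_end + "-."


def rchecker(login):
    return bool(rgx.match(login))


def occhecker(login):
    # Condensed rewrite of the ord() version: same rules, chained comparisons.
    length_counter = MAX_LEN
    for c in login:
        o = ord(c)
        is_letter = 65 <= o <= 90 or 97 <= o <= 122
        if length_counter == MAX_LEN and not is_letter:
            return False
        if length_counter == 0:
            return False
        if not (is_letter or 48 <= o <= 57 or o == 45 or o == 46):
            return False
        length_counter -= 1
    if length_counter == MAX_LEN:  # empty login
        return False
    o = ord(c)  # last character must be a letter or a digit
    return 65 <= o <= 90 or 97 <= o <= 122 or 48 <= o <= 57


def cchecker(login):
    length_counter = MAX_LEN
    for c in login:
        if length_counter == MAX_LEN and c not in string.ascii_letters:
            return False
        if length_counter == 0 or c not in correct_symbols:
            return False
        length_counter -= 1
    return length_counter < MAX_LEN and c in correct_end


def make_logins(scale=1000):
    """Hypothetical stand-in for the question's dataset (scale=1000 -> 280k logins)."""
    valid = ["user.name-%d" % i for i in range(100 * scale)]
    cyrillic = ["логин%d" % i for i in range(60 * scale)]
    too_long = ["a" * 24] * (60 * scale)
    empty = [""] * (60 * scale)
    return valid + cyrillic + too_long + empty


def bench(checker, logins, repeat=3):
    """Best wall-clock time in seconds over `repeat` passes through all logins."""
    return min(timeit.repeat(lambda: [checker(s) for s in logins],
                             number=1, repeat=repeat))


if __name__ == "__main__":
    logins = make_logins()
    for name, fn in (("regex", rchecker), ("ord", occhecker), ("in", cchecker)):
        print(f"{name:>5}: {bench(fn, logins):.3f} s")
```

Running the script prints one line per checker with the best of three wall-clock times; the absolute numbers depend on the machine and Python version, so only the ratios between the three lines are worth comparing with the profiled results.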