I've been working on a project that manages big lists of words and passes them through a lot of tests to validate or reject each word in the list. The funny thing is that each time I've used "faster" tools like the itertools module, they seem to be slower.
Finally I decided to ask the question because it is possible that I am doing something wrong. The following code tests the performance of the any() function versus the use of plain loops.
#!/usr/bin/python3
#
import time
from unicodedata import normalize
file_path='./tests'
start=time.time()
with open(file_path, encoding='utf-8', mode='rt') as f:
    tests_list=f.read()
print('File reading done in {} seconds'.format(time.time() - start))
start=time.time()
tests_list=[line.strip() for line in normalize('NFC',tests_list).splitlines()]
print('String formalization, and list strip done in {} seconds'.format(time.time()-start))
print('{} strings'.format(len(tests_list)))
unallowed_combinations=['ab','ac','ad','ae','af','ag','ah','ai','af','ax',
                        'ae','rt','rz','bt','du','iz','ip','uy','io','ik',
                        'il','iw','ww','wp']
def combination_is_valid(string):
    if any(combination in string for combination in unallowed_combinations):
        return False
    return True
def combination_is_valid2(string):
    for combination in unallowed_combinations:
        if combination in string:
            return False
    return True
print('Testing the performance of any()')
start=time.time()
for string in tests_list:
    combination_is_valid(string)
print('combination_is_valid ended in {} seconds'.format(time.time()-start))
start=time.time()
for string in tests_list:
    combination_is_valid2(string)
print('combination_is_valid2 ended in {} seconds'.format(time.time()-start))
The previous code is pretty representative of the kind of tests I do. Here are the results of two runs:
File reading done in 0.22988605499267578 seconds
String formalization, and list strip done in 6.803032875061035 seconds
38709922 strings
Testing the performance of any()
combination_is_valid ended in 80.74802565574646 seconds
combination_is_valid2 ended in 41.69514226913452 seconds
File reading done in 0.24268722534179688 seconds
String formalization, and list strip done in 6.720442771911621 seconds
38709922 strings
Testing the performance of any()
combination_is_valid ended in 79.05265760421753 seconds
combination_is_valid2 ended in 42.24800777435303 seconds
I find it kind of amazing that the plain loop takes about half the time of the any() version. What would be the explanation for that? Am I doing something wrong?
(I used Python 3.4 under GNU/Linux.)
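In case it helps anyone reproduce this without my word list, here is a minimal sketch of the same comparison using timeit. The sample words, the repeat count, and the function names valid_with_any, valid_with_loop and run are all made up for illustration and are not my real data or code, so the absolute timings will differ from the ones above.

```python
import timeit

# Same forbidden substrings as in the script above (duplicates kept as-is).
unallowed_combinations = ['ab', 'ac', 'ad', 'ae', 'af', 'ag', 'ah', 'ai', 'af', 'ax',
                          'ae', 'rt', 'rz', 'bt', 'du', 'iz', 'ip', 'uy', 'io', 'ik',
                          'il', 'iw', 'ww', 'wp']


def valid_with_any(string):
    # any() consumes a generator expression, one membership test per item.
    return not any(c in string for c in unallowed_combinations)


def valid_with_loop(string):
    # Explicit loop that returns as soon as a forbidden substring is found.
    for c in unallowed_combinations:
        if c in string:
            return False
    return True


# Hypothetical sample data (assumption): a few short words repeated many times.
sample_words = ['hello', 'world', 'python', 'tests', 'about', 'fizz'] * 1000


def run(check):
    # Apply one of the two checks to every sample word.
    return [check(word) for word in sample_words]


if __name__ == '__main__':
    for check in (valid_with_any, valid_with_loop):
        # number=20 is an arbitrary repeat count chosen for this sketch.
        seconds = timeit.timeit(lambda: run(check), number=20)
        print('{} took {:.3f} seconds'.format(check.__name__, seconds))
```

Both functions should return the same answer for every word, so only the elapsed time differs between them.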