So many interesting approaches proposed here, and based on some fiddling around it looks like the relative times of each can fluctuate quite a bit based on the lengths of the words being considered.
Let's grab some of the proposed solutions to test:
    import re

    def original(words):
        return [sum(c.isdigit() for c in word) / float(len(word)) for word in words]

    def filtered_list_comprehension(words):
        return [len([c for c in word if c.isdigit()]) / len(word) for word in words]

    def regex(words):
        return [len("".join(re.findall(r"\d", word))) / float(len(word)) for word in words]

    def native_filter(words):
        # Python 2: filter returns a list here, so len() works directly
        return [len(filter(str.isdigit, word)) / float(len(word)) for word in words]

    def native_filter_with_map(words):
        return map(lambda word: len(filter(str.isdigit, word)) / float(len(word)), words)
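The exact benchmarking harness isn't shown above, but a sketch along these lines reproduces the setup (here in Python 3, with `make_words` as a hypothetical helper that generates random alphanumeric strings):

    import random
    import string
    import timeit

    # One of the candidates from above, for illustration.
    def filtered_list_comprehension(words):
        return [len([c for c in word if c.isdigit()]) / len(word) for word in words]

    def make_words(count, length):
        # Hypothetical helper: build `count` random alphanumeric "words".
        alphabet = string.ascii_lowercase + string.digits
        return ["".join(random.choice(alphabet) for _ in range(length))
                for _ in range(count)]

    words = make_words(1000, 10)
    # Run the candidate repeatedly and report total elapsed seconds.
    elapsed = timeit.timeit(lambda: filtered_list_comprehension(words), number=100)
    print("filtered_list_comprehension: %.3f" % elapsed)

Bumping `number` up (or looping over several word lengths) gives figures comparable to the tables below.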
And test them each with varying word lengths. Times are in seconds.
Testing with 1000 words of length 10:
original: 1.976
filtered_list_comprehension: 1.224
regex: 2.575
native_filter: 1.209
native_filter_with_map: 1.264
Testing with 1000 words of length 20:
original: 3.044
filtered_list_comprehension: 2.032
regex: 3.205
native_filter: 1.947
native_filter_with_map: 2.034
Testing with 1000 words of length 30:
original: 4.115
filtered_list_comprehension: 2.819
regex: 3.889
native_filter: 2.708
native_filter_with_map: 2.734
Testing with 1000 words of length 50:
original: 6.294
filtered_list_comprehension: 4.313
regex: 4.884
native_filter: 4.134
native_filter_with_map: 4.171
Testing with 1000 words of length 100:
original: 11.638
filtered_list_comprehension: 8.130
regex: 7.756
native_filter: 7.858
native_filter_with_map: 7.790
Testing with 1000 words of length 500:
original: 55.100
filtered_list_comprehension: 38.052
regex: 28.049
native_filter: 37.196
native_filter_with_map: 37.209
From this I would conclude that if the "words" being tested can be up to 500 characters or so long, a regex works well. Otherwise, filtering with str.isdigit seems to be the best approach across a variety of lengths.
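Note that the snippets above appear to assume Python 2 (float() division, len() applied directly to filter's result). Under Python 3, filter returns an iterator, so the two recommended approaches would look more like this sketch:

    import re

    words = ["abc123", "a1b2c3d4"]

    # filter-based: filter returns an iterator in Python 3, so count with sum()
    filter_ratios = [sum(1 for _ in filter(str.isdigit, w)) / len(w) for w in words]

    # regex-based: count the digit matches directly instead of joining them
    regex_ratios = [len(re.findall(r"\d", w)) / len(w) for w in words]

The relative timings may differ somewhat from the Python 2 numbers above, so it's worth re-running the benchmark on your own interpreter.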