Finding all words with length 0 - 4 for a regular expression (method?)

Question

I have an alphabet consisting of 0 and 1 and a regular expression, for example: 1*(011+)*1*. Now I need to find all words of the language that have length 0 to 4 and match the regular expression. So the output would be: (the empty word), 1, 11, 011, 111, etc.

I am not supposed to pass a list of words or numbers as a parameter; the method should generate all these words by itself. Is there a function or method in the re module that does exactly that?

so it would be: def....: return re.^\w{0,4}$? Sorry, I'm not familiar with the methods — Unplayable, Oct 09 '19 at 19:21
Welcome to Stack Overflow! Please read the [help pages](https://stackoverflow.com/help), take the [SO tour](https://stackoverflow.com/tour), read about [how to ask good questions](https://stackoverflow.com/help/how-to-ask), as well as this [question checklist](https://codeblog.jonskeet.uk/2012/11/24/stack-overflow-question-checklist/). Also please learn how to create a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). You should [edit](https://stackoverflow.com/posts/58310830/edit) your question to show us what you've done so far. — Ross Jacobs, Oct 09 '19 at 19:28
There are lots of resources online. `import re; s = your_string; re.search(r'^\w{0,4}$', s)` — Mr_U4913, Oct 09 '19 at 20:00
I know what you mean. But I wanted to generate all these words with length 0-4 and then compare them with the regular expression I mentioned above (1*(011+)*1*). — Unplayable, Oct 09 '19 at 20:07
Regex is not that complex. Using regex somewhere? Can't see it. Maybe using it on target string? Can't see that either. And, no provision in regex to generate strings out of thin air. — , Oct 09 '19 at 20:10

score 0 · Answer 1 · answered Oct 09 '19 at 21:04

There is no function in the standard library to generate all strings with length 0-4 consisting only of the characters 0 and 1, but it is not hard to build one.

The special case of binary numbers

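Notice that every word that starts with 1, as well as the word 0, is the binary representation of a number below 16 (= 10000 in binary). Words with leading zeros, such as 011, are not covered this way, because bin never emits leading zeros.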

import re

def binary_numbers_below(n):
    return [bin(k)[2:] for k in range(n)]

for word in binary_numbers_below(2**4):
    if re.fullmatch('1*(011+)*1*', word):
        print(word) # word is part of your language

It is necessary to cut off the first two characters of bin(k), because bin(k) outputs numbers in the form 0b1000 and we don't want the 0b prefix.
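As a small illustration of the prefix handling, the sketch below shows what bin returns and what remains after slicing. The variable names are only for illustration; format(k, 'b') is a built-in alternative that produces the digits without a prefix.

```python
k = 8

# bin() returns a string that starts with the '0b' prefix.
with_prefix = bin(k)

# Slicing with [2:] drops the first two characters, i.e. the prefix.
without_prefix = bin(k)[2:]

# format(k, 'b') yields the same binary digits and never adds a prefix.
same_digits = format(k, 'b')

print(with_prefix, without_prefix, same_digits)
```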

The general case

If you want to generate all words of specific lengths for any given alphabet, you need to do more work:

import re
from itertools import product

def words_of_alphabet(alphabet, min_length, max_length):
    return [''.join(characters) 
            for length in range(min_length, max_length+1) 
            for characters in product(alphabet, repeat=length)]

for word in words_of_alphabet(['0', '1'], 0, 4):
    if re.fullmatch('1*(011+)*1*', word):
        print(word) # word is part of your language

words_of_alphabet(['0', '1'], 0, 4) will also include the empty word and words with leading zeros (such as 011), whereas the first method does not.
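To see how the two approaches differ for the pattern from the question, one can collect the matching words from each and compare the sets. This is only a sketch: PATTERN, from_binary, from_product and missed are names made up for this comparison.

```python
import re
from itertools import product

# Hypothetical name for the compiled pattern from the question.
PATTERN = re.compile('1*(011+)*1*')

def binary_numbers_below(n):
    return [bin(k)[2:] for k in range(n)]

def words_of_alphabet(alphabet, min_length, max_length):
    return [''.join(characters)
            for length in range(min_length, max_length + 1)
            for characters in product(alphabet, repeat=length)]

# Matching words found by each approach.
from_binary = {w for w in binary_numbers_below(2**4) if PATTERN.fullmatch(w)}
from_product = {w for w in words_of_alphabet(['0', '1'], 0, 4) if PATTERN.fullmatch(w)}

# Words of the language that only the general approach finds.
missed = sorted(from_product - from_binary)
print(missed)  # ['', '011', '0111']
```

For this pattern the binary-number shortcut finds 5 of the 8 matching words, so the general function is the safer choice whenever words may start with 0.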

Using generators, you can write both functions even more elegantly:

def binary_numbers_below(n):
    for k in range(n):
        yield bin(k)[2:]

def words_of_alphabet(alphabet, min_length, max_length):
    for length in range(min_length, max_length+1):
        for characters in product(alphabet, repeat=length):
            yield ''.join(characters)
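A generator produces its words lazily, one at a time, so it can be used directly in a loop or a comprehension without first building the full list. The following usage sketch repeats the generator version of words_of_alphabet so that it runs on its own; the names gen, first_word and matching are illustrative.

```python
import re
from itertools import product

def words_of_alphabet(alphabet, min_length, max_length):
    for length in range(min_length, max_length + 1):
        for characters in product(alphabet, repeat=length):
            yield ''.join(characters)

# Calling the function only creates a generator; nothing is computed yet.
gen = words_of_alphabet(['0', '1'], 0, 4)

# next() pulls a single word; the first one is the empty word.
first_word = next(gen)

# A fresh generator can be filtered directly with the regular expression.
matching = [word for word in words_of_alphabet(['0', '1'], 0, 4)
            if re.fullmatch('1*(011+)*1*', word)]
print(matching)
```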
