2

I need to generate all possible strings of a certain length X that satisfy the following two rules:

  1. Must end with '0'
  2. There can't be two or more adjacent '1's

For example, when X = 4, all 'legal' strings are [0000, 0010, 0100, 1000, 1010].

I already wrote a piece of recursive code that simply appends each newly found string to a list.

def generate(pre0, pre1, cur_len, max_len, container):
    if (cur_len == max_len-1):
        container.append("".join([pre0, pre1, "0"]))
        return

    if (pre1 == '1'):
        cur_char = '0'
        generate(pre0+pre1, cur_char, cur_len+1, max_len, container)
    else:   
        cur_char = '0'
        generate(pre0+pre1, cur_char, cur_len+1, max_len, container)
        cur_char = '1'
        generate(pre0+pre1, cur_char, cur_len+1, max_len, container)

if __name__ == "__main__": 
    container = []
    generate("", "", 0, 4, container)
    print container

However, this method won't work once X reaches 100+ because of its memory complexity. I am not familiar with generators in Python, so could anyone here help me figure out how to rewrite it as a generator? Thanks so much!

P.S. I am working on a Project Euler problem, this is not homework.

Update 1:

Thanks for the first 3 answers. I am using Python 2.7 and can switch to Python 3.4. The reason I am asking for a generator is that I can't possibly hold even just the final result list in memory. A quick mathematical proof shows that there are Fibonacci(X) possible strings of length X (with the sequence indexed as 1, 1, 2, 3, 5, ... starting from index 0), which means I really have to use a generator and filter the results on the fly.
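
The Fibonacci growth mentioned above can be checked with a small counting sketch. The helper name `count_legal` is made up for illustration and is not part of the original post. It tracks how many legal prefixes end in '0' and how many end in '1', and it only counts the strings without generating them.

```python
def count_legal(length):
    """Count strings of `length` that end in '0' and never contain '11'."""
    # end0 / end1: number of valid strings of the current length
    # whose last character is '0' / '1'. For length 1: "0" and "1".
    end0, end1 = 1, 1
    for _ in range(length - 1):
        # A '0' may follow anything; a '1' may only follow a '0'.
        end0, end1 = end0 + end1, end0
    return end0


print(count_legal(4))    # matches the 5 strings listed for X = 4
print(count_legal(100))  # far too many strings to hold in a list
```

The count for X = 100 has more than 20 digits, which is why only a lazy generator is practical here.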

Markus Meskanen
JimmyK

4 Answers

2

Assuming you're using Python >= 3.3, where yield from is available, you can yield each result instead of accumulating and returning them. You don't need to pass around a container.

def generate(pre0, pre1, cur_len, max_len):
    if (cur_len == max_len-1):
        yield "".join((pre0, pre1, "0"))
        return

    if (pre1 == '1'):
        yield from generate(pre0+pre1, '0', cur_len+1, max_len)
    else:   
        yield from generate(pre0+pre1, '0', cur_len+1, max_len)
        yield from generate(pre0+pre1, '1', cur_len+1, max_len)

if __name__ == "__main__": 
    for result in generate("", "", 0, 4):
        print(result)

If you're using a python version where yield from is not available, replace those lines with:

for x in generate(...):
    yield x
CodeManX
shx2
  • 1
    Brilliant answer! This is by far the fastest solution, ~30x faster than the monstrous one-liner from the accepted answer (see my answer for benchmark). I'll have to think a bit more about how `yield` and `return` play together in this one... Do you happen to know how to implement the same in let's say C++ or Assembly? – CodeManX Aug 24 '15 at 23:07
  • @CoDEmanX: If you wanted to translate it directly, you could use coroutines, libraries for which are available. However, more pragmatically you would pass to `generate` a function pointer that is called whenever a new result is generated. – icktoofay Aug 25 '15 at 01:34
2

You can use a filter with itertools.product

import itertools

def generate(max_len):
    return list(filter(lambda i: '11' not in i, (''.join(i) + '0' for i in itertools.product('01', repeat=max_len-1))))

This uses generators the entire time until the return which finally creates a list. The filter will act on each of the strings created by itertools.product as they are produced.

>>> generate(5)
['00000', '00010', '00100', '01000', '01010', '10000', '10010', '10100']

Edit: To have this function return a lazy iterator instead of a list, just drop the list call and, on Python 2, switch filter to itertools.ifilter

def generate(max_len):
    return itertools.ifilter(lambda i: '11' not in i, (''.join(i) + '0' for i in itertools.product('01', repeat=max_len-1)))

for s in generate(10):
    pass  # do something with s
Cory Kramer
  • Are there any other ways I can get a generator and loop over it instead of getting a list? I have updated the question to explain why I need it that way. Thanks again. – JimmyK Aug 23 '15 at 20:39
  • 1
    @JimmyK Sure, just drop the `list` call and switch Python's native `filter` for `itertools.ifilter` which returns an iterator (only needed for Python 2.x, in Python 3.x `filter` already returns an iterator) – Cory Kramer Aug 23 '15 at 20:41
2

Lame string-based approach: format every even number below 2^maxlen as a binary string and yield it if it does not contain "11":

def gen(maxlen):
    pattern = "{{:0{}b}}".format(maxlen)
    for i in range(0, 2**maxlen, 2):
        s = pattern.format(i) # not ideal, because we always format to test for "11"
        if "11" not in s:
            yield s

Superior mathematical approach (M has no adjacent 1 bits exactly when M xor (M * 2) = M * 3):

def gen(maxlen):
    pattern = "{{:0{}b}}".format(maxlen)
    for i in range(0, 2**maxlen, 2):
        if i ^ i*2 == i*3:
            yield pattern.format(i)
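
A short sketch of why the xor test works, written here as an illustration rather than taken from the original answer: for any non-negative integers a and b, a + b = (a ^ b) + 2 * (a & b). With a = i and b = 2*i this means i ^ 2*i equals 3*i exactly when i & (i << 1) is zero, that is, when no two adjacent bits of i are both set. The helper names below are hypothetical.

```python
def xor_test(i):
    # The test used in the answer: xor binds tighter than == in Python.
    return (i ^ (i * 2)) == i * 3


def no_adjacent_ones(i):
    # Equivalent bit test: a bit and its left neighbour are never both set.
    return (i & (i << 1)) == 0


# Brute-force comparison against the string-based check for small values.
for i in range(1 << 12):
    as_string = "11" not in format(i, "b")
    assert xor_test(i) == no_adjacent_ones(i) == as_string
```

Note that in C or C++ the == operator binds tighter than ^, so the parentheses are required there.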

Here's a benchmark for 6 different implementations (Python 3!):

from time import perf_counter as clock  # time.clock was removed in Python 3.8
from itertools import product

def math_range(maxlen):
    pattern = "{{:0{}b}}".format(maxlen)
    for i in range(0, 2**maxlen, 2):
        if i ^ i*2 == i*3:
            yield pattern.format(i)


def math_while(maxlen):
    pattern = "{{:0{}b}}".format(maxlen)
    maxnum = 2**maxlen - 1
    i = 0
    # Stop before i reaches 2**maxlen, which would need maxlen+1 digits.
    while i < maxnum:
        if i ^ i*2 == i*3:
            yield pattern.format(i)
        i += 2


def itertools_generator(max_len):
    return filter(lambda i: '11' not in i, (''.join(i) + '0' for i in product('01', repeat=max_len-1)))


def itertools_list(maxlen):
    return list(filter(lambda i: '11' not in i, (''.join(i) + '0' for i in product('01', repeat=maxlen-1))))


def string_based(maxlen):
    pattern = "{{:0{}b}}".format(maxlen)
    for i in range(0, 2**maxlen, 2):
        s = pattern.format(i)
        if "11" not in s:
            yield s


def generate(pre0, pre1, cur_len, max_len):
    if (cur_len == max_len-1):
        yield "".join((pre0, pre1, "0"))
        return

    if (pre1 == '1'):
        yield from generate(pre0+pre1, "0", cur_len+1, max_len)
    else:
        yield from generate(pre0+pre1, "0", cur_len+1, max_len)
        yield from generate(pre0+pre1, "1", cur_len+1, max_len)

def string_based_smart(val):
    yield from generate("", "", 0, val)


def benchmark(val, *funcs):
    for i, func in enumerate(funcs, 1):
        start = clock()
        for g in func(val):
            g
        print("{}. {:6.2f} - {}".format(i, clock()-start, func.__name__))

benchmark(24, string_based_smart, math_range, math_while, itertools_generator, itertools_list, string_based)

Some numbers for string length = 24 (in seconds):

1.   0.24 - string_based_smart
2.   1.73 - math_range
3.   2.59 - math_while
4.   6.95 - itertools_generator
5.   6.78 - itertools_list
6.   6.45 - string_based

shx2's algorithm is clearly the winner, followed by math. Pythonic code makes quite a difference if you compare the results of both math approaches (note: in Python 3, range objects are also evaluated lazily).

Noteworthy: the itertools_* functions are almost equally slow, but itertools_list needs a lot more memory to store the list (~6 MB spike in my test). All other generator-based solutions have a minimal memory footprint, because they only need to store the current state and not the entire result.

None of the shown functions blows up the stack: the loop-based ones do not recurse at all, and the recursion depth of generate is bounded by the string length. Python does not optimize tail recursion, so deep recursion has to be replaced by loops and generators.
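
If recursion is a concern at all, the same depth-first enumeration can be sketched with an explicit stack instead of nested generator calls. This is an illustrative variant, not one of the benchmarked functions, and the name `generate_iterative` is made up here.

```python
from itertools import islice


def generate_iterative(max_len):
    """Lazily yield all strings of max_len that end in '0' and lack '11'."""
    if max_len < 1:
        return
    stack = [""]  # prefixes that still need to be extended
    while stack:
        prefix = stack.pop()
        if len(prefix) == max_len - 1:
            yield prefix + "0"  # the final character is always '0'
            continue
        # Push the '1' branch first so the '0' branch is popped first,
        # which keeps the output in ascending order.
        if not prefix.endswith("1"):
            stack.append(prefix + "1")
        stack.append(prefix + "0")


print(list(generate_iterative(4)))
# Only the requested items are ever built, even for long strings.
for s in islice(generate_iterative(100), 3):
    print(s)
```

The stack holds at most about one pending prefix per position, so memory use stays proportional to the string length rather than to the number of results.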

//edit: naive C++ implementation of math_range (MSVS 2013):

#include "stdafx.h"
#include <iostream>
#include <bitset>
#include <ctime>
#include <fstream>

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    const unsigned __int32 maxlen = 24;
    const unsigned __int32 maxnum = 2 << (maxlen - 1);

    clock_t begin = clock();

    ofstream out;
    out.open("log.txt");
    if (!out.is_open()){
        cout << "Can't write to target";
        return 1;
    }

    for (unsigned __int32 i = 0; i < maxnum; i+=2){
        if ((i ^ i * 2) == i * 3){
            out << std::bitset<maxlen>(i) << "\n"; // don't use std::endl!
        }
    }

    out.close();

    clock_t end = clock();
    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    cout << elapsed_secs << endl;

    return 0;
}

It takes 0.08 seconds(!) for maxlen = 24 (/Ox).

An implementation of shx2's algorithm in C++ is non-trivial, because a recursive approach would lead to stack overflow (ha ha), and there's no yield. See:

But if you want raw speed, then there's no way around it.

CodeManX
  • A recursive approach would most likely *not* lead to stack overflow unless you want very long outputs—the recursion depth is bounded by the length of the desired string. – icktoofay Aug 25 '15 at 01:36
  • My C++ knowledge is probably way too small, but in my attempt to implement shx2's approach, a stack overflow occurred even for small maxlen values. Most likely a flaw caused by me I suppose. – CodeManX Aug 25 '15 at 01:54
  • I was successfully able to implement it in C like so: http://codepad.org/wPS8d2GF – icktoofay Aug 25 '15 at 02:21
  • Nice one! Doesn't print the expected results though - numbers should always end with 0. – CodeManX Aug 25 '15 at 03:12
  • Oops! Well, it’s a small matter. Only one new line is necessary to fix it: http://codepad.org/mdic3FqW – icktoofay Aug 25 '15 at 03:16
  • Now MAX_LENGTH needs to be decremented by one or it will compute too many combinations ;) Performance is awesome, str length of 45 in 12.8 seconds and reportedly 1,836,311,903 combinations: http://codepad.org/izF5olcT (I'm very proud I figured the counter pointer stuff out :D) You are a hero @icktoofay! – CodeManX Aug 25 '15 at 03:49
1

If you were doing it in Python 3, you would adapt it as follows:

  • Remove the container parameter.
  • Change all occurrences of container.append(some_value) to yield some_value.
  • Prepend yield from to all recursive calls.

To do it in Python 2, you do the same, except that Python 2 doesn’t support yield from. Rather than doing yield from iterable, you’ll need to use:

for item in iterable:
    yield item

You’ll then also need to change your call site, removing the container argument and instead storing the return value of the call. You’ll also need to iterate over the result, printing the values, as print will just give you <generator object at 0x...>.
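
Putting these steps together, a minimal sketch of the question's function as a generator that runs on both Python 2 and Python 3 could look like this. It assumes the same parameters as the original, with the two duplicated '0' branches merged into one.

```python
def generate(pre0, pre1, cur_len, max_len):
    if cur_len == max_len - 1:
        # Yield the finished string instead of appending it to a container.
        yield "".join([pre0, pre1, "0"])
        return

    # A '0' may always follow; Python 2 has no `yield from`, so loop instead.
    for item in generate(pre0 + pre1, "0", cur_len + 1, max_len):
        yield item

    # A '1' may only follow if the previous character was not '1'.
    if pre1 != "1":
        for item in generate(pre0 + pre1, "1", cur_len + 1, max_len):
            yield item


if __name__ == "__main__":
    # Iterate over the generator; printing it directly would only show
    # the generator object, not its values.
    for result in generate("", "", 0, 4):
        print(result)
```

For X = 4 this prints the five strings from the question, one per line, without ever building the full list.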

icktoofay
  • Generators can also be cast to sequences (list, tuple), you do not necessarily need to iterate over them. Beware though if the generator contains an infinite loop. By using iteration, it's possible to abort after a certain amount of elements (or time). – CodeManX Aug 24 '15 at 20:13
  • @CoDEmanX: If it failed because of the memory usage before, turning it into a list reintroduces the memory exhaustion. – icktoofay Aug 25 '15 at 02:21