
Here is the code I want to optimize:

import random

rules = {
    'X': {
        1: 'FXFF+',
        2: '+XXF]',
    }
}

L_string = 'FX'

def next_char(c):
    isrule = rules.get(c, c)
    if not isrule == c:
        _, choice = random.choice(list(rules.get(c).items()))
        return choice
    else:
        return isrule

for _ in range(6):
    L_string = ''.join([next_char(c) for c in L_string])

What's happening here is a recursive replacement of characters in a string. Step by step:

  1. Start with 'FX'.
  2. Go through the string and replace each 'X' with a randomly chosen rule, i.e. 'FXFF+' or '+XXF]'. A rule is drawn separately for each 'X'; it is not one random rule per pass through the string.
  3. Repeat this for a fixed number of passes (the loop above does 6).

In the end the result is a longer string made up of the starting 'F' and the rules 'FXFF+' and '+XXF]' in some random combination. The table illustrates the first few passes:

+------------+--------------------+--------------------+
| ITERATIONS |       STRING       | CHOSEN RULE VECTOR |
+------------+--------------------+--------------------+
|          1 | FFXFF+             | [rule 1]           |
|          2 | FF+XXF]FF+         | [rule 2]           |
|          3 | FF+FXFF++XXF]F]FF+ | [rule 1, rule 2]   |
|          4 | ...                | ...                |
|          5 | ...                | ...                |
+------------+--------------------+--------------------+
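The table can be reproduced by recording which rule each 'X' receives in every pass. Below is a minimal sketch. The names `RULES` and `expand_with_trace` are hypothetical, and the rules are stored as a tuple rather than the question's dict keyed from 1:

```python
import random

RULES = {'X': ('FXFF+', '+XXF]')}  # the question's two rules, as a tuple

def expand_with_trace(axiom, passes, rng=random):
    """Run `passes` rewriting passes; record the string and the 1-based rule chosen for each X."""
    history = []
    s = axiom
    for _ in range(passes):
        chosen = []  # the "chosen rule vector" for this pass
        out = []
        for c in s:
            if c in RULES:
                i = rng.randrange(len(RULES[c]))  # independent draw for every X
                chosen.append(i + 1)              # report as rule 1 / rule 2, like the table
                out.append(RULES[c][i])
            else:
                out.append(c)
        s = ''.join(out)
        history.append((s, chosen))
    return history

# A seeded generator makes the trace repeatable between runs
for n, (s, chosen) in enumerate(expand_with_trace('FX', 3, random.Random(1)), start=1):
    print(n, s, chosen)
```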

I've read that re.sub is the fastest way to replace substrings, but the problem is that each character needs its own random choice. As far as I can tell, re.sub won't work for that.

Thanks all!

Izak Joubert

5 Answers


A simple ~4x speedup of the function that consumes most of the run time:


from random import random
from math import floor

def next_char2(c):
    if c not in rules:  # `rules` is the dict from the question
        return c

    d = rules[c]
    r = floor(random() * len(d))  # was int(...) before
    # random() returns a float in [0, 1), so r is 0 or 1 here.
    # The rule keys start at 1, hence the [r + 1] lookup.
    return d[r + 1]


In [6]: %timeit next_char("X")
3.42 µs ± 32.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit next_char2("X")
814 ns ± 12.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Edit: replacing int with math.floor gives a small additional boost:

In [10]: %timeit next_char2("X")                                                                
740 ns ± 8.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

There may be a lot more room for optimization. Memoization somewhere might give a large boost to the code as a whole.
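A related variant, sketched under the assumption that the rules are stored as tuples instead of dicts keyed from 1, lets random.choice index the options directly and removes the +1 offset. The names `tuple_rules` and `next_char3` are hypothetical, and whether this beats the floor-based version is something to check with %timeit as above:

```python
import random

# Hypothetical layout: each variable maps to a tuple of replacement strings
tuple_rules = {'X': ('FXFF+', '+XXF]')}

def next_char3(c):
    options = tuple_rules.get(c)
    if options is None:
        return c  # not a variable: keep the character unchanged
    return random.choice(options)  # independent choice for every X

L_string = 'FX'
for _ in range(6):
    L_string = ''.join([next_char3(c) for c in L_string])
```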

altunyurt

Assuming the characters '{' and '}' do not occur in your patterns, you could use some trickery with str.format templates and strip the braces afterwards. This is 2.5x faster on my machine:

import re
import random

def format_based():
    rules = {
        'X': lambda: random.choice(["{F}{X}{F}{F}+", "+{X}{X}{F}]"]),
        'F': lambda: 'F',
    }
    def get_callbacks():
        # One dict per pass: every {X} in that pass gets the same expansion
        while True:
            yield {k: v() for k, v in rules.items()}
    callbacks = get_callbacks()
    L_string = "{F}{X}"
    for _ in range(5):  # 5 passes; the question's loop uses range(6)
        L_string = L_string.format(**next(callbacks))
    return re.sub('{|}', '', L_string)
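This trick has one property worth checking. str.format substitutes the same keyword value for every {X} in a single call. Since get_callbacks yields one dict per pass, all X's in a pass receive the same expansion. The question, by contrast, asks for an independent choice per X. The expected length is unchanged, but the distribution of results differs. A quick demonstration:

```python
import random

template = "{X}-{X}-{X}"
# The keyword argument is evaluated once, so every {X} receives the same value
filled = template.format(X=random.choice(["FXFF+", "+XXF]"]))
print(filled)  # either FXFF+-FXFF+-FXFF+ or +XXF]-+XXF]-+XXF]
```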
hilberts_drinking_problem

Here is a new method using recursion (roughly 1.6x faster) and another method (roughly 3.3x faster), both timed on my PC:

import re
from random import random, choice
from timeit import timeit
from math import floor

# --- ORIGINAL ---
rules = {
    'X': {
        1: 'FXFF+',
        2: '+XXF]',
    }
}

def next_char(c):
    isrule = rules.get(c, c)
    if not isrule == c:
        _, _choice = choice(list(rules.get(c).items()))
        return _choice
    else:
        return isrule

# --- ORIGINAL END ---

def next_char2(c):
    if c not in rules:
        return c

    d = rules[c]
    r = floor(random() * len(d))  # was int(...) before
    # Rules start with key 1.
    # Random brings a float between 0 and 1, therefore you need [r + 1] as key
    return d[r + 1]

choices=['FXFF+', '+XXF]']
def next_substring(s, n):
    if s == '' or n == 0:
        return s

    first_char = s[:1]
    rest = s[1:]

    if first_char == 'X':
        first_char = choice(choices)

    if len(first_char) == 1:
        return first_char + (next_substring(rest, n) if 'X' in rest else rest)
    else:
        return (next_substring(first_char, n-1) if 'X' in first_char else first_char) + (next_substring(rest, n) if 'X' in rest else rest)

format_rules = {
    'X': lambda: choice(["{F}{X}{F}{F}+", "+{X}{X}{F}]"]),
    'F': lambda: 'F',
    'J': lambda: 'J',
}

def format_based():
    def get_callbacks():
        while True:
            yield {k: v() for k, v in format_rules.items()}
    callbacks = get_callbacks()
    L_string = "{F}{X}"
    for _ in range(6):
        L_string = L_string.format(**next(callbacks))
    return re.sub(r'{|}', '', L_string)


def method1():
    s = 0
    for i in range(100_000):
        L_string = 'FX'
        for _ in range(6):
            L_string = ''.join([next_char(c) for c in L_string])
        s += len(L_string)
    return s

def method1b():
    s = 0
    for i in range(100_000):
        L_string = 'FX'
        for _ in range(6):
            L_string = ''.join([next_char2(c) for c in L_string])
        s += len(L_string)
    return s


def method2():
    s = 0
    for i in range(100_000):
        L_string = 'FX'
        L_string = ''.join(next_substring(c, 6) if c=='X' else c for c in L_string)
        s += len(L_string)
    return s

def method3():
    s = 0
    for i in range(100_000):
        L_string = format_based()
        s += len(L_string)
    return s

rules2 = [
    ('FXFF+', '+XXF]')      # X=0
]

def new_method2(s='FX'):
    final = [s]
    s = ''
    for _ in range(6):
        for c in final[-1]:
            if c == 'X':
                s += rules2[0][floor(random() * len(rules2[0]))]    # rules2[0] because X=0
            else:
                s += c
        final.append(s)
        s = ''
    return final[-1]

def method4():
    s = 0
    for i in range(100_000):
        L_string = new_method2('FX')
        s += len(L_string)
    return s

print('Average length of result string (100_000 runs):')
print('{: <20}{: >20}'.format('Original:', method1() / 100_000))
print('{: <20}{: >20}'.format('New method:', method2() / 100_000 ))
print('{: <20}{: >20}'.format('@hilberts method:', method3() / 100_000 ))
print('{: <20}{: >20}'.format('new_method2 method:', method4() / 100_000 ))
print('{: <20}{: >20}'.format('altunyurt method:', method1b() / 100_000 ))

print('{: <20}{: >20}'.format('Timing original:', timeit(lambda: method1(), number=1)))
print('{: <20}{: >20}'.format('Timing new method:', timeit(lambda: method2(), number=1)))
print('{: <20}{: >20}'.format('Timing @hilberts method:', timeit(lambda: method3(), number=1)))
print('{: <20}{: >20}'.format('new_method2 method:', timeit(lambda: method4(), number=1)))
print('{: <20}{: >20}'.format('altunyurt method:', timeit(lambda: method1b(), number=1)))

The results:

Average length of result string (100_000 runs):
Original:                       85.17692
New method:                     85.29112
@hilberts method:               85.20096
new_method2 method:             84.88892
altunyurt method:               85.07668
Timing original:       4.563865200005239
Timing new method:    2.6940059370026574
Timing @hilberts method:  1.9866539289942011
new_method2 method:   1.3680451929976698
altunyurt method:     1.7981422250013566
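All methods average close to 85 characters, and this can be checked by hand. Both rules turn one X into five characters, so every replacement adds exactly 4 characters whichever rule is drawn. Rule 1 keeps one X and rule 2 produces two, so the number of X's grows by a factor of 1.5 per pass on average. Let x_n be the number of X's and L_n the string length after n passes:

```latex
x_0 = 1, \qquad \mathbb{E}[x_{n+1} \mid x_n] = \tfrac{1}{2}\,x_n + \tfrac{1}{2}\,(2x_n) = 1.5\,x_n
  \;\Rightarrow\; \mathbb{E}[x_n] = 1.5^{n}

L_0 = 2, \qquad L_{n+1} = L_n + 4\,x_n

\mathbb{E}[L_n] = 2 + 4\sum_{k=0}^{n-1} 1.5^{k} = 2 + 8\,\bigl(1.5^{n} - 1\bigr),
\qquad \mathbb{E}[L_6] = 2 + 8\,(11.390625 - 1) = 85.125
```

For the six passes used here, this expected value agrees with the measured averages of roughly 85.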

EDIT: Added @hilberts method

EDIT2: Added another new method, ~3.32x faster than original

EDIT3: Added @altunyurt method
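Each timing above comes from a single run (number=1), which can be noisy. The timeit documentation suggests repeating the measurement and treating the minimum as the most meaningful figure. Below is a minimal sketch for the original approach. It uses a simplified copy of the question's next_char, and run_original is a hypothetical helper name:

```python
import random
import timeit

rules = {'X': {1: 'FXFF+', 2: '+XXF]'}}  # the question's rules

def next_char(c):
    options = rules.get(c)
    if options is None:
        return c
    return random.choice(list(options.values()))

def run_original():
    s = 'FX'
    for _ in range(6):
        s = ''.join([next_char(c) for c in s])
    return s

# 5 independent measurements of 200 calls each; the minimum is the least noisy
times = timeit.repeat(run_original, repeat=5, number=200)
print(f'best of 5: {min(times) / 200 * 1e6:.1f} µs per call')
```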

Andrej Kesely

You could use re.sub with a function as the replacement value. The function is called once per match and picks a random rule, as your case requires. If the string gets very large, you could also write it to a file on disk, read it back with buffering, and write the substituted values to another file. Then rename the new file and delete the old one. Hope this helps :).

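import re
import random

rules = {
    'X': ['FXFF+', '+XXF]'],
    'Y': ['A', 'B']
}

L_string = 'FX'
# Matches any single rule key; re.escape guards against regex metacharacters
pattern = re.compile('|'.join(map(re.escape, rules)))

def next_char(match):
    # Called once per match, so every occurrence gets its own random rule
    return random.choice(rules[match.group()])

for _ in range(6):
    L_string = pattern.sub(next_char, L_string)
    print(L_string)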

OUTPUT

FFXFF+
FFFXFF+FF+
FFF+XXF]FF+FF+
FFF++XXF]+XXF]F]FF+FF+
FFF+++XXF]FXFF+F]+FXFF+FXFF+F]F]FF+FF+
FFF++++XXF]FXFF+F]FFXFF+FF+F]+F+XXF]FF+F+XXF]FF+F]F]FF+FF+
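The file-based idea above can be sketched as follows, assuming the string is too large to keep in memory comfortably. The rule keys are single characters, so the file can be processed in fixed-size chunks without a match ever straddling a boundary. Here os.replace renames the new file over the old one, which removes the old file in the same step. The names expand_file_once and l_string.txt are hypothetical:

```python
import os
import random
import tempfile

rules = {'X': ['FXFF+', '+XXF]']}  # the question's rules

def expand_char(c):
    # One independent random rule per variable character
    return random.choice(rules[c]) if c in rules else c

def expand_file_once(path, chunk_size=1 << 16):
    """One rewriting pass over the file at `path`, streamed chunk by chunk."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # new file next to the old one
    with open(path, 'r', buffering=chunk_size) as src, os.fdopen(fd, 'w') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(''.join(expand_char(c) for c in chunk))
    os.replace(tmp_path, path)  # rename over the old file, replacing it

with tempfile.TemporaryDirectory() as work_dir:
    path = os.path.join(work_dir, 'l_string.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('FX')
    for _ in range(6):
        expand_file_once(path)
    with open(path) as f:
        print(f.read())
```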

EDIT

I've edited the answer to make it faster and to build the final string with a recursive call. The rules below are a different, two-variable example. Pardon me for using global variables :P.

import random

rules = {
    'X': ['XX', 'XY'],
    'Y': ['A', 'B']
}

L_string = 'FXY'
depth = 6
result = ''

def go(d, currentstring):
    # Expand currentstring at recursion level d; text reached at
    # level `depth` is copied to the result unexpanded.
    global result
    if d < depth:
        for c in currentstring:
            if c in rules:
                go(d + 1, random.choice(rules[c]))
            else:
                result += c
    else:
        result += currentstring

go(0, L_string)
print(result)
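The same depth-limited recursion can be written without globals by passing a list down the calls and joining it once at the end. This also avoids repeated string concatenation. Below is a sketch using the question's rules. The names expand and generate are hypothetical:

```python
import random

rules = {'X': ['FXFF+', '+XXF]']}  # the question's rules

def expand(s, depth, out):
    """Append the expansion of s to the list `out`, rewriting variables `depth` more times."""
    for c in s:
        if depth > 0 and c in rules:
            expand(random.choice(rules[c]), depth - 1, out)
        else:
            out.append(c)

def generate(axiom='FX', depth=6):
    out = []
    expand(axiom, depth, out)
    return ''.join(out)

print(generate())
```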
Albin Paul

This generates the output directly, without any string substitutions:

import random

result = []
write = result.append

def X(level):
    if level == 0:
        write('X')
        return
    if random.randint(0,1):
        # X -> FXFF+
        write('F')
        X(level-1)
        write('FF+')
    else:
        # X -> +XXF]
        write('+')
        X(level-1)
        X(level-1)
        write('F]')

def start():
    write('F')
    X(5)  # 5 = recursion depth (passes); the question's loop does 6

start()
print(''.join(result))
VPfB
  • Wow! More than 50% faster. However, what if you have more than one variable say 'X' and 'Y'? – Izak Joubert Aug 04 '19 at 16:43
  • You can add another function for 'Y'. `X` may call `Y(level-1)` and vice versa. The speed comes from rewriting the rules from data structures into program instructions, so it is a little less flexible. – VPfB Aug 04 '19 at 16:47
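Following the suggestion in the comment above, here is a sketch with two mutually recursive functions. The grammar is hypothetical and made up for illustration (X -> FXFF+ | +XYF], Y -> YF | -X). Only the X rule's shape comes from the question:

```python
import random

def generate(depth):
    out = []
    write = out.append

    def X(level):
        if level == 0:
            write('X')
            return
        if random.getrandbits(1):  # X -> FXFF+
            write('F')
            X(level - 1)
            write('FF+')
        else:                      # X -> +XYF]  (hypothetical)
            write('+')
            X(level - 1)
            Y(level - 1)
            write('F]')

    def Y(level):
        if level == 0:
            write('Y')
            return
        if random.getrandbits(1):  # Y -> YF  (hypothetical)
            Y(level - 1)
            write('F')
        else:                      # Y -> -X  (hypothetical)
            write('-')
            X(level - 1)

    write('F')
    X(depth)
    return ''.join(out)

print(generate(5))
```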