
I would like to generate random text using letter frequencies from a book in a .txt file, so that each new character (drawn from string.lowercase + ' ') depends on the previous one.

How do I use Markov chains to do so? Or is it simpler to use 27 arrays with conditional frequencies for each letter?

juliomalegria
Julia
  • Random text or random words? If it's just random text, you don't need to use Markov chains. – jeffknupp Dec 28 '11 at 19:02
  • @jknupp It's just random letters and whitespace, not words. How do I do it without Markov chains? – Julia Dec 28 '11 at 19:13
  • If you don't care if the letter frequencies are the same, you could just generate a random character using a random number generator whose range covers the encoding type you're interested in. If you need the frequencies to be the same, calculating the letter frequencies based on the previous letter would be the most straightforward way. – jeffknupp Dec 28 '11 at 19:16
  • @jknupp I've already done simple random text generation; I want to use frequencies based on the previous letter. Do you know how I can get these conditional frequencies from my original file, and how to use them to generate random text? Thanks! – Julia Dec 28 '11 at 19:21
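The baseline jeffknupp describes can be sketched like this: `uniform_text` draws every character uniformly from the 27 symbols, and `unigram_text` weights each symbol by how often it appears in a sample text. Neither looks at the previous character, which is exactly what the conditional (Markov) approach adds. Both function names are illustrative, and `random.choices` requires Python 3.6 or later.

```python
import random
import string
from collections import Counter

# The 27 symbols from the question: lowercase letters plus space
ALPHABET = string.ascii_lowercase + ' '

def uniform_text(n):
    """Each character is drawn uniformly from ALPHABET; no book is involved."""
    return ''.join(random.choice(ALPHABET) for _ in range(n))

def unigram_text(sample, n):
    """Each character is drawn with probability proportional to how often it
    appears in `sample`; the previous character is NOT taken into account."""
    counts = Counter(c for c in sample.lower() if c in ALPHABET)
    symbols = list(counts)
    weights = list(counts.values())
    return ''.join(random.choices(symbols, weights=weights, k=n))

print(uniform_text(20))
print(unigram_text('The quick brown fox', 20))
```

The second version already reproduces single-letter frequencies, but sequences like "qz" stay just as likely as "qu" relative to those frequencies, because each draw is independent.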

2 Answers


I would like to generate a random text using letter frequencies from a book in a txt file

Consider using collections.Counter to build up the frequencies while looping over the text file two letters at a time (each letter paired with the one that follows it).
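As a sketch of that suggestion, the pairs can be formed by zipping the text with itself shifted by one character, so every letter is paired with its successor (overlapping pairs, not disjoint two-letter chunks). `pair_counts` is a hypothetical helper name, and dropping any pair that contains punctuation or digits is an assumption about how such characters should be handled.

```python
from collections import Counter

VALID = set('abcdefghijklmnopqrstuvwxyz ')

def pair_counts(text):
    """Count each (previous, next) pair of valid characters in `text`."""
    lowered = text.lower()
    # zip(s, s[1:]) yields ('t', 'h'), ('h', 'e'), ...: each char with its successor
    return Counter(
        (p, q) for p, q in zip(lowered, lowered[1:])
        if p in VALID and q in VALID
    )

counts = pair_counts('The quick brown fox')
print(counts[('o', 'w')], counts[('q', 'u')])  # prints 1 1
```

For a book, pass the whole file contents, e.g. `pair_counts(open('book.txt').read())`, where `book.txt` stands in for your file.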

How do I use Markov chains to do so? Or is it simpler to use 27 arrays with conditional frequencies for each letter?

The two are equivalent: the Markov chain is what you're doing, and the 27 arrays with conditional frequencies are how you're doing it.
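To make the equivalence concrete, here is a sketch of the "27 arrays" version: row `i` of a 27×27 table holds the counts of every symbol that followed symbol `i`, which is precisely the transition table of the Markov chain. Mapping space to index 26, falling back to a uniform draw when a symbol was never seen with a successor, and the helper names are all assumptions made for illustration.

```python
import random
import string

# Assumed layout: indices 0..25 are 'a'..'z', index 26 is the space
SYMBOLS = string.ascii_lowercase + ' '
INDEX = {c: i for i, c in enumerate(SYMBOLS)}

def build_table(text):
    """table[i][j] = number of times SYMBOLS[j] directly followed SYMBOLS[i]."""
    table = [[0] * 27 for _ in range(27)]   # the "27 arrays", one per symbol
    lowered = text.lower()
    for p, q in zip(lowered, lowered[1:]):
        if p in INDEX and q in INDEX:       # skip pairs with punctuation, digits, etc.
            table[INDEX[p]][INDEX[q]] += 1
    return table

def generate(table, n, start=' '):
    """Walk the chain: the row of the current symbol weights the next draw."""
    out, curr = [], start
    for _ in range(n):
        row = table[INDEX[curr]]
        if sum(row) == 0:                   # never seen a successor: pick uniformly
            row = [1] * 27
        curr = random.choices(SYMBOLS, weights=row)[0]
        out.append(curr)
    return ''.join(out)

print(generate(build_table('The quick brown fox jumps over the lazy dog'), 40))
```

The dictionary version below stores the same information, but only for pairs that actually occur, so it avoids keeping 729 mostly-zero counts.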

Here is some dictionary-based code to get you started:

from collections import defaultdict, Counter
from random import choice, randrange

def pairwise(iterable):
    "Yield overlapping pairs: s -> (s0, s1), (s1, s2), ..."
    it = iter(iterable)
    try:
        last = next(it)
    except StopIteration:   # empty input: no pairs
        return
    for curr in it:
        yield last, curr
        last = curr

valid = set('abcdefghijklmnopqrstuvwxyz ')

def valid_pair(pair):
    last, curr = pair
    return last in valid and curr in valid

def make_markov(text):
    # markov[p][q] = number of times letter q directly followed letter p
    markov = defaultdict(Counter)
    lowercased = (c.lower() for c in text)
    for p, q in filter(valid_pair, pairwise(lowercased)):
        markov[p][q] += 1
    return markov

def genrandom(model, n):
    curr = choice(list(model))
    for _ in range(n):
        yield curr
        if curr not in model:   # handle case where there is no known successor
            curr = choice(list(model))
        d = model[curr]         # Counter of the letters seen right after curr
        # Weighted pick: choose target in [0, total) and walk the running
        # total of counts; each successor wins with probability count/total
        target = randrange(sum(d.values()))
        cumulative = 0
        for curr, cnt in d.items():
            cumulative += cnt
            if cumulative > target:
                break

model = make_markov('The qui_.ck brown fox')
print(''.join(genrandom(model, 20)))
Raymond Hettinger
  • I edited your answer so that the text doesn't require scrolling. – joaquin Dec 28 '11 at 19:21
  • Another nice recipe. It looks like the function genrandom does not need the statement "last = curr". – sunqiang Dec 29 '11 at 01:38
  • @RaymondHettinger Thank you so much for your help, though I am confused about what `d = model[curr]` and the following block of code do. Would you mind explaining? Thank you! – Julia Dec 31 '11 at 09:39
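On Julia's question about `d = model[curr]`: `d` is the `Counter` of letters that followed `curr` in the training text, and the loop after it is a hand-written weighted random choice. The sketch below pulls that loop out into a function and, for comparison, performs the same draw with `random.choices` (available since Python 3.6). The function names and the example counts are made up for illustration.

```python
import random
from collections import Counter

def next_by_cumulative(d):
    """The loop from the answer: pick target in [0, total) and walk the
    running total of counts until it first exceeds target."""
    target = random.randrange(sum(d.values()))
    cumulative = 0
    for letter, cnt in d.items():
        cumulative += cnt
        if cumulative > target:
            return letter

def next_by_choices(d):
    """The same weighted draw via the standard library (Python 3.6+)."""
    letters = list(d)
    return random.choices(letters, weights=list(d.values()))[0]

# Hypothetical successors of 'q': 'u' followed it 9 times, 'a' once.
# Targets 0..8 stop on 'u' and target 9 stops on 'a', so P('u') = 9/10.
d = Counter({'u': 9, 'a': 1})
print(next_by_cumulative(d), next_by_choices(d))
```

Either function could stand in for the inner loop of `genrandom` (`curr = next_by_choices(model[curr])`); the explicit loop has the advantage of also running on Python 2.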

If each character depends only on the previous character, you could just compute the conditional probabilities for all 27^2 = 729 ordered pairs of characters.
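Computing those probabilities amounts to dividing each pair count by the number of pairs that start with the same character, i.e. P(q | p) = count(p, q) / count(p, any). A sketch of that, assuming the 27-symbol alphabet from the question; `transition_probs` is a hypothetical name, and leaving an all-zero row for a character never seen with a successor is an assumption.

```python
from collections import Counter
from itertools import product

SYMBOLS = 'abcdefghijklmnopqrstuvwxyz '
VALID = set(SYMBOLS)

def transition_probs(text):
    """Return {(p, q): P(q follows p)} for all 27 * 27 ordered pairs.
    Rows for a character p that never had a successor are all zero."""
    lowered = text.lower()
    pairs = Counter((p, q) for p, q in zip(lowered, lowered[1:])
                    if p in VALID and q in VALID)
    totals = Counter()
    for (p, _), cnt in pairs.items():
        totals[p] += cnt                     # number of pairs starting with p
    return {(p, q): (pairs[(p, q)] / totals[p] if totals[p] else 0.0)
            for p, q in product(SYMBOLS, repeat=2)}

probs = transition_probs('the quick brown fox')
print(len(probs))          # 729
print(probs[('o', 'w')])   # 0.5: 'o' is followed once by 'w' and once by 'x'
```

Each non-empty row sums to 1, so it can be passed directly as the `weights` of a random draw for the next character.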

tlehman