
I am in an intro to programming class, and one of our final projects is to create a sentence generator. The requirements are that we take a sample input, strip it down to only lowercase letters, use a Markov model to determine the transition probabilities (a to e, e to t, etc.), and store them in dictionaries. For example, the dictionary for e would look something like this:

e_trans = {'em': 0.0769, 'e ': 0.2307, 'ea': 0.3077, 'es': 0.1538, 'et': 0.0769, 'ee': 0.1538}

Then we have to create a generator that uses these probabilities to create random sentences.

I haven't gotten very far because I don't know where to start in computing the probabilities. We cannot use any of the Markov model packages for Python. Any help would be greatly appreciated.

The code I have so far is:

import random

inputFile = open("input.txt", 'r')
rawdata = inputFile.read()

rawdata = rawdata.lower()
rawdata = rawdata.replace('-',' ')
data = (' ')

for character in rawdata:
    if ord(character) == 32:
        data += character
    elif ord(character) > 96 and ord(character) < 123:
        data += character

data += ' '

print(data)

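# Collect the distinct characters in the cleaned text.
# A set is needed here: {} creates a dict, and "S += letter" raises a TypeError.
S = set()

for letter in data:
    S.add(letter)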

print(S)


inputFile.close()

1 Answer

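Count how many times each transition (each pair of adjacent characters) occurs, then divide each count by the total number of transitions that start with the same character, so that the probabilities for each starting character sum to 1.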
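A minimal sketch of that counting step, using only plain dictionaries. It assumes the probabilities are normalized per starting character, which is what the `e_trans` example in the question suggests, since its values sum to roughly 1. The function name `transition_probabilities` and the sample string are made up for illustration.

```python
def transition_probabilities(text):
    """Return {first_char: {pair: probability}} for adjacent characters in text."""
    # counts[first][pair] = number of times `pair` was seen in the text
    counts = {}
    for i in range(len(text) - 1):
        first = text[i]
        pair = text[i:i + 2]  # two-character string such as 'ea'
        row = counts.setdefault(first, {})
        row[pair] = row.get(pair, 0) + 1

    # Divide each count by the number of transitions leaving the same character
    probabilities = {}
    for first, row in counts.items():
        total = sum(row.values())
        probabilities[first] = {pair: n / total for pair, n in row.items()}
    return probabilities


sample = " the tree "  # hypothetical cleaned input
probs = transition_probabilities(sample)
e_trans = probs["e"]   # same shape as the e_trans dictionary in the question
print(e_trans)
```

Each inner dictionary has the same layout as `e_trans` in the question, so `probs["a"]`, `probs["t"]`, and so on could be stored under whatever names the assignment requires.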

MRAB
  • That's the idea, but how do I count the number of kinds of transitions? – Tyler Linden Dec 02 '14 at 19:13
  • The simplest way is to use `zip`, i.e. `zip(data, data[1 : ])`, and `Counter` from the `collections` module (assuming Python 3). – MRAB Dec 02 '14 at 20:40
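The `zip` and `Counter` suggestion could look roughly like the sketch below, which also shows one possible way to use the resulting probabilities for generation. It assumes Python 3.6 or later for `random.choices`. The names `build_model` and `generate` are hypothetical, and the sample string stands in for the cleaned `data` from the question.

```python
import random
from collections import Counter


def build_model(data):
    """Map each character to {pair: probability}, e.g. model['e']['ea']."""
    # zip(data, data[1:]) yields every pair of adjacent characters
    pair_counts = Counter(zip(data, data[1:]))
    # Number of transitions that start at each character
    first_counts = Counter(data[:-1])

    model = {}
    for (first, second), n in pair_counts.items():
        model.setdefault(first, {})[first + second] = n / first_counts[first]
    return model


def generate(model, length, start=" ", rng=random):
    """Random walk over the model: pick each next character by its probability."""
    out = [start]
    for _ in range(length):
        row = model.get(out[-1])
        if not row:  # dead end: this character was never followed by anything
            break
        pairs = list(row)
        weights = [row[p] for p in pairs]
        chosen = rng.choices(pairs, weights=weights)[0]
        out.append(chosen[1])  # second character of the chosen pair
    return "".join(out)


model = build_model(" the tree ")  # hypothetical cleaned input
print(model["e"])
print(generate(model, 20))
```

Starting from a space fits the question's code, which pads `data` with a space at both ends, so a space always has at least one outgoing transition.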