Simulating a Markov Chain with Neo4J

Question

A Markov chain is composed of a set of states which can transition to other states with a certain probability.

A Markov chain can be easily represented in Neo4J by creating a node for each state, a relationship for each transition, and then annotating the transition relationships with the appropriate probability.

BUT, can you simulate the Markov chain using Neo4J? For instance, can Neo4J be coerced to start in a certain state and then make transitions to the next state and the next state based upon probabilities? Can Neo4J return with a printout of the path that it took through this state space?

Perhaps this is easier to understand with a simple example. Let's say I want to make a 2-gram model of English based upon the text of my company's tech blog. I spin up a script which does the following:

It pulls down the text of the blog.
It iterates over every pair of adjacent letters and creates a node in Neo4J.
It iterates again over every 3-tuple of adjacent letters and then creates a Neo4J directed relationship between the node represented by the first two letters and the node represented by the last two letters. It initializes a counter on this relationship to 1. If the relationship already exists, then the counter is incremented.
Finally, it iterates through each node, counts how many total outgoing transitions have occurred, and then creates a new annotation on each relationship of a particular node equal to count/totalcount. This is the transition probability.

Now that the Neo4J graph is complete, how do I make it create a "sentence" from my 2-gram model of English? Here is what the output might look like:

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

Extra credit if you know there my "sample sentence" came from. Famous paper. — JnBrymn, May 17 '13 at 04:10
I'm told that if we expand this to a 5-gram model of English, then we get sentences that are indistinguishable from my Twitter posts. — JnBrymn, May 17 '13 at 04:12

score 7 · Accepted Answer · edited Jul 25 '20 at 11:00

Neo4j doesn't provide the functionality you're asking for out of the box, but since you've already come as far as correctly populating your database, the traversal that you need is just a few lines of code.

I've recreated your experiment here, with a few modifications. First of all, I populate the database with a single pass through the text (steps 2 and 3), but that's a minor. More importantly, I only store the number of occurrences on each relationship and the total number on the node (step 4), as I don't think there is a need to pre-calculate probabilities.

The code that you're asking for then looks something like this:

/**
 * A component that creates a random sentence by a random walk on a Markov Chain stored in Neo4j, produced by
 * {@link NGramDatabasePopulator}.
 */
public class RandomSentenceCreator {

    private final Random random = new Random(System.currentTimeMillis());

    /**
     * Create a random sentence from the underlying n-gram model. Starts at a random node an follows random outgoing
     * relationships of type {@link Constants#REL} with a probability proportional to that transition occurrence in the
     * text that was processed to form the model. This happens until the desired length is achieved. In case a node with
     * no outgoing relationships it reached, the walk is re-started from a random node.
     *
     * @param database storing the n-gram model.
     * @param length   desired number of characters in the random sentence.
     * @return random sentence.
     */
    public String createRandomSentence(GraphDatabaseService database, int length) {
        Node startNode = randomNode(database);
        return walk(startNode, length, 0);
    }

    private String walk(Node startNode, int maxLength, int currentLength) {
        if (currentLength >= maxLength) {
            return (String) startNode.getProperty(NAME);
        }

        int totalRelationships = (int) startNode.getProperty(TOTAL, 0);
        if (totalRelationships == 0) {
            //terminal node, restart from random
            return walk(randomNode(startNode.getGraphDatabase()), maxLength, currentLength);
        }

        int choice = random.nextInt(totalRelationships) + 1;
        int total = 0;
        Iterator<Relationship> relationshipIterator = startNode.getRelationships(OUTGOING, REL).iterator();

        Relationship toFollow = null;
        while (total < choice && relationshipIterator.hasNext()) {
            toFollow = relationshipIterator.next();
            total += (int) toFollow.getProperty(PROBABILITY);
        }

        Node nextNode;
        if (toFollow == null) {
            //no relationship to follow => stay on the same node and try again
            nextNode = startNode;
        } else {
            nextNode = toFollow.getEndNode();
        }

        return ((String) nextNode.getProperty(NAME)).substring(0, 1) + walk(nextNode, maxLength, currentLength + 1);
    }

    private Node randomNode(GraphDatabaseService database) {
        return random(GlobalGraphOperations.at(database).getAllNodes());
    }
}

Simulating a Markov Chain with Neo4J

1 Answers1