Markov Clustering in Python

Question

As the title says, I'm trying to get a Markov Clustering algorithm to work in Python (specifically Python 3.7).

Unfortunately, it's not doing much of anything, and it's driving me up the wall trying to fix it.

EDIT: First, I've made the adjustments to the main code to make each column sum to 100, even if it's not perfectly balanced. I'm going to try to account for that in the final answer.

To be clear, the biggest problem is that the numbers spiral out of control, into such easily-understandable numbers as 5.56268465e-309, and I don't know how to convert that into something understandable.

Here's the code so far:

import numpy as np
import math
## How far you'd like your random-walkers to go (bigger number -> more walking)
EXPANSION_POWER = 2
## How tightly clustered you'd like your final picture to be (bigger number -> more clusters)
INFLATION_POWER = 2
ITERATION_COUNT = 10
def normalize(matrix):
    return matrix/np.sum(matrix, axis=0)

def expand(matrix, power):
    return np.linalg.matrix_power(matrix, power)

def inflate(matrix, power):
    for entry in np.nditer(transition_matrix, op_flags=['readwrite']):
        entry[...] = math.pow(entry, power)
    return matrix

def run(matrix):
    #np.fill_diagonal(matrix, 1)
    #print(matrix)
    matrix = normalize(matrix)
    print(matrix)
    for _ in range(ITERATION_COUNT):
        matrix = normalize(inflate(expand(matrix, EXPANSION_POWER), INFLATION_POWER))
    return matrix

transition_matrix = np.array ([[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                                [0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                                [0.5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                                [0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                                [0,0,0.33,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                                [0,0,0.33,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                                [0,0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0,0.125,0],
                                [0,0,0,0.33,0,0,0.5,0,0,0,0,0,0,0,0,0,0.125,1],
                                [0,0,0,0.33,0,0,0.5,1,1,0,0,0,0,0,0,0,0.125,0],
                                [0,0,0,0,0.166,0,0,0,0,0,0,0,0,0,0,0,0.125,0],
                                [0,0,0,0,0.166,0,0,0,0,0.2,0,0,0,0,0,0,0.125,0],
                                [0,0,0,0,0.167,0,0,0,0,0.2,0.25,0,0,0,0,0,0.125,0],
                                [0,0,0,0,0.167,0,0,0,0,0.2,0.25,0.5,0,0,0,0,0,0],
                                [0,0,0,0,0.167,0,0,0,0,0.2,0.25,0.5,0,1,0,0,0.125,0],
                                [0,0,0,0,0.167,0,0,0,0,0.2,0.25,0,1,0,1,0,0.125,0],
                                [0,0,0,0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0],
                                [0,0,0,0,0,0.33,0,0,0,0,0,0,0,0,0,0.5,0,0],
                                [0,0,0,0,0,0.33,0,0,0,0,0,0,0,0,0,0.5,0,0]])
run(transition_matrix)
print(transition_matrix)

This is part of a uni assignment. I need to run this on the array both weighted and unweighted (though the weighted part can wait until I've got the bloody thing working at all). Any tips or suggestions?

The second last column of your matrix, `transition_matrix[:, -2]`, sums to 0.88, not 1. Is there a typo somewhere? — lightalchemist, Oct 19 '18 at 05:24

score 3 · Answer 1 · answered Oct 19 '18 at 05:30

Your transition matrix is not valid.

>>> transition_matrix.sum(axis=0)
matrix([[1.  , 1.  , 0.99, 0.99, 0.96, 0.99, 1.  , 1.  , 0.  , 1.  ,
         1.  , 1.  , 1.  , 0.  , 0.  , 1.  , 0.88, 1.  ]])

Not only do some of your columns fail to sum to 1, some of them sum to 0.

This means when you try to normalize your matrix, you will end up with nan because you are dividing by 0.
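One way to guard against this, as a rough sketch rather than the fix for your assignment: check the column sums before dividing, and give every node a self-loop so no column can be all zeros (your commented-out `np.fill_diagonal` line hints at the same idea). The helper names `add_self_loops` and `normalize_columns` and the 3x3 `example` matrix are made up for illustration.

```python
import numpy as np

def add_self_loops(matrix, weight=1.0):
    """Return a float copy with `weight` added to every diagonal entry."""
    result = np.array(matrix, dtype=float)  # copy, so the input is untouched
    result[np.diag_indices_from(result)] += weight
    return result

def normalize_columns(matrix):
    """Scale each column to sum to 1; refuse columns that sum to 0."""
    column_sums = matrix.sum(axis=0)
    zero_columns = np.flatnonzero(column_sums == 0)
    if zero_columns.size:
        raise ValueError("columns summing to 0: %s" % zero_columns)
    return matrix / column_sums  # broadcasting divides column j by its sum

# Made-up example: the last column is all zeros, like some columns above.
example = np.array([[0.0, 0.5, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 0.5, 0.0]])

stochastic = normalize_columns(add_self_loops(example))
print(stochastic.sum(axis=0))  # every column now sums to 1
```

Whether self-loops are appropriate depends on what the graph is supposed to represent, so treat that part as an assumption.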

Lastly, is there a reason you are using a NumPy matrix instead of a plain NumPy array, which is the recommended container for this kind of data? Arrays simplify some of the operations, such as raising each entry to a power. There are also differences between a NumPy matrix and a NumPy array that can cause subtle bugs.
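To illustrate the difference that matters here (a small sketch with a made-up 2x2 array `A`): on an array, `**` and `np.power` work entry by entry, which is what the inflation step wants, while `np.linalg.matrix_power` does repeated matrix multiplication, which is what the expansion step wants. On an `np.matrix`, `**` means matrix power instead, which is the kind of subtle difference mentioned above.

```python
import numpy as np

A = np.array([[0.5, 0.2],
              [0.5, 0.8]])

# Entry-by-entry power: each number is squared on its own (inflation).
elementwise = A ** 2
same_thing = np.power(A, 2)

# Matrix power: A multiplied by itself (expansion).
matrix_square = np.linalg.matrix_power(A, 2)

print(elementwise)
print(matrix_square)
```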

Good catch on both points! I've made adjustments to account for that. I'm using a numpy matrix because this is altered from a base code template, and I'm not sure what turning it into an array would do. — Leafsw0rd, Oct 19 '18 at 05:46
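For what it's worth, here is a rough array-based sketch of the same loop. One more thing stands out in the posted code: `inflate` iterates over the global `transition_matrix` rather than its `matrix` argument, so the working matrix is never inflated, while the global is squared in place on every pass. Ten passes raise each 0.5 to the power 1024, which is about 5.56e-309, the number quoted in the question. The function names follow the question; the keyword arguments of `run` and the toy `adjacency` matrix are assumptions made for this example.

```python
import numpy as np

def normalize(matrix):
    # Divide each column by its sum (assumes no column sums to 0).
    return matrix / matrix.sum(axis=0)

def expand(matrix, power):
    # Matrix power: lets the random walkers take `power` steps.
    return np.linalg.matrix_power(matrix, power)

def inflate(matrix, power):
    # Entry-by-entry power on the argument; returns a new array.
    return np.power(matrix, power)

def run(matrix, expansion=2, inflation=2, iterations=10):
    matrix = normalize(np.asarray(matrix, dtype=float))
    for _ in range(iterations):
        matrix = normalize(inflate(expand(matrix, expansion), inflation))
    return matrix

# Toy graph (made up): two triangles joined by one edge, with self-loops.
adjacency = np.array([[1, 1, 1, 0, 0, 0],
                      [1, 1, 1, 0, 0, 0],
                      [1, 1, 1, 1, 0, 0],
                      [0, 0, 1, 1, 1, 1],
                      [0, 0, 0, 1, 1, 1],
                      [0, 0, 0, 1, 1, 1]], dtype=float)

result = run(adjacency)          # keep the return value
print(np.round(result, 3))

# Why the original printed tiny numbers: 0.5 squared ten times in place.
print(0.5 ** (2 ** 10))          # roughly 5.56e-309
```

Note that the result has to be captured from `run`; printing the input afterwards only shows the original data.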
