Get bad result for random walk

Question

I want to implement random walk and compute the steady state.

Suppose my graph is given as in the following image:

The graph above is defined in a file as follows:

1   2   0.9
1   3   0.1
2   1   0.8
2   2   0.1
2   4   0.1
etc

To read and build this graph, I use the following method:

def _build_og(self, original_ppi):
""" Build the original graph, without any nodes removed. """

try:
    graph_fp = open(original_ppi, 'r')
except IOError:
    sys.exit("Could not open file: {}".format(original_ppi))

G = nx.DiGraph()
edge_list = []

# parse network input
for line in graph_fp.readlines():
    split_line = line.rstrip().split('\t')
    # assume input graph is a simple edgelist with weights
    edge_list.append((split_line[0], split_line[1],  float(split_line[2])))

G.add_weighted_edges_from(edge_list)
graph_fp.close()
print edge_list

return G

In the function above do I need to define the graph as DiGraph or simpy Graph?

We build the transition matrix as following:

def _build_matrices(self, original_ppi, low_list, remove_nodes):
        """ Build column-normalized adjacency matrix for each graph.

        NOTE: these are column-normalized adjacency matrices (not nx
              graphs), used to compute each p-vector
        """
        original_graph = self._build_og(original_ppi)
        self.OG = original_graph
        og_not_normalized = nx.to_numpy_matrix(original_graph)
        self.og_matrix = self._normalize_cols(og_not_normalized)

And I normalize the matrix using :

def _normalize_cols(self, matrix):
        """ Normalize the columns of the adjacency matrix """
        return normalize(matrix, norm='l1', axis=0)

now to simulate the random walk we define :

def run_exp(self, source):

        CONV_THRESHOLD = 0.000001
        # set up the starting probability vector
        p_0 = self._set_up_p0(source)
        diff_norm = 1
        # this needs to be a deep copy, since we're reusing p_0 later
        p_t = np.copy(p_0)

        while (diff_norm > CONV_THRESHOLD):
            # first, calculate p^(t + 1) from p^(t)
            p_t_1 = self._calculate_next_p(p_t, p_0)

            # calculate L1 norm of difference between p^(t + 1) and p^(t),
            # for checking the convergence condition
            diff_norm = np.linalg.norm(np.subtract(p_t_1, p_t), 1)

            # then, set p^(t) = p^(t + 1), and loop again if necessary
            # no deep copy necessary here, we're just renaming p
            p_t = p_t_1

We define the initial state (p_0) by using the following method:

def _set_up_p0(self, source):
    """ Set up and return the 0th probability vector. """
    p_0 = [0] * self.OG.number_of_nodes()
    # convert self.OG.number_of_nodes() to list
    l =  list(self.OG.nodes())
    #nx.draw(self.OG, with_labels=True)
    #plt.show()
    for source_id in source:
        try:
            # matrix columns are in the same order as nodes in original nx
            # graph, so we can get the index of the source node from the OG
            source_index = l.index(source_id)
            p_0[source_index] = 1 / float(len(source))
        except ValueError:
            sys.exit("Source node {} is not in original graph. Source: {}. Exiting.".format(source_id, source))

    return np.array(p_0)

To generate the next state, we use the following function

and the power iteration strategy:

def _calculate_next_p(self, p_t, p_0):
        """ Calculate the next probability vector. """
        print 'p_0\t{}'.format(p_0)
        print 'p_t\t{}'.format(p_t)
        epsilon = np.squeeze(np.asarray(np.dot(self.og_matrix, p_t)))
        print 'epsilon\t{}'.format(epsilon)
        print 10*"*"
        return np.array(epsilon)

Suppose the random walk can start from any node (1, 2, 3 or 4).

When runing the code i get the following result:

2       0.32
3       0.31
1       0.25
4       0.11

The result must be:

(0.28, 0.30, 0.04, 0.38).

So can someone help me to detect where my mistake is?

I don't know if the problem is in my transition matrix.

1. There's a ton of code in your question. Can you simplify things so we understand better what's going on? Copy/pasting from your code is easier, but it makes it much more difficult for us to understand 2. Not clear what is "different". Can you plot or explain how the result is different and what is the expected output? — cd98, Sep 22 '18 at 21:09
@cd98 I want to compute the steady distribution vector for a given directed graph. This vector must be equal to [0.35, 0.56, 0.07]. But in my result i get [0.41, 0.35, 0.24]. — bib, Sep 22 '18 at 22:03

score 1 · Accepted Answer · answered Sep 24 '18 at 01:06

Here is what the matrix should be (given than your transition matrix multiplies the state vector from the left, it is a left stochastic matrix, where the columns add up to 1, and the (i, j) entry is the probability of going from j to i).

import numpy as np
transition = np.array([[0, 0.8, 0, 0.1], [0.9, 0.1, 0.5, 0], [0.1, 0, 0.3, 0], [0, 0.1, 0.2, 0.9]])
state = np.array([1, 0, 0, 0])    # could be any other initial position
diff = tol = 0.001
while diff >= tol:
    next_state = transition.dot(state)
    diff = np.linalg.norm(next_state - state, ord=np.inf)
    state = next_state
print(np.around(state, 3))

This prints [0.279 0.302 0.04 0.378].

I can't tell if you are loading the data incorrectly, or something else. The step with "column normalization" is a warning sign: if the given transition probabilities don't add up to 1, you should report bad data, not normalize the columns. And I don't know why you use NetworkX at all when the data is already presented as a matrix: the table you are given can be read as

column   row   entry

and this matrix is what is needed for calculations.

Thank you for your answer. I dont know how you get the transition matrix in this form. I have a .txt file containing the node and the arc between them. 1 2 0.9 means an arc with weight 0.9 between node 1 and node 2. — bib, Sep 24 '18 at 01:38
Right, so `transition[1, 0] = 0.9` (offset for 0-based indices). First row of the matrix are probabilities of going **to** node 1 from nodes 1, 2, 3, 4. Second row: probabilities of going **to** node 2 from nodes 1, 2, 3, 4. Etc. — , Sep 24 '18 at 01:54

Get bad result for random walk

1 Answers1