
Given a scale-free graph G (a graph whose degree distribution follows a power law), consider the following procedure:

for i in range(C):
    coin = randint(0, 1)
    if coin == 0:
        delete_random_edges(G)
    else:
        add_random_edge(G)

(C is a constant)
So, when C is large, the degree distribution after the procedure will look more like that of G(n,p). I am interested in preserving the power-law distribution, i.e. I want the graph to remain scale-free after this procedure, even for large C.

My idea is to write the procedures "delete_random_edges" and "add_random_edge" so that edges connected to high-degree nodes have a small probability of being deleted (and, when adding a new edge, it is more likely to attach to a high-degree node).

I use Networkx to represent the graph, and all I found are procedures that delete or add a specific edge. Any idea how I can implement the above?

ONE1234

3 Answers


Although you have already accepted the answer from @abdallah-sobehy, meaning that it works, I would suggest a simpler approach, in case it helps you or anybody else.

What you are trying to do is sometimes called preferential attachment (well, at least when you add nodes), and for that there is a random model developed quite some time ago; see the Barabasi-Albert model, which leads to a power-law degree distribution P(k) ~ k^-3.

Basically you have to add edges with probability proportional to the degree of each node divided by the sum of the degrees of all the nodes. You can use scipy.stats to define the probability distribution, with code like this:

import scipy.stats as stats

x = list(Gx.nodes())
sum_degrees = sum(dict(Gx.degree()).values())
# note: use float(sum_degrees) on Python 2 to avoid integer division
p = [Gx.degree(n) / sum_degrees for n in x]
# rv_discrete requires integer support values, so this assumes integer node labels
custm = stats.rv_discrete(name='custm', values=(x, p))

Then you just pick 2 nodes following that distribution, and that's the 2 nodes you add an edge to,

custm.rvs(size=2)
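Putting the pieces together, here is a minimal sketch of such an add routine (the function name add_preferential_edge is just illustrative; as noted in the update below, it may occasionally produce a duplicate or self-link):

```python
import networkx as nx
import scipy.stats as stats

def add_preferential_edge(G):
    """Add one edge, picking both endpoints with probability
    proportional to their current degree."""
    nodes = list(G.nodes())
    total = sum(G.degree(n) for n in nodes)
    p = [G.degree(n) / total for n in nodes]
    # rv_discrete needs integer support values, so sample node indices
    dist = stats.rv_discrete(name='pref', values=(list(range(len(nodes))), p))
    i, j = dist.rvs(size=2)
    G.add_edge(nodes[i], nodes[j])  # may be a duplicate or self-link

G = nx.barabasi_albert_graph(100, 3, seed=42)
m = G.number_of_edges()
add_preferential_edge(G)
```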

As for deleting edges, I haven't tried that myself. But I guess you could use something like this,

degrees = dict(Gx.degree())
# assumes there are no isolated (degree-0) nodes
sum_inv_degrees = sum(1.0 / d for d in degrees.values())
p = [1.0 / (degrees[n] * sum_inv_degrees) for n in Gx]

although honestly I am not completely sure; it is not anymore the random model that I link to above...
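To make that concrete, here is a deletion sketch under the same inverse-degree idea (the function name is illustrative, and weighting each edge by the inverse degrees of its endpoints is only one of several plausible choices):

```python
import random
import networkx as nx

def delete_low_degree_edge(G):
    """Delete one edge, preferring edges whose endpoints have low degree.
    Each edge is weighted by 1/deg(u) + 1/deg(v); this is an assumption,
    not part of the Barabasi-Albert model."""
    edges = list(G.edges())
    weights = [1.0 / G.degree(u) + 1.0 / G.degree(v) for u, v in edges]
    u, v = random.choices(edges, weights=weights, k=1)[0]  # Python 3.6+
    G.remove_edge(u, v)

G = nx.barabasi_albert_graph(100, 3, seed=1)
m = G.number_of_edges()
delete_low_degree_edge(G)
```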

Hope it helps anyway.

UPDATE after comments

Indeed, by using this method for adding edges to an existing graph, you could get 2 undesired outcomes:

  • duplicated links
  • self links

You could remove those, although it will make the results deviate from the expected distribution.

Anyhow, you should take into account that you are already deviating from the preferential attachment model, since the algorithm studied by Barabasi-Albert works by adding new nodes and links to the existing graph:

The network begins with an initial connected network of m_0 nodes. New nodes are added to the network one at a time. Each new node is connected to m > m_0 existing nodes with a probability that is proportional to the number ...

(see here)

If you want to get an exact distribution (instead of growing an existing network and keeping its properties), you're probably better off with the answer from @Joel.

Hope it helps.

lrnzcig
  • In this method, you will probably keep adding edges that already exist in the graph. It's possible to run it several times until a new edge appears – ONE1234 Oct 17 '15 at 20:55
  • It's true. You could end up adding a self-link as well. Depending on your context, you can remove duplicate and self-links; the distribution will not look exactly as you need, but it will probably be close enough. The preferential attachment model does not actually work by adding links to the existing graph, but by adding new nodes... I'm updating the answer to add a bit more details. Hope it helps. – lrnzcig Oct 18 '15 at 07:00

Here are two algorithms:

Algorithm 1

This algorithm does not preserve the degrees exactly; rather, it preserves the expected degrees.

Save each node's initial degree. Then delete edges at random. Whenever you create an edge, do so by randomly choosing two nodes, each with probability proportional to the initial degree of those nodes.

After a long period of time, the expected degree of each node u is its initial degree (though the actual degree might be a bit higher or lower).

Basically, this will create what is called a Chung-Lu random graph. Networkx has a built in algorithm for creating them.

Note - this will allow the degree distribution to vary.

Algorithm 1a

Here is the efficient networkx implementation, skipping the edge deletion and addition and going straight to the final result (assuming a networkx graph G):

degree_list = list(dict(G.degree()).values())
H = nx.expected_degree_graph(degree_list)

Here's the documentation

Algorithm 2

This algorithm preserves the degrees exactly.

Choose a set of edges and break them. Create a list in which each node appears as many times as the number of broken edges it was part of. Shuffle this list. Create new edges between nodes that appear next to each other in it.

Check to make sure you never join a node to itself or to a node which is already a neighbor. If this would occur you'll want to think of a custom way to avoid it. One option is to simply reshuffle the list. Another is to set those nodes aside and include them in the list you create next time you do this.
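The break-and-reshuffle step above can be sketched as follows (names are mine; it uses the first option, simply reshuffling until a valid pairing appears):

```python
import random
import networkx as nx

def rewire_preserving_degrees(G, k, max_tries=100):
    """Break k random edges and re-pair the freed stubs by shuffling,
    so every node keeps its exact degree."""
    broken = random.sample(list(G.edges()), k)
    stubs = [n for e in broken for n in e]
    for _ in range(max_tries):
        random.shuffle(stubs)
        pairs = list(zip(stubs[::2], stubs[1::2]))
        # reject self-loops and existing edges (since the broken edges
        # are still present at this point, re-creating one is also rejected)
        ok = all(u != v and not G.has_edge(u, v) for u, v in pairs)
        # also reject shuffles where the same new edge appears twice
        ok = ok and len({frozenset(p) for p in pairs}) == len(pairs)
        if ok:
            G.remove_edges_from(broken)
            G.add_edges_from(pairs)
            return True
    return False  # no valid pairing found; graph left unchanged

G = nx.barabasi_albert_graph(200, 3, seed=7)
degrees_before = dict(G.degree())
rewire_preserving_degrees(G, 5)
```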

edit There is a built-in networkx command double_edge_swap to swap two edges at a time. documentation
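For example (the nswap and max_tries values here are arbitrary):

```python
import networkx as nx

G = nx.barabasi_albert_graph(100, 3, seed=0)
degrees_before = dict(G.degree())
# swap pairs of edges; every node's degree is preserved
nx.double_edge_swap(G, nswap=50, max_tries=5000)
```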

Joel
  • Thanks, it is very helpful! – ONE1234 Oct 17 '15 at 21:29
  • Rereading this, I realized I didn't give the quickest way through algorithm 1, and I hadn't mentioned a closely related networkx command for algorithm 2. I've added them. – Joel Oct 17 '15 at 22:03

I am not sure to what extent this will preserve the scale free property but this can be a way to implement your idea:

In order to add an edge you need to specify 2 nodes in networkx. So you can choose one node with probability proportional to its degree, and choose the other node uniformly (without any preference). Choosing a highly connected node can be achieved as follows:

For a graph G where nodes are [0,1,2,...,n]

1) Create a list of floats (limits) between 0 and 1 that assigns each node a probability of being chosen proportional to its degree. For example: limits[1] - limits[0] is the probability of choosing node 0, limits[2] - limits[1] is the probability of choosing node 1, etc.

import numpy as np

# limits is a list of floats between 0 and 1 which defines
# the probability of choosing a certain node depending on its degree
limits = [0.0]
# store the total number of edges; the sum of all degrees is 2*num_edges
num_edges = G.number_of_edges()
# iterate over the nodes to build the cumulative limits
for i in G:
    limits.append(limits[-1] + G.degree(i) / (2.0 * num_edges))

2) Randomly generate a number between 0 and 1 then compare it to the limits, and choose the node to add an edge to accordingly:

rnd = np.random.random()
# compare the random number to the limits and choose the node accordingly
for j in range(len(limits) - 1):
    if limits[j] <= rnd < limits[j + 1]:
        chosen_node = j  # the node label itself, not G.node[j] (that is the attribute dict)
        break

3) Choose another node uniformly, by generating a random integer between [0,n]

4) Add an edge between both of the chosen nodes.

5) Similarly, for deleting an edge, you can choose a node with probability proportional to 1/degree instead of degree, then uniformly delete any of its edges.
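Step 5 could be sketched like this (the function name is illustrative; isolated nodes are skipped, since 1/degree is undefined for them):

```python
import random
import numpy as np
import networkx as nx

def delete_edge_inverse_degree(G):
    """Pick a node with probability proportional to 1/degree,
    then delete one of its edges uniformly at random."""
    nodes = [n for n in G if G.degree(n) > 0]  # skip isolated nodes
    inv = np.array([1.0 / G.degree(n) for n in nodes])
    u = nodes[np.random.choice(len(nodes), p=inv / inv.sum())]
    v = random.choice(list(G.neighbors(u)))
    G.remove_edge(u, v)

G = nx.barabasi_albert_graph(100, 3, seed=3)
m = G.number_of_edges()
delete_edge_inverse_degree(G)
```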

It would be interesting to know whether this approach preserves the scale-free property, and at which C the property is lost, so let us know if it worked or not.

EDIT: As suggested by @joel the selection of the node to add an edge to should be proportional to degree rather than degree^2. I have edited step 1 accordingly.

EDIT2: This might help you judge whether the scale-free graph lost its property after edge additions and removals. Simply compute the preferential attachment score before and after the changes. You can find the documentation here.
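For reference, networkx exposes this as nx.preferential_attachment, where the score of a pair (u, v) is deg(u) * deg(v); a quick sketch of comparing scores before and after a change (the candidate pairs and the added edge are arbitrary examples):

```python
import networkx as nx

G = nx.barabasi_albert_graph(50, 2, seed=0)
# score a few candidate pairs before modifying the graph
pairs = [(0, 1), (2, 3)]
before = {(u, v): s for u, v, s in nx.preferential_attachment(G, pairs)}
G.add_edge(0, 49)  # example modification
after = {(u, v): s for u, v, s in nx.preferential_attachment(G, pairs)}
```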

Abdallah Sobehy
  • I think rather than selecting proportional to degree^2, the selection should be proportional to degree. – Joel Oct 11 '15 at 20:15
  • Yes I agree, you are right @Joel . At the beginning I just thought of choosing a highly connected node but looking into the matter of the Barabasi-Albert model for example I realize choosing according to the degree is better. – Abdallah Sobehy Oct 11 '15 at 22:00
  • @ONE1234 I have edited the answer as per the suggestion of Joel so have another look. – Abdallah Sobehy Oct 11 '15 at 22:02
  • Check EDIT2 in the answer, it might help you to tell if the scale free property was lost or not. @ONE1234 – Abdallah Sobehy Oct 14 '15 at 00:15