2

I have this problem for a calculation I'm doing on an observed network.

Let's imagine a random graph G(N,p) where N is the number of nodes and p is the probability of an edge being formed between any node ni and nj. The graph is undirected.

Let's then mark an amount of x nodes, say 5, as special. What is then the probability (ps) of a node to have an edge to any of these special nodes.

I have disturbingly few ideas how to go about this myself. I suppose that the answer will be in two steps:

Firstly, because I imagine that I will have to conceder all possible graphs of N nodes to make events for my probability calculation. I think there might be S(S-1)/2 possible graphs if S=N(N-1)/2, but these are not equally likely, so I'm at a loss. Secondly I understand that the probability of links to special nodes must approach 1 as the number of special nodes (x) approach N, and that ps=p if x=1.

Thankful for any hints. Thanks

John Coleman
  • 51,337
  • 7
  • 54
  • 119
nJGL
  • 819
  • 5
  • 17

1 Answers1

4

For a non-special node, there are x potential edges from that node to a special node. For any such potential edge the probability of the edge not being in the graph is 1-p. Assuming independence, the probability that it avoids all special nodes is (1-p)^x. The complementary probability is what you seek, it is

1 - (1-p)^x

For special nodes, the probability that a given special node is connected to one of the other special nodes is

1 - (1-p)^(x-1)

You can combine these answers in various ways. Pick a node at random. The probability that it is either special or has an edge connecting it to a special node is:

x/N + (N-x)/N * [1-(1-p)^x]

the probability that it has an edge connecting to a special node is:

x/N * [1 - (1-p)^(x-1)] + (N-x)/N * [1 - (1-p)^x]

in all cases -- these tend to 1 as x tends to N.

Since this is Stack Overflow, a bit of programming is in order. Here is a Python 3 Monte Carlo simulation that seems to suggest the accuracy of the formula for the probability that a randomly selected node is either special or adjacent to a special:

import random

#The following function returns a random graph on nodes
#0,1,...,N-1 where edges are chosen with probability p
#The graph is returned as a dictionary keyed by the 
#The corresponding values are sorted lists of adjacent nodes

def randGraph(N,p):

    #initialize G:
    G = {}
    for i in range(N):
        G[i] = []

    #iterate through potential edges:
    for i in range(N-1):
        for j in range(i+1,N):
            if random.random() < p:
                G[i].append(j)
                G[j].append(i)

    #sort adjacency lists before returning:
    for i in G:
        G[i].sort()
    return G

#A function to determine the number of nodes
#in a graph that are either
#special or connected to a special,
#where special means: 0,1,2,...,x

def specialsAndFriends(G,x):
    count = 0
    for i in G:
        if (i<x) or (len(G[i]) > 0 and G[i][0] < x):
            count +=1
    return count

#now -- a Monte Carlo simulation:

def estimateProb(N,p,x,trials = 1000):
    estimates = []
    for i in range(trials):
        G = randGraph(N,p)
        estimates.append(specialsAndFriends(G,x)/N)
    return sum(estimates)/trials

#compare with:

def exactProb(N,p,x):
    return x/N + (N-x)/N * (1 - (1-p)**x)

(Python 2 would need to adjust e.g. x/N to float(x)/N).

Sample output:

>>> estimateProb(100,0.25,10)
0.9496800000000086
>>> 
>>> exactProb(100,0.25,10)
0.9493178367614746
John Coleman
  • 51,337
  • 7
  • 54
  • 119
  • Absolutely excellent answer. Those are the probabilities I'm looking for, and indeed they can be confirmed with Montecarlo simulations as you've done. For those, who like me, don't speak Python so well, I wrote a similar test in R, which is [here](http://lilljegren.com/stackoverflow/random_graphs.R), and simulates the probability of a special node to have links to another special node. The code shows this convergence: ![Evaluation of estimations](http://lilljegren.com/stackoverflow/montecarlo_confirmation.png) – nJGL Apr 18 '16 at 19:23
  • Cool. I've been trying to learn R recently and looking at your code should be a good learning experience. It is always pleasing when simulations and theory agrees. What is your Ph.D. research in? (you mentioned being a Ph.D. candidate in your profile.) – John Coleman Apr 18 '16 at 20:02
  • Business history, but there's a fair share of network analysis in it too. My project is about various forms of collaboration between Swedish property-underwriters during industrialisation. The R-code I posted is not very neat, though. It uses [igraph](http://igraph.org/redirect.html), however, and with that package one can make a whole lot of fun explorations of graph theory. Apparently there is a Python-package of igraph too. Thank's again. – nJGL Apr 18 '16 at 21:03
  • 1
    Voilà. The [R-script](http://www.lilljegren.com/stackoverflow/random_graphs.R) is now extensively commented... – nJGL Apr 18 '16 at 21:43