
Calculation of average clustering coefficient of a graph

I am getting the correct result, but it takes a huge amount of time as the graph dimension increases. I need an alternative approach that takes less time to execute. Is there any way to simplify the code?

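    % A is the N-by-N adjacency matrix
    % d is the (average) degree
    N = 100;
    d = 10;
    rand('state',0)   % legacy seeding syntax (rng is the newer interface)
    A = zeros(N,N);
    kv = d*(d-1)/2;   % max. number of edges among d neighbours (assumes every node has degree d)

    %% Creating A matrix %%

    for i = 1:(d*N/2)
        j = floor(N*rand)+1;
        k = floor(N*rand)+1;
        % redraw until the pair is neither a self-loop nor an existing edge
        while (j==k) || (A(j,k)==1)
            j = floor(N*rand)+1;
            k = floor(N*rand)+1;
        end
        A(j,k) = 1;
        A(k,j) = 1;
    end

    %% Calculation of clustering Coeff %%

    Cv = zeros(1,N);                 % preallocate
    for i = 1:N
        J = find(A(i,:));            % neighbours of node i
        et = 0;                      % number of edges among those neighbours
        for ii = 1:numel(J)-1
            for jj = ii+1:numel(J)
                et = et + A(J(ii),J(jj));
            end
        end
        Cv(i) = et/kv;
    end
    Avg_clustering_coeff = sum(Cv)/N;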

Output I got:

Avg_clustering_coeff = 0.1107

Divakar
RRK
  • The values taken are N=100 & d=10. – RRK Apr 19 '16 at 08:05
  • What is `A` then? Could you please add some sample values to the question (***not*** to the comments) and add the expected output, please. People want to copy your code straight off and compare their solution to your output (i.e. an [mcve](https://stackoverflow.com/help/mcve)). – kkuilla Apr 19 '16 at 08:15
  • I'm getting `Avg_clustering_coeff =0.10` every time I run your code. Is that expected? Also, the small `n` in your `for i=1:n ` is undefined. – kkuilla Apr 19 '16 at 10:02
  • That small n in i=1:n is actually N. Since the graph is built from random numbers, the result may deviate a little. – RRK Apr 19 '16 at 10:19
  • As your algorithm works row by row, and the nonzero values in each row are randomly distributed, I think it's going to be very hard to truly improve your calculation time. But two things: if you keep your j and k values, you won't need to find the nonzero values in A again. And you can try making A sparse, since most of its values are zero. – obchardon Apr 19 '16 at 15:59
  • Oh, and you create a symmetric matrix (A) but then you only access the upper part of it, so this line is useless: A(k,j)=1; – obchardon Apr 19 '16 at 16:06
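Picking up the sparse suggestion in the comments, here is a minimal sketch (not posted in the thread) that also replaces the two inner loops with a count over the neighbours' submatrix. It assumes A is symmetric with a zero diagonal, which the question's generator guarantees. `As`, `Cv_sparse` and `Avg_sparse` are hypothetical names. Because MATLAB stores sparse matrices column by column, the sketch reads column i rather than row i; for a symmetric A both hold the same neighbours.

```matlab
% Rebuild the same random graph as in the question
N = 100; d = 10;
rand('state',0)
A = zeros(N,N);
for i = 1:(d*N/2)
    j = floor(N*rand)+1;
    k = floor(N*rand)+1;
    while (j==k) || (A(j,k)==1)
        j = floor(N*rand)+1;
        k = floor(N*rand)+1;
    end
    A(j,k) = 1;
    A(k,j) = 1;
end
kv = d*(d-1)/2;

As = sparse(A);                 % most entries are zero, so sparse storage is compact
Cv_sparse = zeros(1,N);
for i = 1:N
    J = find(As(:,i));          % column i = row i because A is symmetric
    % As(J,J) is the adjacency among i's neighbours. It is symmetric with a
    % zero diagonal, so every edge between two neighbours is counted twice.
    Cv_sparse(i) = nnz(As(J,J))/2/kv;
end
Avg_sparse = mean(Cv_sparse);
```

The division by 2 is what makes this equal to the question's `et`: the original double loop only visits pairs with `ii < jj`, while `nnz(As(J,J))` sees each such pair in both orders.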

2 Answers


The clustering-coefficient part can be vectorized with nchoosek to remove the two innermost nested loops, like so:

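    CvOut = zeros(1,N);
    for k = 1:N
        J = find(A(k,:));            % neighbours of node k
        if numel(J) > 1              % a scalar J would be read as n in nchoosek(n,k), not as a set
            idx = nchoosek(J,2);     % every pair of neighbours, one pair per row
            CvOut(k) = sum(A(sub2ind([N N],idx(:,1),idx(:,2))));
        end
    end
    CvOut = CvOut/kv;
    Avg_clustering_coeff = mean(CvOut);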

Hopefully, this should boost the performance quite a bit!

Divakar
  • You forgot `mean(CvOut);` at the end. Is it not possible to go down the no-loop permute/bsxfun route? – kkuilla Apr 21 '16 at 10:16
  • @kkuilla Well, because each row gives a varying number of elements in `J`, it might not be straightforward. Even if it's possible, we'd be looking at huge memory requirements :) – Divakar Apr 21 '16 at 10:18
  • @kkuilla On a separate note, if `A` consists of ones and zeros only, one can think of other ways of optimizing it. Can't muster a way with such a condition. – Divakar Apr 21 '16 at 10:22
  • @Divakar @kkuilla Thanks for providing a solution. Reducing the nested loops has improved the execution time. I think vectorizing is the best way. – RRK Apr 26 '16 at 07:41
  • @RRK Glad to hear that! – Divakar Apr 26 '16 at 07:45
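On the remark that A holds only ones and zeros: one standard option (a triangle-counting identity swapped in here, not something posted in the thread) removes every loop from the coefficient step. For a symmetric 0/1 adjacency matrix with a zero diagonal, (A*A)(i,j) is the number of common neighbours of i and j, so summing (A*A).*A along row i counts each edge among i's neighbours twice; that row sum equals (A^3)(i,i). The sketch assumes the question's A, N, d and kv; `Cv_mat` and `Avg_mat` are hypothetical names.

```matlab
% Same random graph as in the question
N = 100; d = 10;
rand('state',0)
A = zeros(N,N);
for i = 1:(d*N/2)
    j = floor(N*rand)+1;
    k = floor(N*rand)+1;
    while (j==k) || (A(j,k)==1)
        j = floor(N*rand)+1;
        k = floor(N*rand)+1;
    end
    A(j,k) = 1;
    A(k,j) = 1;
end
kv = d*(d-1)/2;

% (A*A)(i,j) = common neighbours of i and j. Masking with A keeps only adjacent
% pairs, so row i sums to twice the number of edges among i's neighbours.
et = sum((A*A).*A, 2)/2;   % N-by-1: edges among each node's neighbours
Cv_mat = et'/kv;           % 1-by-N, same scaling as the question's Cv
Avg_mat = mean(Cv_mat);
```

Computing `sum((A*A).*A, 2)` avoids forming the full A^3. For large N with small d, calling `sparse(A)` first may keep the product cheap, though that is an expectation rather than a measured result.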

To speed up your code, see my comments on the question, but you are not going to reduce the computation time drastically, because the time complexity doesn't change.
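A rough operation count illustrates why: both the original double loop and the nchoosek version visit every pair of neighbours of every node once. Writing $k_i$ for the degree of node $i$ (a symbol introduced here, not in the original code):

```latex
\text{pair lookups} \;=\; \sum_{i=1}^{N} \binom{k_i}{2} \;=\; \sum_{i=1}^{N} \frac{k_i(k_i-1)}{2} \;\approx\; N\,\frac{d(d-1)}{2}
```

The approximation holds when every $k_i$ is close to $d$. For $N = 100$ and $d = 10$ that is on the order of $100 \cdot 45 = 4500$ lookups; vectorizing cuts the per-iteration loop overhead, not the number of lookups.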

But if you don't need an exact result, you can estimate it with probabilities.

    probnum  = cumsum(1:d);
    probnum  = mean(probnum(end-1:end)); % theoretical number of elements created by your second loop (for each row)
    probfind = d*N/(N^2);                % probability of finding a nonzero value
    coeff    = probnum*probfind/kv;
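Written out symbolically (with $k_v = d(d-1)/2$ as `kv` in the question), the four lines above reduce to a closed form. This is only algebra on the code, not a statement about the graph model:

```latex
\begin{aligned}
\text{probnum}  &= \tfrac12\left[\tfrac{(d-1)d}{2} + \tfrac{d(d+1)}{2}\right] = \tfrac{d^2}{2},\\
\text{probfind} &= \tfrac{dN}{N^2} = \tfrac{d}{N},\\
\text{coeff}    &= \frac{\text{probnum}\cdot\text{probfind}}{k_v}
                 = \frac{(d^2/2)\,(d/N)}{d(d-1)/2} = \frac{d^2}{N(d-1)}.
\end{aligned}
```

For $N = 100$ and $d = 10$ this gives $100/900 \approx 0.1111$.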

This probabilistic coeff should be approximately equal to Avg_clustering_coeff for large N.

So you can use the exact method for small N and this method for large N.

obchardon
  • I don't think this is correct. The `Avg_clustering_coeff = 0.100000` for the original code and your `coeff = 0.11111` when I run it for the same random values. – kkuilla Apr 20 '16 at 07:56
  • @kkuilla OK, I'm going to check tonight whether there is a mistake. Perhaps probnum should only be probnum(end-1). – obchardon Apr 20 '16 at 11:12