Calculating adjacency matrix from randomly generated graphs

Question

I have developed a small program that randomly generates several connections between graphs. The number of graphs could be random too, but for testing I have fixed it as a constant; it can be replaced with a random value at any time.

The code is in C#: http://ideone.com/FDCtT0

( result: Success time: 0.04s memory: 36968 kB returned value: 0 )

If you don't know what an adjacency matrix is, see: http://en.wikipedia.org/wiki/Adjacency_matrix

I think my version of the code is rather unoptimized, which will matter if I work with large matrices of size 10k x 10k.

1) What are your suggestions on how best to parallelize the calculations in this task? Should I use a locking model such as semaphores for multi-threaded calculations on large matrices?
2) What are your suggestions for redesigning the architecture of the program? How should I prepare it for large matrices?
3) As you can see above, ideone reports the execution time and the allocated memory. What is the asymptotic complexity of my program? Is it O(n^2)?

So I would like to hear your advice on how to improve the asymptotic complexity and how to parallelize the calculations using semaphores (or maybe a better locking model for threads).

Thank you!

PS: SO doesn't allow posting a question without formatted code, so I'm posting it at the end (full program):

/*
    Oleg Orlov, 2012(c), generating randomly adjacency matrix and graph connections
*/

using System;
using System.Collections.Generic;

class Graph
{
    internal int id;
    private int value;
    internal Graph[] links;

    public Graph(int inc_id, int inc_value)
    {
        this.id = inc_id;
        this.value = inc_value;
        links = new Graph[Program.random_generator.Next(0, 4)];
    }
}

class Program
{
    private const int graphs_count = 10;
    private static List<Graph> list;
    public static Random random_generator;

    private static void Init()
    {
        random_generator = new Random();
        list = new List<Graph>(graphs_count);

        for (int i = 0; i < graphs_count; i++)
        {
            list.Add(new Graph(i, random_generator.Next(100, 255) * i + random_generator.Next(0, 32)));
        }
    }

    private static void InitGraphs()
    {
        for (int i = 0; i < list.Count; i++)
        {
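            // Graph is a reference type, so no cast or write-back into the list is needed.
            Graph graph = list[i];
            graph.links = new Graph[random_generator.Next(1, 4)];

            for (int j = 0; j < graph.links.Length; j++)
            {
                // Use list.Count instead of a hard-coded 10 so graphs_count can change safely.
                graph.links[j] = list[random_generator.Next(0, list.Count)];
            }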
        }
    }

    private static bool[,] ParseAdjectiveMatrix()
    {
        bool[,] matrix = new bool[list.Count, list.Count];

        // Mark every link in both directions, so the matrix is symmetric (undirected graph).
        foreach (Graph graph in list)
        {
            foreach (Graph neighbor in graph.links)
            {
                matrix[graph.id, neighbor.id] = true;
                matrix[neighbor.id, graph.id] = true;
            }
        }

        return matrix;
    }

    private static void PrintMatrix(ref bool[,] matrix)
    {
        for (int i = 0; i < list.Count; i++)
        {
            Console.Write("{0} | [ ", i);

            for (int j = 0; j < list.Count; j++)
            {
                Console.Write(" {0},", Convert.ToInt32(matrix[i, j]));
            }

            Console.Write(" ]\r\n");
        }

        Console.Write("{0}", new string(' ', 7));

        for (int i = 0; i < list.Count; i++)
        {
            Console.Write("---");
        }

        Console.Write("\r\n{0}", new string(' ', 7));

        for (int i = 0; i < list.Count; i++)
        {
            Console.Write("{0}  ", i);
        }

        Console.Write("\r\n");
    }

    private static void PrintGraphs()
    {
        foreach (Graph graph in list)
        {
            Console.Write("\r\nGraph id: {0}. It references to the graphs: ", graph.id);

            for (int i = 0; i < graph.links.Length; i++)
            {
                Console.Write(" {0}", graph.links[i].id);
            }
        }
    }

    [STAThread]
    static void Main()
    {
        try
        {
            Init();
            InitGraphs();
            bool[,] matrix = ParseAdjectiveMatrix();
            PrintMatrix(ref matrix);
            PrintGraphs();
        }
        catch (Exception exc)
        {
            Console.WriteLine(exc.Message);
        }

        Console.Write("\r\n\r\nPress enter to exit this program...");
        Console.ReadLine();
    }
}

Because your individual tasks are very quick, the overhead involved in parallelising the work might make it not worthwhile. http://msdn.microsoft.com/en-us/library/dd997392.aspx – paul, Dec 06 '12 at 22:02
@paul It may be quick on a 10x10 matrix, but what if you have about 1 million graphs with 3-4 million connections between them, and the matrix size is 1 million x 1 million? What about parallelizing then? – Secret, Dec 06 '12 at 22:05
What do you want to calculate, actually? Give us an example. – dreamzor, Dec 06 '12 at 22:08
@dreamzor I thought that what I want to calculate is in the title of this topic :) (I want to compute large adjacency matrices from a large number of connections between a lot of graphs) – Secret, Dec 06 '12 at 22:10
How are you going to store (1e9)^2 = 1e18 values in RAM? :) – dreamzor, Dec 06 '12 at 22:28
@dreamzor I don't understand why you mentioned the number E :) (which, as I remember, means values larger than a quadrillion). 3-4 million values are impossible to keep in RAM, aren't they? – Secret, Dec 06 '12 at 22:34
1e18 stands for 10^18. It is possible to store 3-4 million values, but the matrix also stores the `0` values. If you really want the matrix, you need to store it completely, which is n^2 entries. Or tell us what you really want to do. :) – dreamzor, Dec 06 '12 at 22:35
@dreamzor What I really want to do is in the title of the topic. To make it clearer: maybe just a program that works with N graphs and N connections between graphs. I also want to go from O(n^2) to O(log N). Of course, the RAM and disk space of a PC are finite, so another aim is to handle calculations on adjacency matrices and graph connections that are as large as a particular machine allows. – Secret, Dec 06 '12 at 22:41
So you want to know whether vertex `a` is connected to `b` as fast as possible? – dreamzor, Dec 06 '12 at 22:44
@dreamzor Yes, for large sizes and big counts of elements (graphs, connections, matrix size), trying to get the speed to O(log N) or as close as possible. There is also the question of creating the matrix quickly at runtime :) PS: a third point is holding the matrix compressed with some algorithm from the LZ family :))) and then writing it to disk as a report file :))) – Secret, Dec 06 '12 at 23:07

dreamzor · Accepted Answer · 2012-12-06T23:03:04.307

I will start from the end, if you don't mind. :)

3) Yes, it is O(n^2), and so is the memory usage: allocating and printing an n x n matrix touches every one of its n^2 cells.

2) Since sizeof(bool) is 1 byte, not 1 bit, you can reduce memory usage by using bit masks instead of raw bool values. This makes the matrix 8 times smaller (1 bit per entry instead of 8), although it is still O(n^2).
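
A minimal sketch of the bit-mask idea, assuming .NET's System.Collections.BitArray as the backing store. The BitMatrix class and its members are made-up names for illustration, not part of the program in the question:

```csharp
using System;
using System.Collections;

// Hypothetical helper: an adjacency matrix that stores one bit per entry
// instead of one byte per entry as bool[,] does.
class BitMatrix
{
    private readonly BitArray bits;
    private readonly int size;

    public BitMatrix(int size)
    {
        this.size = size;
        // BitArray is indexed by int, so size * size must fit in an int
        // (size <= 46340); checked() throws OverflowException otherwise.
        bits = new BitArray(checked(size * size));
    }

    // Row-major indexing: entry (row, col) lives at bit row * size + col.
    public bool this[int row, int col]
    {
        get { return bits[row * size + col]; }
        set { bits[row * size + col] = value; }
    }

    // Marks an undirected connection in both directions.
    public void Connect(int a, int b)
    {
        this[a, b] = true;
        this[b, a] = true;
    }
}
```

For a 10k x 10k matrix this needs about 12.5 MB instead of the roughly 100 MB a bool[,] takes. The trade-off is that setting a bit is a read-modify-write on a word shared with other entries, so this structure should not be filled from several threads without synchronization.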

1) I don't know C# that well, but from a quick search, reads and writes of C# primitive types such as bool are atomic, so a single write cannot be corrupted by another thread. That makes this a very easy multi-threading task: split your graphs among the threads and let every thread fill in the matrix entries for its own part. The only cells that two threads can both touch are written with the same value (true), so you don't need semaphores, locks, and so on. This holds for a plain bool[,]; with the bit masks from 2), several entries share one word, so concurrent writes would need synchronization.
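
Roughly what I mean, as a sketch using Parallel.For from System.Threading.Tasks. The ParallelMatrixBuilder name and the int[][] input (links[i] holds the ids that graph i references) are assumptions for illustration; the original program stores Graph objects instead:

```csharp
using System;
using System.Threading.Tasks;

static class ParallelMatrixBuilder
{
    // links[i] lists the ids that vertex i references.
    public static bool[,] Build(int[][] links)
    {
        int n = links.Length;
        bool[,] matrix = new bool[n, n];

        // One iteration per vertex; the runtime spreads iterations over threads.
        // Two threads may write the same cell, but both write true,
        // so the result is the same regardless of ordering and no lock is needed.
        Parallel.For(0, n, i =>
        {
            foreach (int j in links[i])
            {
                matrix[i, j] = true;
                matrix[j, i] = true;
            }
        });

        return matrix;
    }
}
```

Parallel.For returns only after all iterations have finished, so the matrix is complete when Build returns. As paul points out in the comments, each iteration does very little work, so it is worth measuring whether this beats the single-threaded loop at all.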

The thing is that you won't be able to have an adjacency matrix of size 10^9 x 10^9. You just can't store it in memory. But there is another way.
Create an adjacency list for each vertex, containing all the vertices it is connected to. After building those lists from your graph, sort each of them. Then you can answer 'is a connected to b?' in O(log(size of the adjacency list of vertex a)) time using binary search, which is really fast in practice.
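
The sorted-list lookup could look something like this. It is only a sketch: AdjacencyLists and its method names are invented here, while List<T>.Sort and List<T>.BinarySearch are the standard .NET methods:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container: one list of neighbour ids per vertex.
class AdjacencyLists
{
    private readonly List<int>[] neighbors;

    public AdjacencyLists(int vertexCount)
    {
        neighbors = new List<int>[vertexCount];
        for (int i = 0; i < vertexCount; i++)
            neighbors[i] = new List<int>();
    }

    // Records an undirected connection between a and b.
    public void Connect(int a, int b)
    {
        neighbors[a].Add(b);
        neighbors[b].Add(a);
    }

    // Call once after all connections are added;
    // BinarySearch only works on sorted lists.
    public void SortAll()
    {
        foreach (List<int> list in neighbors)
            list.Sort();
    }

    // O(log(size of a's list)) membership test.
    public bool AreConnected(int a, int b)
    {
        return neighbors[a].BinarySearch(b) >= 0;
    }
}
```

BinarySearch returns a non-negative index when the value is found and a negative number otherwise, which is why the check is `>= 0`. Memory grows with the number of connections rather than with n^2.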

Now, if you want a really fast implementation of Dijkstra's algorithm, you won't need an adjacency matrix at all, just those lists.

Again, it all depends on your future tasks and constraints. You cannot store a matrix of that size, that's all. You don't need it for Dijkstra or BFS, that's a fact. :) There is no conceptual difference from the graph's side: the graph is the same no matter what data structure it is stored in.
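
To illustrate that traversal needs only the lists, here is a sketch of BFS over them. Since the connections in the question have no weights, BFS already gives the shortest path in number of hops; GraphSearch and ShortestHops are hypothetical names, and the lists are passed as a plain List<int>[]:

```csharp
using System;
using System.Collections.Generic;

static class GraphSearch
{
    // Returns the number of edges on a shortest path from start to goal,
    // or -1 if goal cannot be reached. neighbors[v] lists the vertices adjacent to v.
    public static int ShortestHops(List<int>[] neighbors, int start, int goal)
    {
        int[] distance = new int[neighbors.Length];
        for (int i = 0; i < distance.Length; i++)
            distance[i] = -1; // -1 means "not visited yet"

        Queue<int> queue = new Queue<int>();
        distance[start] = 0;
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            int v = queue.Dequeue();
            if (v == goal)
                return distance[v];

            foreach (int w in neighbors[v])
            {
                if (distance[w] == -1)
                {
                    distance[w] = distance[v] + 1;
                    queue.Enqueue(w);
                }
            }
        }

        return -1;
    }
}
```

Every vertex is enqueued at most once and every list is scanned at most once, so the running time is proportional to the number of vertices plus the number of connections, with no n^2 term.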

If you really want the matrix, then here is the solution:
We know that the number of connections (1s in the matrix) is much smaller than its maximum, which is n^2. By building those lists, we simply store the positions of the 1s (this is also called a sparse matrix), which consumes no unneeded memory.
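
To put rough numbers on this, here is a small back-of-the-envelope calculation. The MemoryEstimate helper is made up for illustration, and the list figure counts only the stored ints (each connection appears in two lists), ignoring per-list object overhead and spare capacity:

```csharp
using System;

static class MemoryEstimate
{
    // bool[n, n]: one byte per entry.
    public static long DenseBoolBytes(long n)
    {
        return n * n;
    }

    // Bit-packed n x n matrix: one bit per entry, rounded up to whole bytes.
    public static long DenseBitBytes(long n)
    {
        return (n * n + 7) / 8;
    }

    // Adjacency lists: each undirected connection is stored twice as a 4-byte int.
    public static long ListPayloadBytes(long connections)
    {
        return 2 * connections * sizeof(int);
    }
}
```

With the figures from the comments (1 million graphs, up to 4 million connections), DenseBoolBytes(1000000) is 10^12 bytes (about 1 TB), DenseBitBytes(1000000) is 1.25 x 10^11 bytes (125 GB), and ListPayloadBytes(4000000) is 3.2 x 10^7 bytes (32 MB).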

edited Dec 06 '12 at 23:03

answered Dec 06 '12 at 22:23

dreamzor


New questions for you: 1) Is it possible to get to O(log N), maybe by keeping the graphs in some self-balancing data structure like an RB-tree or the B-tree family? 2) I'm worried about creating a large matrix, for example with about 1 million graphs (3-4 million connections between them) and a matrix size of 1 million x 1 million. I'm thinking of parallelizing here or reducing the number of iterations to get closer to O(log N). Any ideas? Thank you. – Secret Dec 06 '12 at 22:29
What do you want to do in `O(logN)`? There's nothing to depend on sort or anything similar. Again, if you want to obtain just the matrix (I can't think of any real use for it as-is, without any calculations), then no, you can't, because the size of this matrix is `N^2`. – dreamzor Dec 06 '12 at 22:38
"What do you want to do in O(logN)? There's nothing to depend on sort or anything similar." It could matter later :) if I start using search algorithms such as A*, Dijkstra's algorithm, DFS/BFS, etc. As you remember, constructing the adjacency matrix is one of the important tasks for pathfinding and graph traversal, so speed (and easy access) is important. As for operations on the matrix, algorithms such as Strassen's need about n^(log2 7) operations rather than n^3, despite the N^2 size of the matrices, so maybe there are solutions for optimizing this, and maybe adding threads would speed it up. – Secret Dec 06 '12 at 22:44
...but it is not the only one. There are a lot of other structures that are more suitable for you right now. I'm going to update my answer soon. – dreamzor Dec 06 '12 at 22:46
"There are a lot of other structures that are more suitable for you right now." Are they faster, or do they use fewer iterations, less memory, less time, etc.? Could they be used for other purposes? Adjacency matrices and self-balancing structures give you the possibility of using different algorithms without fully redesigning the program, because the same algorithm can be used in different ways. – Secret Dec 06 '12 at 22:53
"You don't need it for Dijkstra or BFS, that's a fact. :)" That's true for them, but what if I want to use the Floyd–Warshall algorithm or A* (which is faster than BFS/DFS or Dijkstra)? – Secret Dec 06 '12 at 23:10
Then I can only see the O(log n) solution that I posted. Going to sleep now, good luck :) – dreamzor Dec 06 '12 at 23:13
