I want to generate a large weighted undirected graph, represented by an adjacency matrix AJM. The matrix is symmetric with a zero diagonal, so for all i and j:

AJM[i][j] = AJM[j][i]

AJM[i][i] = 0

The weights are random doubles in some interval, say [0.01, 10.00]. With 10k vertices the matrix is 10k by 10k; at 8 bytes per double that is about 800 MB if I store all of it.

Now I want to set a target number of edges E and ignore every edge whose weight is larger than some threshold T (T is determined by E; E is user-defined), storing only the E smallest edges, i.e. those with weight under T, in a vector for later use. Could you suggest how to do this efficiently? Ideally I would avoid storing the whole adjacency matrix and use some kind of streaming structure instead. So how should I generate the matrix and do the thresholding?

I guess writing to and reading from a file is needed, right?

One approach would be, after some kind of file manipulation, to set the edge count E and do the following:

I read the elements of the matrix one by one, so I never read in the whole matrix (could you show some lines of C++ code for this?), insert each weight into a min-heap, and store the corresponding edge index in a vector. I stop when the size of the heap reaches E, so that the vector of edge indices is what I want.

Do you think it's the right way to do it? Any other suggestions? Please point out any errors I may have made here. Thank you so much!

Logan Yang
  • It *sounds* like a *sparse matrix* would alleviate your memory concerns, though the worst case would still consume a boatload of memory. How you would implement the sparsity is a separate issue. – WhozCraig Sep 17 '13 at 22:05
  • How random is random here? You could use a PRNG of which you would only need to store the seed value. Possibly N seed values. – mvds Sep 17 '13 at 22:17
  • Is it a problem that this can generate disconnected graphs? Do you need to keep the original graph, or is it some kind of temporary? – Adam Burry Sep 18 '13 at 00:57
  • @AdamBurry Yes the result of thresholding is a disconnected forest. I don't need to keep the original graph – Logan Yang Sep 18 '13 at 07:35

1 Answer

If there is no need to keep the original, un-thresholded graph, then it sounds like there is an easy way to save yourself a lot of work. You are given the number of vertices (V = 10,000), and the number of edges (E) is user-configurable. Just randomly select pairs of vertices until you have the required number of edges. Am I missing an obvious reason why this would not be equivalent?

Adam Burry