I am using BitSet to track which nodes in a graph have been visited during a DFS. For this purpose I have created a BitSet[] array with one BitSet per node. Each BitSet can hold between 100,000 and 500,000 entries.

This is the code I am using.

    public void tc() {
//        if (scc_cache_valid)
//           return;
        marked2 = new BitSet[node_count];
        for (int v = 0; v < node_count; v++) {
            //if (marked2[v] == null)
            marked2[v] = new BitSet(edge_count);
            System.out.println("aaa" + marked2[v].size());
            for (int w : children.get(v)) {
                if (!marked2[v].get(w))
                    dfs(v, w);
            }
        }
    }

    public void tc(int v) {
//        if (scc_cache_valid && marked2[v] != null)
//            return;

//        marked2 = new BitSet[node_count];
//        for (int v = 0; v < node_count; v++) {
        if (marked2[v] == null)
            marked2[v] = new BitSet(node_count);
        System.out.println(marked2[v].size());

        for (int w : children.get(v)) {
            if (!marked2[v].get(w))
                dfs(v, w);
        }
//        }
    }

    public boolean reachable(int v, int w) {
        return marked2[v].get(w);
    }

    private void dfs(int v, int w) {
        marked2[v].set(w, true);
        System.out.println(marked2[v].length());

        for (int z : children.get(w)) {
            if (!marked2[v].get(z))
                dfs(v, z);
        }
    }

Unfortunately, I am running out of heap space. Is there a more memory-efficient solution to this problem?

Thank You.

HGO HGO
  • Have you increased the heap space of your JVM? By default the maximum setting is quite low. – Andy May 10 '17 at 11:44
  • How many true/false values in total do you actually need to store? – khelwood May 10 '17 at 11:49
  • So you basically have a 500,000 x 500,000 matrix? That would take 31,250,000,000 bytes. A bitmap is better for non-sparse data. – RealSkeptic May 10 '17 at 11:49
  • I cannot change the heap size, as the code is executed on another server which I cannot control. – HGO HGO May 10 '17 at 11:52
  • I cannot change the heap size, as the code is executed on another server which I cannot control. I need to store this for a graph of 50,000 nodes: 50,000 BitSets, each with 50,000 true/false values in the worst case. – HGO HGO May 10 '17 at 12:05
  • It is sparse data. So is it better to use another data structure with similar performance but a smaller memory footprint? – HGO HGO May 10 '17 at 12:07
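
The figures quoted in the comments can be checked with a short calculation. The following is only a rough sketch: the class and method names are hypothetical, and it counts raw bits only, ignoring the per-object overhead of each BitSet and the fact that a BitSet allocates only up to its highest set bit.

```java
class BitMatrixFootprint {
    // Bytes needed for a dense n x n matrix of bits (assumption: raw bits only,
    // no object headers or array padding).
    static long denseBitMatrixBytes(long n) {
        return n * n / 8;
    }

    public static void main(String[] args) {
        // 50,000 nodes, as stated in the asker's comment.
        System.out.println(denseBitMatrixBytes(50_000L));   // prints 312500000
        // 500,000 nodes, as in RealSkeptic's comment.
        System.out.println(denseBitMatrixBytes(500_000L));  // prints 31250000000
    }
}
```

So the worst case is roughly 312 MB for 50,000 nodes and about 31 GB for 500,000 nodes, which is why the quadratic layout exhausts a heap of fixed size.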

1 Answer

I think your DFS algorithm is incorrect.

  • A classic DFS algorithm for a tree doesn't require a bitmap at all.
  • A classic DFS algorithm for a DAG or a general graph requires a single bitmap with one bit for each node in the graph. This assumes that there is a (dense) one-to-one mapping from nodes to integers, e.g. node numbers. If not, then it is conventional to use a HashSet<Node>.

In either case, the space requirement is O(N) rather than O(N^2).

A pseudo-code algorithm for the DAG / graph case is:

 dfs(root) = dfs0(root, new Set());
 dfs0(node, visited) = 
      if !visited.contains(node):
          visited.add(node)
          // do stuff for node
          foreach n in node.children:
              dfs0(n, visited)

Note: there is only one Set object used in the traversal.
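
As a rough illustration, the pseudo-code above might look like this in Java. It is a sketch under assumptions rather than a definitive implementation: nodes are assumed to be numbered 0..N-1, `children` is assumed to be an adjacency list like the one in the question, and the class and method names are hypothetical. `List.of` needs Java 9 or later.

```java
import java.util.BitSet;
import java.util.List;

class SingleVisitedDfs {
    // Returns the nodes visited by a DFS from root (root included),
    // using one BitSet for the whole traversal.
    static BitSet reachableFrom(List<List<Integer>> children, int root) {
        BitSet visited = new BitSet(children.size());
        dfs0(children, root, visited);
        return visited;
    }

    private static void dfs0(List<List<Integer>> children, int node, BitSet visited) {
        if (visited.get(node)) {
            return; // already handled, so cycles and shared children are safe
        }
        visited.set(node);
        // do stuff for node
        for (int n : children.get(node)) {
            dfs0(children, n, visited);
        }
    }

    public static void main(String[] args) {
        // Hypothetical 4-node graph: 0 -> 1, 0 -> 2, 1 -> 2, node 3 isolated.
        List<List<Integer>> children =
                List.of(List.of(1, 2), List.of(2), List.of(), List.of());
        System.out.println(reachableFrom(children, 0)); // prints {0, 1, 2}
    }
}
```

The recursion depth can reach the length of the longest path, so for graphs with hundreds of thousands of nodes an explicit stack may be safer than recursion.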

Stephen C
  • Thank you. But I need to find, for each node in the graph, all of its children and store them. Wouldn't I need a separate BitSet for each node? – HGO HGO May 10 '17 at 12:27
  • Not necessarily. Anyhow, that is not what a DFS algorithm does, and your Question *says* you are trying to implement DFS. Why don't you explain clearly in the Question what you are actually trying to do? – Stephen C May 10 '17 at 12:32
  • I want to find out whether a path exists between any two nodes of the DAG. – HGO HGO May 10 '17 at 12:33
  • Do you mean Graph or DAG? – Stephen C May 10 '17 at 13:27
  • Thank you Stephen! You saved me. I overlooked the fact that it's faster to search than to store... Now I just run a DFS from the source to check whether it reaches the node it should connect to. – HGO HGO May 10 '17 at 13:28
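
The search-on-demand approach described in the last comment could be sketched as follows. This is not the asker's actual code: the class name and signature are hypothetical, nodes are assumed to be numbered 0..N-1, and an explicit stack replaces recursion to avoid deep call chains. Like the `reachable` method in the question, it reports true only if the target is reached by following at least one edge.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

class PathQuery {
    // True if target can be reached from source by following one or more edges.
    // Uses a single BitSet per query, so memory stays O(N).
    static boolean reachable(List<List<Integer>> children, int source, int target) {
        BitSet visited = new BitSet(children.size());
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(source);
        while (!stack.isEmpty()) {
            int node = stack.pop();
            for (int next : children.get(node)) {
                if (next == target) {
                    return true; // stop as soon as the target is found
                }
                if (!visited.get(next)) {
                    visited.set(next);
                    stack.push(next);
                }
            }
        }
        return false;
    }
}
```

Each query costs O(N + E) time in the worst case, so this trades repeated search time for not storing an N x N table.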