I am trying to get Sage to generate all graphs with 11 vertices, 30 edges, and clique number 4. I typed in the following:

 g11=[g for g in graphs.nauty_geng('11 30') if g.clique_number()==4]

After a while I see the following message:

MemoryError                               Traceback (most recent call last)
<ipython-input-6-1ec9660b8e07> in <module>()
----> 1 g11=[g for g in graphs.nauty_geng('11 30') if g.clique_number()==Integer(4)]

/opt/sagemath-8.6/local/lib/python2.7/site-packages/sage/graphs/graph.pyc in clique_number(self, algorithm, cliques, solver, verbose)
   6072         self._scream_if_not_simple(allow_loops=False)
   6073         if algorithm == "Cliquer":
-> 6074             from sage.graphs.cliquer import clique_number
   6075             return clique_number(self)
   6076         elif algorithm == "networkx":

It seems that I do not have enough RAM for Sage to do this. Is there a way I can get Sage to store this information elsewhere? Does Sage only use RAM? I have 1 terabyte of disk storage available.

If this is not possible, then how can I resolve this issue? Thank you in advance!

Sarah

1 Answer

Counting before listing

Sometimes storing all the mathematical objects of interest in a list is too ambitious: the list would simply take too much memory.

One first step could be to count how many such graphs there are, and how long it takes to iterate through them, before attempting to store them.
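As a plain-Python illustration of the idea (nothing Sage- or nauty-specific here), counting the items of a lazy generator that satisfy a predicate uses constant memory, since no list is ever built:

```python
# Count items from a lazy generator without storing them in a list.
# The generator and predicate here are toy stand-ins for
# graphs.nauty_geng(...) and the clique-number test.
def count_matching(iterable, predicate):
    """Consume the iterable, counting items that satisfy the predicate."""
    return sum(1 for item in iterable if predicate(item))

# Toy example: count the even numbers among 0..999999, in O(1) memory.
n_even = count_matching((n for n in range(10**6)), lambda n: n % 2 == 0)
print(n_even)  # 500000
```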

The timings below are on one particular machine; they might differ on other machines.

Counting graphs on 11 vertices with 30 edges and clique number 4 took about two hours.

sage: g_11_30 = graphs.nauty_geng('11 30:30')
sage: g_11_30_c4 = (g for g in g_11_30 if g.clique_number() == 4)
sage: %time nb_g_11_30_c4 = sum(1 for g in g_11_30_c4)
CPU times: user 2h 12min 9s, sys: 1min 9s, total: 2h 13min 18s
Wall time: 2h 13min 18s
sage: nb_g_11_30_c4
58211868

Counting only the connected ones took roughly the same time.

sage: cg_11_30 = graphs.nauty_geng('11 30:30 -c')
sage: cg_11_30_c4 = (g for g in cg_11_30 if g.clique_number() == 4)
sage: %time nb_cg_11_30_c4 = sum(1 for g in cg_11_30_c4)
CPU times: user 2h 13min 27s, sys: 1min 11s, total: 2h 14min 38s
Wall time: 2h 14min 39s
sage: nb_cg_11_30_c4
58182054

We see there are about 58.2 million graphs on 11 vertices with 30 edges and clique number 4, most of them connected -- only 29814 are not. If we only care about the nonconnected ones, it makes quite a difference!

Iterating rather than listing

If storing these graphs is not feasible, we know we can run through them in two hours every time we want to know something about them.

One nice way to learn about iterating vs listing is to run through the SageMath tutorial on comprehensions.
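In plain Python terms, a generator expression produces its items one at a time; `next` and `itertools.islice` let us peel off a few items without ever building the full list:

```python
from itertools import islice

# A lazy generator: nothing is computed until we ask for items.
squares = (n * n for n in range(10**6))

first = next(squares)                  # the first item
next_three = list(islice(squares, 3))  # the next three items
print(first, next_three)  # 0 [1, 4, 9]
```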

For example, take the first graph in the collection and check its edges and its graph6 string (more on the graph6 format):

sage: g_11_30 = graphs.nauty_geng('11 30:30')
sage: g_11_30_c4 = (g for g in g_11_30 if g.clique_number() == 4)
sage: g = next(g_11_30_c4)
sage: print(g.edges(labels=False))
[(0, 7), (0, 8), (0, 9), (0, 10), (1, 7), (1, 8), (1, 9), (1, 10),
(2, 7), (2, 8), (2, 9), (2, 10), (3, 7), (3, 8), (3, 9), (3, 10),
(4, 8), (4, 9), (4, 10), (5, 8), (5, 9), (5, 10), (6, 8), (6, 9),
(6, 10), (7, 9), (7, 10), (8, 9), (8, 10), (9, 10)]
sage: g.graph6_string()
'J???Fb}~~~_'

and the second one:

sage: g = next(g_11_30_c4)
sage: print(g.edges(labels=False))
[(0, 7), (0, 8), (0, 9), (0, 10), (1, 7), (1, 8), (1, 9), (1, 10),
(2, 7), (2, 8), (2, 9), (2, 10), (3, 7), (3, 8), (3, 9), (3, 10),
(4, 8), (4, 9), (4, 10), (5, 8), (5, 9), (5, 10), (6, 8), (6, 9),
(6, 10), (7, 8), (7, 9), (7, 10), (8, 10), (9, 10)]
sage: g.graph6_string()
'J???Fb~~v~_'

and so on.

Storing smaller equivalent data

If the graphs themselves are too much to store in a list, maybe we can use more compact representations of these graphs, which would take up less memory. For example, the list of edges lets us easily reconstruct a graph; so does the very compact "graph6 string".
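To see why graph6 is so compact: for a graph on n ≤ 62 vertices, the standard graph6 encoding uses one size character plus one character for every 6 bits of the upper triangle of the adjacency matrix. A quick plain-Python check of the expected length (this sketch assumes that standard encoding; `math.comb` needs Python 3.8+):

```python
from math import ceil, comb

def graph6_length(n):
    """Character length of the graph6 encoding of a graph on n vertices,
    for n <= 62: one size character, then binomial(n, 2) adjacency bits
    packed 6 bits per character."""
    return 1 + ceil(comb(n, 2) / 6)

# For n = 11: 1 + ceil(55 / 6) = 11 characters per graph,
# matching len('J???Fb}~~~_') from the examples above.
print(graph6_length(11))  # 11
```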

To get an idea, let us compare file sizes for the first ten thousand graphs, stored as a list of Sage graphs (a Sage object), as a list of their edge lists (a Sage object), and as graph6 strings in a text file:

sage: g_11_30 = graphs.nauty_geng('11 30:30')
sage: g_11_30_c4 = (g for g in g_11_30 if g.clique_number() == 4)
sage: # use a name other than `graphs`, which we still need for nauty_geng
sage: gg = [next(g_11_30_c4) for _ in range(10^4)]
sage: save(gg, "g_11_30_c4_1e4_graph_bare")

sage: g_11_30 = graphs.nauty_geng('11 30:30')
sage: g_11_30_c4 = (g for g in g_11_30 if g.clique_number() == 4)
sage: edges = [next(g_11_30_c4).edges(labels=False) for _ in range(10^4)]
sage: save(edges, "g_11_30_c4_1e4_graph_edges")

sage: g_11_30 = graphs.nauty_geng('11 30:30')
sage: g_11_30_c4 = (g for g in g_11_30 if g.clique_number() == 4)
sage: s = '\n'.join(next(g_11_30_c4).graph6_string() for _ in range(10^4))
sage: with open('g_11_30_c4_1e4_graph_graph6.txt', 'w') as f:
....:     f.write(s)
....:
119999

Compare the corresponding file sizes:

  • g_11_30_c4_1e4_graph_bare.sobj: 971K
  • g_11_30_c4_1e4_graph_edges.sobj: 775K
  • g_11_30_c4_1e4_graph_graph6.txt: 117K

Obviously the graph6 format wins, and storing all 58.2 million graphs in this format in a text file would take ~ 5820 * 117K, i.e. ~ 680M.

We could also store the graphs in 100 files, numbered 0 to 99, using geng's ability to split its generation into slices:

sage: n = 100
sage: for k in range(n):
....:     gk = graphs.nauty_geng('11 30:30 {}/{}'.format(k, n))
....:     ggk = (g for g in gk if g.clique_number() == 4)
....:     s = '\n'.join(g.graph6_string() for g in ggk)
....:     with open('g_11_30_c4_graph_graph6_file_{}_of_{}.txt'
....:               .format(k, n - 1), 'w') as f:
....:         f.write(s)

This will let us study these graphs over several sessions without making nauty work for two hours each time.
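The `{k}/{n}` argument asks geng to generate only its k-th slice out of n. Conceptually, each graph is produced by exactly one slice (geng actually splits its search tree rather than a flat index, but the guarantee is the same). A plain-Python sketch of that partitioning idea:

```python
# Toy model of splitting one big enumeration across n workers:
# worker k handles the items whose index is congruent to k mod n.
def shard(items, k, n):
    """Return the items assigned to worker k out of n workers."""
    return [item for i, item in enumerate(items) if i % n == k]

parts = [shard(range(10), k, 3) for k in range(3)]
print(parts)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
# Together the slices cover every item exactly once:
assert sorted(x for part in parts for x in part) == list(range(10))
```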

Recommended reading, depending on the Python version your Sage is based on: the Python documentation on iterators and generators.

Samuel Lelièvre