Counting before listing
Sometimes storing all the mathematical objects of interest in a list
is too ambitious: the list would take up too much memory.
A sensible first step is to count how many such objects there are,
and to time how long it takes to iterate through them, before
attempting to store them.
The timings below are on one particular machine; they might differ
on other machines.
Counting the graphs on 11 vertices with 30 edges and clique number 4
took about two hours.
sage: g_11_30 = graphs.nauty_geng('11 30:30')
sage: g_11_30_c4 = (g for g in g_11_30 if g.clique_number() == 4)
sage: %time nb_g_11_30_c4 = sum(1 for g in g_11_30_c4)
CPU times: user 2h 12min 9s, sys: 1min 9s, total: 2h 13min 18s
Wall time: 2h 13min 18s
sage: nb_g_11_30_c4
58211868
Counting only the connected ones took roughly the same time.
sage: cg_11_30 = graphs.nauty_geng('11 30:30 -c')
sage: cg_11_30_c4 = (g for g in cg_11_30 if g.clique_number() == 4)
sage: %time nb_cg_11_30_c4 = sum(1 for g in cg_11_30_c4)
CPU times: user 2h 13min 27s, sys: 1min 11s, total: 2h 14min 38s
Wall time: 2h 14min 39s
sage: nb_cg_11_30_c4
58182054
We see there are about 58.2 million graphs on 11 vertices with 30 edges
and clique number 4, almost all of them connected: only 29814 are not.
If we only care about the disconnected ones, that makes quite a
difference: 29814 graphs are easy to store!
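The counting pattern used above carries over to plain Python: a generator
expression filters candidates lazily, and sum(1 for ...) counts the matches
without ever building a list. A minimal sketch, with a toy source and a toy
predicate standing in for nauty_geng and the clique-number test:

```python
# Count objects satisfying a predicate without storing them.
# The generator yields candidates one at a time, and sum(1 for ...)
# consumes it, so memory use stays constant no matter how many
# candidates there are.

def candidates(n):
    """Toy stand-in for graphs.nauty_geng: yield the integers 0..n-1."""
    for k in range(n):
        yield k

def predicate(k):
    """Toy stand-in for the clique-number filter."""
    return k % 7 == 0

filtered = (k for k in candidates(10**6) if predicate(k))
count = sum(1 for _ in filtered)
print(count)  # 142858 multiples of 7 below 10**6
```

The same shape, with nauty_geng as the source and the clique-number test as
the predicate, is exactly what the Sage sessions above run for two hours.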
Iterating rather than listing
If storing these graphs is not feasible, we at least know we can run
through them in about two hours whenever we want to learn something
about them.
One nice way to learn about iterating vs listing is to run through the
SageMath tutorial on comprehensions.
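The difference between listing and iterating is easy to see in plain Python:
a list comprehension materializes everything at once, while a generator
expression produces items on demand and can only be traversed once.

```python
import sys

squares_list = [k * k for k in range(10**6)]  # materializes a million entries
squares_gen = (k * k for k in range(10**6))   # produces entries on demand

# The generator object itself is tiny, no matter how many items
# it will eventually yield; the list pays for every entry up front.
print(sys.getsizeof(squares_gen) < sys.getsizeof(squares_list))  # True

# next() pulls one item at a time, in order:
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1
print(next(squares_gen))  # 4

# A generator can only be traversed once; counting what is left
# exhausts it, whereas the list can be reused freely.
remaining = sum(1 for _ in squares_gen)
print(remaining)  # 999997
```

This is why the sessions below recreate the nauty_geng generator each time:
once consumed, a generator yields nothing more.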
For example, take the first graph in the collection and check its edges
and its graph6 string
(more on the graph6 format):
sage: g_11_30 = graphs.nauty_geng('11 30:30')
sage: g_11_30_c4 = (g for g in g_11_30 if g.clique_number() == 4)
sage: g = next(g_11_30_c4)
sage: print(g.edges(labels=False))
[(0, 7), (0, 8), (0, 9), (0, 10), (1, 7), (1, 8), (1, 9), (1, 10),
(2, 7), (2, 8), (2, 9), (2, 10), (3, 7), (3, 8), (3, 9), (3, 10),
(4, 8), (4, 9), (4, 10), (5, 8), (5, 9), (5, 10), (6, 8), (6, 9),
(6, 10), (7, 9), (7, 10), (8, 9), (8, 10), (9, 10)]
sage: g.graph6_string()
'J???Fb}~~~_'
and the second one:
sage: g = next(g_11_30_c4)
sage: print(g.edges(labels=False))
[(0, 7), (0, 8), (0, 9), (0, 10), (1, 7), (1, 8), (1, 9), (1, 10),
(2, 7), (2, 8), (2, 9), (2, 10), (3, 7), (3, 8), (3, 9), (3, 10),
(4, 8), (4, 9), (4, 10), (5, 8), (5, 9), (5, 10), (6, 8), (6, 9),
(6, 10), (7, 8), (7, 9), (7, 10), (8, 10), (9, 10)]
sage: g.graph6_string()
'J???Fb~~v~_'
and so on.
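The graph6 strings above can be unpacked with a short pure-Python decoder.
A sketch of the format for graphs on at most 62 vertices: the first byte
encodes the vertex count n as chr(n + 63), and the remaining bytes pack the
upper triangle of the adjacency matrix, column by column, 6 bits per
character (each character being chr(bits + 63)):

```python
def graph6_to_edges(s):
    """Decode a graph6 string (n <= 62) into (n, sorted edge list)."""
    data = [ord(c) - 63 for c in s]
    n = data[0]
    # Concatenate 6 bits per remaining character, most significant first.
    bits = []
    for d in data[1:]:
        bits.extend((d >> shift) & 1 for shift in range(5, -1, -1))
    # The bits fill the upper triangle column by column:
    # (0,1), (0,2), (1,2), (0,3), (1,3), (2,3), ...
    edges = []
    index = 0
    for j in range(1, n):
        for i in range(j):
            if bits[index]:
                edges.append((i, j))
            index += 1
    return n, sorted(edges)

n, edges = graph6_to_edges('J???Fb}~~~_')
print(n, len(edges))  # 11 30
```

Decoding 'J???Fb}~~~_' recovers exactly the 30 edges printed for the first
graph above, which also explains the sizes seen later: one of these graphs
costs only 11 bytes in graph6 form.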
Storing smaller equivalent data
If the graphs themselves are too much to store in a list, maybe we can
use more compact representations of these graphs, which would take up
less memory. For example, the list of edges lets us easily reconstruct
a graph; so does the very compact "graph6 string".
To get an idea, let us compare the file sizes for
the list of the first ten thousand graphs as a Sage object,
the list of their edge lists as a Sage object,
and their graph6 strings as a text file:
sage: g_11_30 = graphs.nauty_geng('11 30:30')
sage: g_11_30_c4 = (g for g in g_11_30 if g.clique_number() == 4)
sage: graph_list = [next(g_11_30_c4) for _ in range(10^4)]
sage: save(graph_list, "g_11_30_c4_1e4_graph_bare")
sage: g_11_30 = graphs.nauty_geng('11 30:30')
sage: g_11_30_c4 = (g for g in g_11_30 if g.clique_number() == 4)
sage: edges = [next(g_11_30_c4).edges(labels=False) for _ in range(10^4)]
sage: save(edges, "g_11_30_c4_1e4_graph_edges")
sage: g_11_30 = graphs.nauty_geng('11 30:30')
sage: g_11_30_c4 = (g for g in g_11_30 if g.clique_number() == 4)
sage: s = '\n'.join(next(g_11_30_c4).graph6_string() for _ in range(10^4))
sage: with open('g_11_30_c4_1e4_graph_graph6.txt', 'w') as f:
....:     f.write(s)
....:
119999
Compare the corresponding file sizes:
g_11_30_c4_1e4_graph_bare.sobj:   971K
g_11_30_c4_1e4_graph_edges.sobj:  775K
g_11_30_c4_1e4_graph_graph6.txt:  117K
Obviously the graph6 format wins, and storing all 58.2 million graphs
in this format in a text file would take ~ 5820 * 117K, i.e. ~ 680M.
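The size of the gap is plausible even without Sage: a .sobj file is,
roughly speaking, a compressed pickle, and pickling a list of edge tuples
costs far more bytes per graph than one 11-character graph6 line. A rough
plain-Python sanity check (the edge list below is a toy stand-in built for
illustration, not one of the graphs above):

```python
import pickle

# Toy stand-in for one graph's data: some 30-edge list on 11 vertices
# (the first 30 vertex pairs in lexicographic order).
edges = [(i, j) for i in range(11) for j in range(i + 1, 11)][:30]
pickled = pickle.dumps(edges)

# A real graph6 string for an 11-vertex, 30-edge graph (from above):
graph6_line = 'J???Fb}~~~_'

print(len(graph6_line))                 # 11 bytes per graph
print(len(graph6_line) < len(pickled))  # True: graph6 is far more compact
```

Pickle spends bytes on structure (list and tuple framing, one entry per
integer), while graph6 spends six adjacency bits per character.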
We could also store it in 100 files numbered 0 to 99, as follows:
sage: n = 100
sage: for k in range(n):
....:     gk = graphs.nauty_geng('11 30:30 {}/{}'.format(k, n))
....:     ggk = (g for g in gk if g.clique_number() == 4)
....:     s = '\n'.join(g.graph6_string() for g in ggk)
....:     with open('g_11_30_c4_graph_graph6_file_{}_of_{}.txt'
....:               .format(k, n - 1), 'w') as f:
....:         f.write(s)
This will let us study these graphs over several sessions without making
nauty work for two hours each time.
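The bookkeeping side of this scheme can be simulated in plain Python:
partition a stream of items into n classes, write each class to its own
file, then read everything back and check that nothing was lost or
duplicated. (This sketch uses index residues mod n; nauty's actual res/mod
option splits the generation tree rather than taking residues, but the
file handling is the same.)

```python
import os
import tempfile

# Stand-ins for graph6 lines:
items = ['item{}'.format(i) for i in range(1000)]
n = 10  # number of chunk files

with tempfile.TemporaryDirectory() as tmpdir:
    # Write class k: every item whose index is congruent to k mod n.
    for k in range(n):
        chunk = [s for i, s in enumerate(items) if i % n == k]
        path = os.path.join(tmpdir, 'chunk_{}_of_{}.txt'.format(k, n))
        with open(path, 'w') as f:
            f.write('\n'.join(chunk))

    # Read every chunk back and collect the lines.
    recovered = []
    for k in range(n):
        path = os.path.join(tmpdir, 'chunk_{}_of_{}.txt'.format(k, n))
        with open(path) as f:
            recovered.extend(f.read().split('\n'))

# The classes cover the stream exactly once: same count, same items.
print(len(recovered) == len(items))        # True
print(sorted(recovered) == sorted(items))  # True
```

In later sessions, each chunk file can be processed independently, which
is what makes the two-hour generation cost a one-time expense.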
Recommended reading, depending on the Python version your Sage is based on: