
I want to run graph isomorphism tests over fixed-size windows of a very long random walk. That is, given a target graph, say a triangle, I want to count how many windows of 3 consecutive nodes in the random walk induce a triangle.

Graph isomorphism testing is very costly, and the same graph patterns may appear repeatedly in the random walk. It is therefore expensive to run the isomorphism test on the fly while the random walk is being simulated.

Hence, I want to store the random walk first and then use some pruning techniques to reduce the number of isomorphism tests.

So my question is: how can I store a very long random walk with very low memory cost? The naive way is to store the whole sequence of the random walk, which causes high memory usage. Is there a better caching technique for this?
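To make the storage question concrete, here is a minimal sketch of one possible approach, not a definitive answer: instead of keeping the whole walk, keep only a sliding window of the last k nodes and a frequency table of the distinct node sets seen. The function name `count_windows` and the choice of `frozenset` as the key are assumptions made for illustration. The key relies on the fact that an induced subgraph depends only on which nodes are in the window, not on their order.

```python
from collections import Counter, deque
from typing import Hashable, Iterable


def count_windows(walk: Iterable[Hashable], k: int) -> Counter:
    """Count how often each set of k consecutive nodes occurs in a walk.

    `walk` may be a generator, so the full walk never has to be stored.
    """
    window = deque(maxlen=k)  # holds only the k most recent nodes
    counts = Counter()
    for node in walk:
        window.append(node)
        if len(window) == k:
            # Key by node set: the induced subgraph ignores visiting order.
            # A window that revisits a node yields a set with fewer than k nodes.
            counts[frozenset(window)] += 1
    return counts


walk = [1, 2, 3, 1, 2, 4]
counts = count_windows(walk, 3)
```

Memory then grows with the number of distinct node sets rather than with the walk length, which helps only if windows actually repeat often.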

  • Your goal is unclear. It is impossible to store more information in less space without losing some, so are you asking to save RAM by writing data to disk on the go? Or is it that you only want to store certain features of the walk to save space? Then it depends on the features. If it's just isomorphism, then you can't avoid storing incidence, but that won't help you with triangles. If it's triangles, you can count those on the go very inexpensively. – IcedLance Jul 17 '19 at 06:49
  • @IcedLance: thanks for your reply. What I want to do is that given a target graph G of size k and a very long random walk of length l (k< – Rise of Kingdom Jul 19 '19 at 07:24
  • I don't want to test isomorphism on the fly while simulating the random walk, because many graphs induced by nodes in the random walk may be the same (e.g., the node sequence 3,6,2,7 may appear many times in the random walk). If I store the random walk first and then record the unique graphs induced by node sequences, together with how often each appears, I may reduce the number of isomorphism tests. However, if the random walk is very long, storing it directly is not realistic. Continue in the next comment ... – Rise of Kingdom Jul 19 '19 at 07:25
  • Furthermore, the number of graphs induced by node sequences can be exponentially large. So my question is: how can I store the graphs induced by nodes in random walks with low memory cost? – Rise of Kingdom Jul 19 '19 at 07:25
  • If I understand you right: let's say K is the number of vertices, L is the length of the walk, and P is the size of the subgraph you want to check. You can have a dictionary of all possible P-long stretches, which will take K^P space and have constant lookup complexity. You can even pre-calculate isomorphisms on it. If P 1213 ~P complexity ). – IcedLance Jul 19 '19 at 08:40
  • But usually if you have a small set of features you want to look for then you try to calculate them on the go. For example for given graph G you can pre-calculate a set of all P-long walks isomorphic to it and then just match sequences to said set. With that you basically move expensive calculations to before processing the data, but don't have to store as much. – IcedLance Jul 19 '19 at 08:41
  • @IcedLance: thanks for your feedback. My apologies for not making my point clear. Given a query (target) graph T of size k and a very long random walk, I want to check whether each graph C formed by k consecutive nodes in the random walk is isomorphic to T. Hence, graph T and graph C are of the same size. However, many of the graphs C may be the same. I want to run the isomorphism test only once for each unique graph C by analyzing and storing the random walk first. Can your solution above still be applied to this case? – Rise of Kingdom Jul 21 '19 at 12:51
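
The "test once per unique graph C" idea from the comments can be sketched as follows. This is only an illustration under stated assumptions: the host graph is an adjacency dict `adj`, the distinct windows are already available as a mapping from node sets to frequencies (the hypothetical `window_counts`), and the brute-force permutation check in `is_isomorphic` is a stand-in that is practical only for small k. All function names are made up for this example.

```python
from itertools import combinations, permutations


def induced_edges(adj, nodes):
    """Edges of the subgraph induced by `nodes`, each as a frozenset pair."""
    return {frozenset((u, v)) for u, v in combinations(nodes, 2) if v in adj[u]}


def is_isomorphic(nodes_a, edges_a, nodes_b, edges_b):
    """Brute-force isomorphism check for small simple undirected graphs."""
    nodes_a, nodes_b = list(nodes_a), list(nodes_b)
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        # With equal edge counts, an edge-preserving bijection is an isomorphism.
        if all(frozenset(mapping[x] for x in e) in edges_b for e in edges_a):
            return True
    return False


def count_matches(adj, window_counts, target_nodes, target_edges):
    """Sum the frequencies of windows whose induced graph matches the target."""
    total = 0
    for nodes, freq in window_counts.items():
        if len(nodes) != len(target_nodes):
            continue  # the window revisited a node, so it has too few nodes
        # One isomorphism test per distinct node set, weighted by frequency.
        if is_isomorphic(nodes, induced_edges(adj, nodes), target_nodes, target_edges):
            total += freq
    return total


# Hypothetical host graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
window_counts = {frozenset({1, 2, 3}): 3, frozenset({2, 3, 4}): 2, frozenset({1, 2}): 1}
triangle_nodes = [0, 1, 2]
triangle_edges = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))}
matches = count_matches(adj, window_counts, triangle_nodes, triangle_edges)
```

Each distinct node set is tested once and its frequency is added to the total, so repeated windows cost a dictionary lookup rather than another isomorphism test.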
