
The generic problem

Suppose you are coding a system that consists of a graph, plus graph rewrite rules that can be activated depending on the configuration of neighboring nodes. That is, you have a dynamic graph that grows/shrinks unpredictably during runtime. If you naively use malloc, new nodes will be allocated at random positions in memory; after enough time, your heap will be pointer spaghetti, giving you terrible cache efficiency. Is there any lightweight, incremental technique to make nodes that wire together stay close together in memory?

What I tried

The only thing I could think of is embedding the nodes in a Cartesian space and running a physical elastic simulation that repelled/attracted nodes. That'd keep wired nodes together, but it looks silly, and I suspect the overhead of the simulation would outweigh the cache-efficiency speedup.
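For concreteness, a minimal sketch of that idea in C (all names and constants here are made up for illustration): each node carries a 1-D coordinate, edges act as springs pulling their endpoints together, and a weak push away from the centroid keeps the layout from collapsing to a point. Sorting nodes by coordinate would then give the target memory order.

```c
/* Sketch of the elastic-simulation idea (illustrative names/constants):
 * embed nodes on a 1-D line, let edges act as springs, push nodes away
 * from the centroid so they don't all collapse, then sort by coordinate
 * to obtain a memory order that keeps wired nodes adjacent. */

typedef struct { int a, b; } Edge;

static const double SPRING  = 0.05;  /* edge attraction strength */
static const double REPULSE = 0.01;  /* centroid repulsion strength */

/* One relaxation pass; call repeatedly (or incrementally between rewrites). */
void relax(double *pos, int n, const Edge *edges, int n_edges) {
    for (int e = 0; e < n_edges; e++) {
        double d = pos[edges[e].b] - pos[edges[e].a];
        pos[edges[e].a] += SPRING * d;          /* pull endpoints together */
        pos[edges[e].b] -= SPRING * d;
    }
    double mean = 0;
    for (int i = 0; i < n; i++) mean += pos[i];
    mean /= n;
    for (int i = 0; i < n; i++)
        pos[i] += REPULSE * (pos[i] - mean);    /* spread away from centroid */
}
```

Even this cheap 1-D version costs O(edges) per relaxation pass, plus a sort to extract the ordering, which is exactly the overhead concern above.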

The solid example

This is the system I'm trying to implement: [this image](http://i.stack.imgur.com/eD36T.jpg). [Here](http://paste.ofcode.org/34tfgELFk8HNSyDQd46wawB) is a brief snippet of the code I'm trying to optimize in C. [This repo](https://github.com/MaiaVictor/optlam/blob/master/optlam.js) is a prototypal, working implementation in JS, with terrible cache efficiency (partly a limitation of the language itself). [This video](https://www.youtube.com/watch?v=lhNtgbFTXFE) shows the system in action graphically.

MaiaVictor
  • I imagine that using [`realloc`](http://devdocs.io/c/memory/realloc) smartly would be a step toward achieving your proposed organization of nodes. – Patrick Roberts Jan 29 '16 at 18:26
  • If you provide a snippet with a sample `struct` for your graph nodes, you could turn this into a more concrete problem where answers can provide actual code to demonstrate the solution. – Patrick Roberts Jan 29 '16 at 18:29
  • Sure, but to be honest I think the question is complete enough as it is, so I'll post more details here as a comment. [This](http://i.stack.imgur.com/eD36T.jpg) is the system I'm trying to implement. [Here](http://paste.ofcode.org/34tfgELFk8HNSyDQd46wawB) is a brief snippet of the code I'm trying to optimize in C. [This repo](https://github.com/MaiaVictor/optlam/blob/master/optlam.js) is a prototypal, working implementation in JS, with terrible cache efficiency (partly a limitation of the language itself). [This video](https://www.youtube.com/watch?v=lhNtgbFTXFE) shows the system in action graphically. – MaiaVictor Jan 29 '16 at 18:38
  • Let me know if you need more info, or if you think it would be better to edit that into the question. (The reason I don't think so is that I tried to make it a generic question for graph rewriting systems, not for my particular problem. That way it could help other programmers in the future.) – MaiaVictor Jan 29 '16 at 18:39
  • This is my personal opinion, but I think this is an advanced enough topic that anyone looking for a reusable solution to cache optimization for dynamic graphs would benefit more from tailoring a concrete code snippet to their particular use case than from implementing a general explanation of the solution from scratch. – Patrick Roberts Jan 29 '16 at 18:45
  • Maybe you're right, but I'm not sure I know how to put it that way. I think the overall technique would be the same. For example, if the answer is *"the name of this problem is X, and a particle simulation in a 4D space with electric fields and z-order indexing works great, check those references"* (just made this up), that'd help me and future readers. If I changed the question to my particular problem, that could make it too specific and give the impression that I'm expecting an answerer to write a lot of code for me. What do you think, @PatrickRoberts? – MaiaVictor Jan 29 '16 at 19:02
  • I think you're probably right. Leaving this as general as it is now would give the accurate impression to people viewing this question that the answer could apply to their specific use-case. – Patrick Roberts Jan 29 '16 at 19:05
  • I honestly don't know, asking is hard by itself! Let's see how it goes, and let me know if you have an idea. – MaiaVictor Jan 29 '16 at 19:06
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/102030/discussion-between-patrick-roberts-and-viclib). – Patrick Roberts Jan 29 '16 at 19:19
  • The video makes the graph look very dynamic, with nodes moving about as they are attached and detached from the graph. Is what you are looking for some kind of garbage collection in which allocated nodes may actually be moved in memory, or are you looking for something that would make reasonably intelligent choices about using nearby memory if available? Would you want some kind of sweep-and-rearrange memory manager that moves allocated nodes, or just best-effort placement? – Richard Chambers Jan 29 '16 at 19:22
  • The visualization is quite bad, to be honest, because it moves the nodes up/down a line in a very stupid way. I think my hand-drawn image gives a better picture of what the graph looks like, if you are creative enough to imagine the transformations happening. I'm not asking for either of those in particular, @RichardChambers - both look like reasonable options and I can't tell which is better. But, speaking intuitively, swapping node positions in L1 is probably fast, so why not? Maybe that could be a key part of the strategy of keeping wired nodes together - an incremental swapping of badly positioned nodes. – MaiaVictor Jan 29 '16 at 19:26
  • This is sufficient, by the way: I need to be able to allocate nodes and erase nodes dynamically, and, anytime during execution, any arbitrary slice of the memory must contain mostly nodes that are wired together. – MaiaVictor Jan 29 '16 at 19:29
  • So suppose you allocate blocks of memory which are then divided into node-sized allocation segments, ordered by, say, memory address, with a heap that maintains an ordered list of available nodes for allocation and deallocation. Allocation would involve specifying a node as part of the request, so the nearest available neighboring node would be provided. If the distance exceeds some threshold, a sweep is done to reorganize the graph, swapping node content. However, clustering of data may be an issue, and I could see thrashing of the reorganization happening. – Richard Chambers Jan 29 '16 at 19:35
  • That sounds very similar to the approach I'm trying to write right now! I have no idea if this will work or not, but I have the impression some parts of the memory will become congested and the system will get stuck... – MaiaVictor Jan 29 '16 at 19:38
  • Is there no special property of the graph that you can exploit? (I don't mean graph property, but rather real-world knowledge about what the nodes represent, and therefore which nodes are more likely to be connected.) – biziclop Jan 29 '16 at 23:13
  • I haven't seen it mentioned yet, but this problem sounds very similar to the problem filesystems have laying data out on disk. You might want to look at filesystems for inspiration. – Zan Lynx Jan 30 '16 at 02:15
  • "Is there any lightweight, incremental technique to make nodes that wire together stay close together in memory?" - Yes. Use a GC'd language without pointers. – John Dvorak Jan 30 '16 at 05:39
  • One question is: is there so little creation and deletion of nodes, compared to accesses of neighboring nodes, that it's worthwhile to make allocation slower? – hyde Jan 30 '16 at 06:10
  • @JanDvorak Why would memory allocation of a GC'd language achieve any better (or any worse) cache locality for this problem? – hyde Jan 30 '16 at 06:11
  • @hyde GC'd languages without pointers can rearrange items on the heap as they see fit, ideally keeping related items together. Languages with pointers can't do that. – John Dvorak Jan 30 '16 at 06:15
  • @JanDvorak Do you have a concrete language implementation, which would actually do it in a useful way for this problem? – hyde Jan 30 '16 at 06:20

4 Answers


What you are looking to solve is the Linear Arrangement Problem. Finding an optimal arrangement is NP-hard, but good approximation algorithms exist. Here is a paper which should be a good place to start.
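To make the objective concrete, here is a minimal sketch in C; the adjacency-list representation is assumed, and the BFS ordering is a cheap heuristic in the spirit of Cuthill-McKee rather than anything taken from the paper:

```c
/* Sketch (assumed adjacency-list representation): the linear-arrangement
 * objective, plus a cheap breadth-first ordering heuristic that tends to
 * place neighbors at nearby positions. */
#include <stdlib.h>

typedef struct { int a, b; } Edge;

/* Cost of an arrangement: sum over edges of |pos[a] - pos[b]|. */
long arrangement_cost(const int *pos, const Edge *edges, int n_edges) {
    long cost = 0;
    for (int e = 0; e < n_edges; e++)
        cost += labs((long)pos[edges[e].a] - pos[edges[e].b]);
    return cost;
}

/* BFS ordering: position = dequeue order, so each BFS level is packed
 * contiguously. adj[i] lists node i's neighbors, deg[i] their count.
 * Assumes the graph is connected from `root`. */
void bfs_order(int n, int *const *adj, const int *deg, int root, int *pos) {
    int *queue = malloc(n * sizeof *queue);
    char *seen = calloc(n, 1);
    int head = 0, tail = 0;
    queue[tail++] = root;
    seen[root] = 1;
    while (head < tail) {
        int u = queue[head];
        pos[u] = head++;                 /* slot = order of dequeue */
        for (int k = 0; k < deg[u]; k++) {
            int v = adj[u][k];
            if (!seen[v]) { seen[v] = 1; queue[tail++] = v; }
        }
    }
    free(queue);
    free(seen);
}
```

Comparing `arrangement_cost` before and after a reordering gives a concrete way to measure whether a heuristic is actually helping.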

kazagistar
  • That is a very promising solution, since it works in linear time. I'll dig into the paper to understand better how it works. Nevertheless, I think the problem would actually be an incremental version of the Linear Arrangement Problem (otherwise it would require some global memory-rearrangement pauses). – MaiaVictor Jan 29 '16 at 21:03
  • If you get such an incremental solution working, you might consider a publication :) – kazagistar Jan 29 '16 at 21:04
  • I'll interpret that as a "proven to be impossible" :) – MaiaVictor Jan 29 '16 at 21:13
  • No, not even close; I think you have a pretty good chance even. This is a specific enough problem that no one might have put in that much effort yet. Plus, I cannot guarantee there isn't a solution; to be honest, I haven't dug around through the references in that paper yet. – kazagistar Jan 29 '16 at 21:15
  • Oh, okay. Thank you. If I find a solution I'll try to figure out how you publish something like that (I'm not a scientist). – MaiaVictor Jan 29 '16 at 21:20
  • Another direction you might consider is [incrementally clustering graph nodes](http://cse.iitkgp.ac.in/~pabitra/paper/barna-sdm07.pdf) by minimum cuts and putting clustered nodes in the same cache page. – kazagistar Jan 29 '16 at 21:27

You might look at this in terms of halfspace garbage collection. This isn't hard to implement (I've done it for an interpreter), particularly since you're only doing it for fixed-size node structures. Allocate from one large block (called a halfspace) of memory. When it gets too full or fragmented, stop and copy everything to the other halfspace (which you can also make bigger at that point). The trick is updating all the pointers. For this there is a very elegant and efficient algorithm called scan copy. There's a nice discussion of it at Cornell. It essentially traverses the graph breadth-first, copying as it goes, without any extra space other than what you're copying into. A nice property of the algorithm is that breadth-first levels end up adjacent in memory after each copy. If that is a good enough level of locality, you'll get it very efficiently with this method.
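A minimal sketch of that scan-copy step in C, assuming fixed-size nodes with three ports (an interaction-net-style layout guessed from the question; adapt the struct to your own). The `forward` field doubles as the "already moved" marker:

```c
#include <stddef.h>

typedef struct Node Node;
struct Node {
    Node *port[3];   /* wires to neighboring nodes */
    Node *forward;   /* NULL, or the node's new address once copied */
};

static size_t alloc_top;   /* next free slot in to-space */

/* Copy one node into to-space if not yet moved; return its new address.
 * The copy is made while p->forward is still NULL, so copies start with
 * a clear marker, ready for the next collection. */
static Node *forward(Node *p, Node *to) {
    if (p == NULL) return NULL;
    if (p->forward == NULL) {
        to[alloc_top] = *p;
        p->forward = &to[alloc_top++];
    }
    return p->forward;
}

/* Cheney-style breadth-first copy: the scan pointer chases the alloc
 * pointer, so each BFS level lands contiguously in to-space. Returns
 * the number of live nodes. */
size_t scan_copy(Node **rootp, Node *to) {
    size_t scan = 0;
    alloc_top = 0;
    *rootp = forward(*rootp, to);
    while (scan < alloc_top) {
        for (int i = 0; i < 3; i++)
            to[scan].port[i] = forward(to[scan].port[i], to);
        scan++;
    }
    return alloc_top;
}
```

After a collection, `to` becomes the active halfspace and fresh allocation continues from slot `alloc_top`.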

Gene

If you're really concerned about the layout of memory, it might be worthwhile to manage it yourself.

You can malloc a large block of memory at startup, then allocate space from that block. You'll need a separate structure to keep track of what has and hasn't been allocated. If you know that all allocated structures are of a certain size, that can simplify allocated/free-space management, e.g. an array of indexes; otherwise you could use a linked list of pointers threaded through the free space. Given that you'll likely be allocating structs one at a time, you probably don't need to worry about keeping track of the smallest and/or largest contiguous block of free space.

One thing you'll need to be careful of is alignment. Again, if you'll always be allocating memory in multiples of the size of a single struct, that makes things easier; otherwise it's probably a good idea to ensure that all allocations start at a 4-byte boundary, i.e. the difference between the address you hand out and the start address received from malloc is a multiple of 4.

You can pass additional parameters to your custom allocation functions to give hints about where the block should be placed, such as the address of one or more nearby nodes.
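As a sketch of how those pieces could fit together for fixed-size nodes (all names and sizes here are illustrative, not a definitive implementation): one big block, a byte-per-slot free map, and an allocator that scans outward from a hint node so new nodes land next to the nodes they wire to.

```c
/* Hypothetical fixed-size pool allocator with a placement hint.
 * POOL_SIZE and the Node layout are illustrative only. */
#include <stddef.h>
#include <stdlib.h>

#define POOL_SIZE (1 << 20)

typedef struct { void *port[3]; } Node;   /* example fixed-size node */

static Node *pool;                        /* one big malloc'd block */
static unsigned char *used;               /* one byte per slot; a bitmap also works */

int pool_init(void) {
    /* malloc's result is suitably aligned, and slots are sizeof(Node)
     * apart, so fixed-size slots need no extra alignment handling. */
    pool = malloc((size_t)POOL_SIZE * sizeof *pool);
    used = calloc(POOL_SIZE, 1);
    return (pool && used) ? 0 : -1;
}

/* Allocate the free slot nearest to `hint` (e.g. a node the new node
 * will wire to) by scanning outward in both directions: O(distance). */
Node *node_alloc(const Node *hint) {
    ptrdiff_t h = hint ? hint - pool : POOL_SIZE / 2;
    for (ptrdiff_t d = 0; d < POOL_SIZE; d++) {
        if (h + d < POOL_SIZE && !used[h + d]) { used[h + d] = 1; return &pool[h + d]; }
        if (h - d >= 0        && !used[h - d]) { used[h - d] = 1; return &pool[h - d]; }
    }
    return NULL;                          /* pool exhausted */
}

void node_free(Node *n) { used[n - pool] = 0; }
```

Note that `node_alloc` degrades to a long linear scan when the region around the hint is congested (the "memory all filled up" concern raised in the comments), so pairing it with an occasional compaction pass, as in the halfspace-GC answer, is one way out.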

dbush
  • Seems to me the hints and placement is actually the more difficult part of this question and the core of what is wanted to be accomplished. – Richard Chambers Jan 29 '16 at 18:46
  • Thanks! **You're completely right, but that's not what I'm asking**. I'm aware I have to write my own `malloc` (and `free`). What I'm asking for is algorithms that could be used **in order to implement those**, in a way that would keep nodes that wire together close together in memory. Think about it: when 2 nodes interact, they can create new nodes close to them. But if the memory is all filled up, there won't be free space nearby! So maybe a sparse memory layout is required? That is one of the many concerns that I'd expect an answer to address. Nevertheless, this was very constructive, so I upvoted it. – MaiaVictor Jan 29 '16 at 18:48
  • I just wanted to clarify something, since OP stated that `you have a dynamic graph that grows/shrinks unpredictably during runtime`. For the initial `malloc`, are you suggesting that the separate structure allocates memory similarly to a vector, where you copy the data to a new `malloc`d block whenever the graph exceeds the total allocated space? – Patrick Roberts Jan 29 '16 at 18:48
  • Also, @RichardChambers is correct, that's exactly it. :) – MaiaVictor Jan 29 '16 at 18:53

This can be viewed as a graph partitioning problem, where you're trying to cluster linked graph nodes on the same memory block. METIS is a good graph partitioning algorithm, but it is probably not appropriate for your use case because it requires global operations across the entire graph. However, two distributed graph partitioning algorithms that may be adapted to your use case are DIDIC and Ja-Be-Ja: the former attempts to minimize the number of edges that cross partitions without regard to partition size, while the latter attempts to create equally sized partitions. Both algorithms only require local knowledge of the graph to cluster each node, so if you've got any spare cycles you can use them to incrementally rebalance the graph. Fennel is a related algorithm that operates on streaming graphs, so you could e.g. use Fennel or a similar algorithm when initially allocating a graph node, and then use DIDIC/Ja-Be-Ja when rebalancing the graph.
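For concreteness, here is a minimal sketch in C of the kind of local swap Ja-Be-Ja relies on; it is a simplification of the published algorithm (and assumes an adjacency-list representation), not a faithful implementation:

```c
/* Simplified Ja-Be-Ja-style swap (illustrative; the real algorithm adds
 * a temperature schedule and partner sampling). Each node carries the
 * id of the memory block / partition it currently lives in. */
typedef struct {
    int block;   /* which memory block the node lives in */
    int deg;     /* number of neighbors */
    int *adj;    /* neighbor indices */
} JNode;

/* How many of u's neighbors live in block b. */
static int same_block(const JNode *g, int u, int b) {
    int c = 0;
    for (int k = 0; k < g[u].deg; k++)
        if (g[g[u].adj[k]].block == b)
            c++;
    return c;
}

/* Swap the blocks of u and v iff that strictly improves how many
 * neighbors each has in its own block. Note: if u and v are adjacent,
 * this estimate is slightly off; the paper's utility corrects for it. */
int try_swap(JNode *g, int u, int v) {
    int bu = g[u].block, bv = g[v].block;
    if (bu == bv)
        return 0;
    int before = same_block(g, u, bu) + same_block(g, v, bv);
    int after  = same_block(g, u, bv) + same_block(g, v, bu);
    if (after <= before)
        return 0;
    g[u].block = bv;
    g[v].block = bu;
    return 1;
}
```

Because each swap only inspects the two nodes' neighbor lists, calls to `try_swap` can be sprinkled between rewrite steps as the incremental rebalancing pass the answer describes.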

Zim-Zam O'Pootertoot