
How should one use Cap'n Proto for an application's mutable state, the way Protobuf is commonly used? Is there a garbage collector?

Kenton Varda confirmed in his comparison of Cap'n Proto, FlatBuffers, and SBE that Cap'n Proto allocates message contents from an internal arena. A single message would therefore grow without bound if one kept editing it over an extended period, for example because it is repeatedly written to disk and reloaded.

Is there a garbage collector for Cap'n Proto that rearranges a message and reclaims the wasted space? Would a garbage collector even be the best approach? If not, or if none exists, what is the recommended approach?

I'm actually writing a Rust program that must only save encrypted data anyway, so I'm fine with recopying the whole message structure, but I'm curious about the options more broadly.

Shepmaster
Jeff Burdges
  • Wouldn't simply creating a new copy do the trick? `T(x).swap(x);` – Kerrek SB Jun 30 '15 at 08:15
  • I'd imagine so. As I said, I'm okay with doing that, since I must encrypt everything anyway. I asked because I expect recopying to be slower than garbage collecting. It may be, though, that no garbage collector exists but other fast tricks do. I don't know. – Jeff Burdges Jun 30 '15 at 08:23
  • Could you please post the benchmarks that show that "it'll be slower to recopy than to garbage collect"? – Kerrek SB Jun 30 '15 at 08:25
  • Questions should have *one question*. I see at least 3 different ones, including "how do I use ". – Shepmaster Jun 30 '15 at 13:18

1 Answer


The only way to reclaim wasted space is to copy the message into a new MessageBuilder. Only the "used" parts will be copied. This is effectively GC: many of the best GC algorithms actually move data, which is what you'd be doing here.
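To make the copy-to-reclaim idea concrete, here is a small self-contained C++ toy. `ToyMessage` and its methods are hypothetical and are NOT the Cap'n Proto API. The sketch only assumes a single bump-allocated segment in which replacing a sub-object leaves the old one behind as garbage. `compactCopy()` copies just what is reachable from the root into a fresh arena, which is a copying collection in miniature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model, NOT the Cap'n Proto API: a message is one
// bump-allocated segment of 64-bit words. The root "struct" occupies the
// first `slots` words; each holds the offset of a child list, or 0 for null
// (no child can start at offset 0, because the root lives there).
// A child list is a length word followed by that many element words.
class ToyMessage {
public:
    explicit ToyMessage(size_t slots) : slots_(slots), arena_(slots, 0) {}

    // Point `slot` at a freshly allocated list. The previous list, if any,
    // is NOT freed: it stays in the arena as unreachable garbage.
    void setList(size_t slot, const std::vector<uint64_t>& items) {
        assert(slot < slots_);
        size_t offset = arena_.size();  // bump allocation at the end
        arena_.push_back(items.size());
        arena_.insert(arena_.end(), items.begin(), items.end());
        arena_[slot] = offset;
    }

    // Return a copy of the list that `slot` points to (empty if null).
    std::vector<uint64_t> getList(size_t slot) const {
        assert(slot < slots_);
        size_t offset = arena_[slot];
        if (offset == 0) return {};
        size_t len = arena_[offset];
        return std::vector<uint64_t>(arena_.begin() + offset + 1,
                                     arena_.begin() + offset + 1 + len);
    }

    size_t sizeInWords() const { return arena_.size(); }

    // Copying "GC": walk from the root and copy only reachable objects into
    // a brand-new arena; the old arena, garbage included, can then be dropped.
    ToyMessage compactCopy() const {
        ToyMessage out(slots_);
        for (size_t slot = 0; slot < slots_; ++slot) {
            if (arena_[slot] != 0) out.setList(slot, getList(slot));
        }
        return out;
    }

private:
    size_t slots_;
    std::vector<uint64_t> arena_;
};
```

For example, setting one slot to a 3-element list and then replacing another slot's 2-element list 100 times leaves 306 words in the arena, while `compactCopy()` produces a 9-word message with the same contents. With the real C++ library, the analogous step would be building a fresh message with `setRoot()` from a reader of the old root; check the `capnp/message.h` documentation for the exact usage.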

There is no practical way to implement non-moving GC of arena-allocated Cap'n Proto messages.

I am considering extending the Cap'n Proto code generator in C++ to also generate a set of classes appropriate for representing the same data structures on the heap, so that you can modify the structure over time. Converting between the heap representation and the arena representation will require a copy, of course. But this isn't implemented yet, and I don't have a timeline. (The Rust implementation would likely get a similar update.)
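Until something like that exists, the same split can be done by hand: keep mutable state in ordinary heap structures and copy it into a fresh, tightly packed buffer only when saving. That matches the asker's encrypt-on-save situation. This is a rough sketch of that workflow. `HeapState`, `toFlat`, and `fromFlat` are hypothetical names, nothing here is generated code, and the word layout is a toy, NOT Cap'n Proto's wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical heap-side mirror of a schema such as
//   struct State { name @0 :Text; counters @1 :List(UInt64); }
// Standard containers free old storage as soon as a field is replaced.
struct HeapState {
    std::string name;
    std::vector<uint64_t> counters;
};

// Toy flat layout (NOT the Cap'n Proto encoding):
// [name length][name bytes, zero-padded to whole words][count][counters...]
std::vector<uint64_t> toFlat(const HeapState& s) {
    std::vector<uint64_t> out;
    out.push_back(s.name.size());
    size_t start = out.size();
    out.resize(start + (s.name.size() + 7) / 8, 0);
    if (!s.name.empty()) std::memcpy(out.data() + start, s.name.data(), s.name.size());
    out.push_back(s.counters.size());
    out.insert(out.end(), s.counters.begin(), s.counters.end());
    return out;
}

// Copy a flat buffer back into heap form so it can be mutated freely.
HeapState fromFlat(const std::vector<uint64_t>& in) {
    HeapState s;
    size_t pos = 0;
    assert(pos < in.size());
    size_t nameLen = in[pos++];
    size_t nameWords = (nameLen + 7) / 8;
    assert(pos + nameWords < in.size());
    s.name.resize(nameLen);
    if (nameLen != 0) std::memcpy(&s.name[0], in.data() + pos, nameLen);
    pos += nameWords;
    size_t count = in[pos++];
    assert(pos + count <= in.size());
    s.counters.assign(in.begin() + pos, in.begin() + pos + count);
    return s;
}
```

Because the heap side releases replaced strings and vectors as it goes, each saved buffer holds only the current state. The price is one full copy per save, which is the trade-off described above.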

Kenton Varda
  • Wouldn't most mutations modify the data in place anyway, avoiding extra allocations/garbage? – aij Jun 30 '15 at 20:23
  • @aij If you're only modifying an integer (or other primitive), sure. But if you're replacing a string, or extending a list, or removing a sub-object, etc., then you have a problem. – Kenton Varda Jun 30 '15 at 20:35
  • Alright, that's great. I'm completely happy moving the data, since I must encrypt it anyway. – Jeff Burdges Jul 01 '15 at 09:21
  • If I understand correctly, a Cap'n Proto message is a collection of segments, with the arena spanning all of them. Maybe managing the segmentation directly could provide a relatively painless hack for doing this. – Jeff Burdges Jul 13 '15 at 17:28
  • @JeffBurdges You could fairly easily allocate a new segment for every object, but it won't perform as well. Cap'n Proto optimizes for big segments, and the assumption that pointers don't usually cross segment boundaries. But with that said, this could be an interesting idea to explore as a way to trade off performance for ease of use. – Kenton Varda Jul 13 '15 at 21:41
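The segment-per-object idea from the last two comments can be illustrated with a toy in which every object lives in its own segment, so replacing an object frees its old segment immediately instead of leaving dead words behind. `SegmentPerObject` and its methods are hypothetical and do not use the Cap'n Proto API. The toy omits the cost described in the last comment: with one object per segment, nearly every pointer crosses a segment boundary. In Cap'n Proto's encoding, that means far pointers through landing pads, which add words and an extra indirection.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy: one object per segment; freed segment ids are reused.
class SegmentPerObject {
public:
    // Store `items` in its own segment and return the segment id.
    size_t alloc(std::vector<uint64_t> items) {
        if (!free_.empty()) {
            size_t id = free_.back();
            free_.pop_back();
            segments_[id] = std::move(items);
            return id;
        }
        segments_.push_back(std::move(items));
        return segments_.size() - 1;
    }

    // Drop an object. Swapping with an empty vector releases its storage now.
    void release(size_t id) {
        assert(id < segments_.size());
        std::vector<uint64_t>().swap(segments_[id]);
        free_.push_back(id);
    }

    // Replace object `id`; no garbage is left in the old segment.
    size_t replace(size_t id, std::vector<uint64_t> items) {
        release(id);
        return alloc(std::move(items));
    }

    const std::vector<uint64_t>& get(size_t id) const {
        assert(id < segments_.size());
        return segments_[id];
    }

    // Total words currently held across all segments.
    size_t liveWords() const {
        size_t n = 0;
        for (const auto& seg : segments_) n += seg.size();
        return n;
    }

    size_t segmentCount() const { return segments_.size(); }

private:
    std::vector<std::vector<uint64_t>> segments_;
    std::vector<size_t> free_;  // ids of released segments available for reuse
};
```

Replacing one object 100 times keeps the total size constant without any compaction pass. A real message built this way would pay for it on every cross-object pointer, which is the performance-for-convenience trade-off described above.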