I am looking into using Arrow for read-heavy operations on trie data structures. I'm slightly hesitant about using Arrow because I can't see a natural representation of the data in terms of columns. Specifically, the data I work with can be viewed as a trie where the keys are tuples of strings, ints, etc., and all the values are at the leaves. An example:
            .
        a /   \ 1
         .     .
     2 /  \ c   \ 3
    3.0   "hi"  [arr of int]
The key set for any given trie can vary. Indeed, I am actually dealing with many different tries, each with slightly different keysets and corresponding leaf values.
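To make the columnar question concrete, one possible layout is one row per leaf, with the full key path in one column and the leaf value in another. The sketch below is in Python purely for illustration and uses no Arrow API. The nested-dict encoding, the `flatten` helper, and the contents of the int array are assumptions made for the example, and it assumes leaf values are never dicts themselves. In a real columnar format, mixed key and value types would presumably need a union type or separate per-type columns.

```python
# Hypothetical nested-dict encoding of the example trie above.
# Inner dicts are interior nodes; anything else is a leaf value.
trie = {
    "a": {2: 3.0, "c": "hi"},
    1: {3: [1, 2, 3]},  # placeholder contents for "[arr of int]"
}


def flatten(node, prefix=()):
    """Yield (key_path, leaf_value) pairs, one per leaf, depth first."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, prefix + (key,))
    else:
        yield prefix, node


rows = list(flatten(trie))

# Two parallel "columns": the key paths and the leaf values.
paths = [path for path, _ in rows]
values = [value for _, value in rows]
```

Tries with different key sets would simply produce different sets of paths, so each trie could become its own table, or all of them could share one table with an extra hypothetical trie-id column.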
The end goal would be to have:
a) A means to read slices of a trie without loading the whole trie into memory.
b) The ability to reconstruct tries (this can be expensive) if needed.
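As a rough illustration of how both goals could look on a flattened, one-row-per-leaf layout, the Python sketch below keeps rows sorted by key path so that a sub-trie is a contiguous range, and rebuilds the nested structure from rows when needed. Everything here is hypothetical: the row format, the `slice_prefix` and `rebuild` helpers, and the choice to stringify keys so that paths sort consistently. It works on in-memory lists; with an on-disk columnar file the same range lookup would have to be mapped onto whatever partial-read mechanism the format offers.

```python
from bisect import bisect_left

# Hypothetical flattened trie: (key_path, leaf_value) rows.
# Keys are stringified here so that paths of mixed types sort consistently.
rows = [
    (("1", "3"), [1, 2, 3]),  # placeholder contents for the int array
    (("a", "2"), 3.0),
    (("a", "c"), "hi"),
]
rows.sort(key=lambda row: row[0])
paths = [path for path, _ in rows]


def slice_prefix(prefix):
    """Return the rows whose key path starts with `prefix`.

    Because paths are sorted, all matches form one contiguous range,
    which starts at the position found by binary search.
    """
    lo = bisect_left(paths, prefix)
    hi = lo
    while hi < len(paths) and paths[hi][: len(prefix)] == prefix:
        hi += 1
    return rows[lo:hi]


def rebuild(selected_rows):
    """Reconstruct a nested-dict trie from (key_path, leaf_value) rows."""
    trie = {}
    for path, value in selected_rows:
        node = trie
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return trie


sub_rows = slice_prefix(("a",))  # only the rows under key "a"
sub_trie = rebuild(sub_rows)     # nested form of that slice
full_trie = rebuild(rows)        # the expensive full reconstruction
```

Sorting by path is the design choice that makes slicing cheap: a prefix query touches only the matching range instead of scanning every row.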
I should mention that I am considering HDF5 as an alternative. If it is important, I am working in Julia.