Apache Arrow for Trie Data

Asked Nov 15 '22 at 06:22

I am looking into using Arrow for read-heavy operations on trie data structures. I'm slightly hesitant about using Arrow because I can't see a natural way to represent the data as columns. Specifically, the data I work with can be viewed as a trie whose keys are tuples of strings, ints, etc., and whose values all sit at the leaves. An example:

     .
  a / \ 1
   .   .
2 / \ c  \ 3
3.0 "hi"  [arr of int]

The key set can vary from one trie to the next. Indeed, I am dealing with many different tries, each with a slightly different key set and corresponding leaf values.
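
To make the shape of the data concrete, here is a minimal sketch of the example trie as nested dictionaries, flattened into one row per leaf. It is written in Python purely for illustration; the same shape should carry over to Julia. The names `example_trie` and `flatten` are hypothetical, the contents of the integer array are placeholders, and storing the key path next to the leaf value is only one possible tabular layout, not something Arrow prescribes.

```python
from typing import Any, Iterator

# Hypothetical encoding of the example trie: inner nodes are dicts keyed by
# edge label; anything that is not a dict is treated as a leaf value.
example_trie = {
    "a": {2: 3.0, "c": "hi"},
    1: {3: [1, 2, 3]},  # placeholder contents for "[arr of int]"
}


def flatten(node: Any, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield one (key_path, value) row per leaf, depth first."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, prefix + (key,))
    else:
        yield prefix, node


rows = list(flatten(example_trie))
for path, value in rows:
    print(path, value)
```

Each row pairs a key tuple of mixed types with a leaf value of mixed type, which is exactly the part that does not map obviously onto fixed-type columns.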

The end goal would be to have:
a) A means to read slices of a trie into memory without loading the whole trie
b) The ability to reconstruct tries (this can be expensive) if needed.
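
The two goals can be sketched over a flattened list of (key path, value) rows. This is an in-memory Python illustration only: `slice_by_prefix` and `rebuild` are hypothetical helper names, the rows are placeholder data matching the example trie, and every key path is assumed to be non-empty. It says nothing about whether Arrow or HDF5 can apply the prefix filter without reading everything from disk, which is the open part of the question.

```python
# Hypothetical flattened rows: one (key_path, value) pair per leaf.
rows = [
    (("a", 2), 3.0),
    (("a", "c"), "hi"),
    ((1, 3), [1, 2, 3]),  # placeholder contents for "[arr of int]"
]


def slice_by_prefix(rows, prefix):
    """Goal (a): keep only the leaves whose key path starts with `prefix`."""
    n = len(prefix)
    return [(path, value) for path, value in rows if path[:n] == prefix]


def rebuild(rows):
    """Goal (b): rebuild nested dicts from flattened rows.

    Assumes every key path has at least one element.
    """
    root = {}
    for path, value in rows:
        node = root
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return root


sub = slice_by_prefix(rows, ("a",))
print(sub)
print(rebuild(rows))
```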

I should mention that I am considering HDF5 as an alternative. If it is important, I am working in Julia.

Ian_L