I would like to insert fixed-length numerical values into a tree and check for their presence later. Most of the values share long common prefixes, so storing each one whole in a SET would waste space.

Since ReJson PATH expects "Java-like" naming convention for key names, this is what I came up with:

{
  "_0": {
    "_1": {
      "_2": true
    }
  },
  "_2": {
    "_3": {
      "_4": true
    }
  }
}

So, to find out whether "012" has been set, I need to check whether `JSON.GET key ._0._1._2` returns true.
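
For illustration, here is a minimal Python sketch of the underscore-prefixed scheme described above. It only builds the path string and mirrors the nested structure in a plain dict; it does not talk to Redis. The helper names (`digits_to_path`, `insert`, `contains`) are hypothetical, and the sketch assumes all values are digit strings of the same length.

```python
def digits_to_path(value: str) -> str:
    """Turn a digit string such as "012" into the path "._0._1._2"."""
    if not (value.isascii() and value.isdigit()):
        raise ValueError("expected a non-empty string of ASCII digits")
    return "".join(f"._{d}" for d in value)


def insert(tree: dict, value: str) -> None:
    """Insert a value into a nested dict shaped like the JSON document above.

    Assumes fixed-length values, so a leaf never collides with an inner node.
    """
    node = tree
    for d in value[:-1]:
        node = node.setdefault(f"_{d}", {})
    node[f"_{value[-1]}"] = True


def contains(tree: dict, value: str) -> bool:
    """Walk the tree one digit at a time; only a `True` leaf counts as present."""
    node = tree
    for d in value:
        if not isinstance(node, dict) or f"_{d}" not in node:
            return False
        node = node[f"_{d}"]
    return node is True


tree = {}
insert(tree, "012")
insert(tree, "234")
print(digits_to_path("012"))
print(contains(tree, "012"), contains(tree, "013"))
```

The string returned by `digits_to_path` is what would be passed as the path argument to `JSON.GET`.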

Initially, I tried saving the natural tree values such as:

{
  "0": {
    "1": {
      "2": true
    }
  },
  "2": {
    "3": {
      "4": true
    }
  }
}

But I cannot write any PATH that would be able to traverse this tree, neither in dot nor in bracket form.

Any suggestions? Should I stick to my underscored values and ._a._b._c pattern or is there a better way?

Guy Korland
alturkovic
  • What do you mean by "traverse"? The `JSON.GET` snippet that you posted works on my laptop :) Also, are you sure you want to use ReJSON for that? – Itamar Haber Oct 25 '18 at 21:41
  • I made some edits to provide more clarity, please take a look. I am aware the `JSON.GET` I posted works, but I need to use the underscores since `PATH` expects "Java-like" naming convention so I cannot write the `JSON.GET` for the second tree I posted. Of course, I am open to new suggestions :) It just seemed like ReJson does what I wanted, a real nested tree structure. I found your recommendation for using ReJson for tree structures here: https://stackoverflow.com/a/50193997/5291611 – alturkovic Oct 26 '18 at 08:25

1 Answer

ReJSON paths are JSONPath-like, but not identical to JSONPath. Specifically, the docs state that "Names must begin with a letter, a dollar ($) or an underscore (_) character". Keys such as "0" begin with a digit, so the paths for your second tree cannot be written; that tree is not supported.

While ReJSON could be used to store a tree-like structure, it looks like your use case would not benefit from using it. Instead, I'd look into flattening the tree and storing it as a Hash (or even a Set), where each field represents a path, e.g.:

HSET tree 0.1.2 1 2.3.4 1

Then you can use something like HEXISTS to check for "truthiness".
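
As a rough sketch of the flattening idea, the Python helpers below turn digit strings into dotted field names and build the argument lists for the two commands. The helper names (`to_field`, `hset_args`, `hexists_args`) are hypothetical, and the code only constructs the commands; sending them is left to whichever Redis client you use. Passing several field/value pairs to a single `HSET` assumes Redis 4.0 or later.

```python
def to_field(value: str, sep: str = ".") -> str:
    """Flatten "012" into the field name "0.1.2"."""
    return sep.join(value)


def hset_args(key: str, values: list) -> list:
    """Build one HSET command that marks every value as present."""
    args = ["HSET", key]
    for v in values:
        args += [to_field(v), "1"]
    return args


def hexists_args(key: str, value: str) -> list:
    """Build the HEXISTS command that checks a single value."""
    return ["HEXISTS", key, to_field(value)]


print(" ".join(hset_args("tree", ["012", "234"])))
print(" ".join(hexists_args("tree", "012")))
```

`HEXISTS` replies with 1 when the field exists and 0 otherwise, so no value lookup is needed to test membership.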

Itamar Haber
  • But storing a lot of similar values would take more memory using hashes, wouldn't it? Values like 0123456789, 0123456788, 0123456787, 0123456786 have a long common prefix, so I would insert a lot of "useless" data this way, whereas using trees, this wouldn't be the case. I now realize I could have explained my use-case better :) – alturkovic Oct 26 '18 at 13:48
  • I'm not sure you'll see an improvement with ReJSON despite the common prefix - it has its own overhead. You can, however, measure that with `MEMORY USAGE` command. – Itamar Haber Oct 27 '18 at 14:19
  • Funny, I'm actually arguing against using the module that I wrote :) – Itamar Haber Oct 27 '18 at 14:19
  • Interesting, I'll do some testing with ~10M similar values with ~2M non-similar with both approaches to test the memory usage. Thanks for the tips :) – alturkovic Oct 27 '18 at 14:37
  • A straight-up object property works in a hash, and sets work OK too. An array of arrays works as well if you don't mind sorting on insert, but then lookups are a binary tree for each matching digit. Also, I'm not sure whether size matters more than lookup time. Hash key finding is generally pretty fast unless all the sequences are near identical – Mark Essel Oct 27 '18 at 19:07
  • "unless all the sequences are near identical" <- this actually depends on the hash function in use – Itamar Haber Oct 27 '18 at 19:11
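
To make the common-prefix argument from the comments concrete, here is a small Python sketch that compares how many tree nodes a set of values needs with how many digits a flat representation stores. It counts nodes and characters only, not bytes: per-node overhead in ReJSON or per-field overhead in a hash is not modeled, so `MEMORY USAGE` on real keys remains the actual measurement. The function names are hypothetical.

```python
def count_trie_nodes(values) -> int:
    """Count the nodes created when inserting every value digit by digit."""
    root = {}
    nodes = 0
    for v in values:
        node = root
        for d in v:
            if d not in node:
                node[d] = {}
                nodes += 1  # a new node is created only for an unseen prefix
            node = node[d]
    return nodes


def count_flat_digits(values) -> int:
    """Count the digits stored when each distinct value is kept whole."""
    return sum(len(v) for v in set(values))


sample = ["0123456789", "0123456788", "0123456787", "0123456786"]
print(count_trie_nodes(sample), count_flat_digits(sample))
```

For the four sample values the tree needs 13 nodes against 40 stored digits, but whether that translates into a memory saving depends on how much each node costs.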