
I've got a question about using MapDB, specifically about querying a submap. I took the code from the official example at https://github.com/jankotek/MapDB/blob/release-1.0/src/test/java/examples/TreeMap_Composite_Key.java, which is easy to understand. For testing purposes I swapped the key parts "Town" and "Street" and adjusted the subMap call the same way. Unfortunately, the map is now not limited by the subMap call; instead, the whole map (200 entries) is returned. Here are the adapted snippets from that example:

// Initializing map
for (final String town : towns) {
    for (final String street : streets) {
        for (final int houseNum : houseNums) {
            // Key parts swapped relative to the original example: (street, town, houseNum)
            final Fun.Tuple3<String, String, Integer> address = Fun.t3(street, town, houseNum);
            final int income = r.nextInt(50000);
            map.put(address, income);
        }
    }
}
...
final Map<Fun.Tuple3, Integer> housesInCong = map.subMap(
  Fun.t3(null, "Cong", null), Fun.t3(Fun.HI, "Cong", Fun.HI));

//housesInCong.size() == 200 (should be 40)
System.out.println("There are " + housesInCong.size()+ " houses in Cong");

Can someone explain to me why this happens and how this can be avoided? I've got a similar use case in my project.

Thanks in advance and regards :)

AnarchoEnte

1 Answer


I ran into a similar problem recently when indexing geographic objects in two-dimensional tiles. I had to browse the MapDB source code and do some experimenting to understand what's going on.

MapDB stores your objects so that it is easy to iterate over them (or over a sub-range of them) in their natural order. That order is not something you can change when iterating over the values; it is taken into account as the objects are inserted, and it determines the layout of the structure they are stored in (a B-tree).

The tuple classes included with MapDB have lexicographical order. That is to say, they are ordered like words in a dictionary: their first elements are compared to see which tuple is greater than the other. If the two first elements are equal, we break the tie by moving on to the second element, then the third. You could also say they behave like a positional number system where all the numbers you're comparing have the same number of digits.
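A minimal sketch of this dictionary-style ordering in plain Java (an illustration, not MapDB's own comparator): tuples are modeled as `int[]` and compared position by position, so the first differing element decides. That is why (1,9,9) sorts before (2,0,0), just as 199 comes before 200. The class `LexOrder` and its `sorted` helper are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LexOrder {
    // Compare equal-length tuples position by position; the first differing
    // element decides, and later elements only break ties.
    static final Comparator<int[]> LEX = (p, q) -> {
        for (int i = 0; i < p.length; i++) {
            int c = Integer.compare(p[i], q[i]);
            if (c != 0) return c;
        }
        return 0;
    };

    // Sort the given tuples lexicographically and render them as strings.
    static List<String> sorted(int[]... tuples) {
        List<int[]> list = new ArrayList<>(Arrays.asList(tuples));
        list.sort(LEX);
        List<String> out = new ArrayList<>();
        for (int[] t : list) out.add(Arrays.toString(t));
        return out;
    }

    public static void main(String[] args) {
        // (1,9,9) precedes (2,0,0): the first element outranks everything after it.
        System.out.println(sorted(new int[] {2, 0, 0}, new int[] {1, 9, 9}, new int[] {1, 2, 3}));
        // prints [[1, 2, 3], [1, 9, 9], [2, 0, 0]]
    }
}
```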

As an example, let's look at a case where all of the elements in your tuples are one-digit integers. We begin by inserting all the possible combinations of three one-digit integers. If we filter like this:

map.subMap(Fun.t3(2, null, null), Fun.t3(4, Fun.HI, Fun.HI));

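we would iterate over the tuples (2,0,0), (2,0,1), (2,0,2) ... (4,9,9). Every tuple starting with 4 is included, because even (4,9,9) sorts below the upper bound (4,HI,HI). Now, as in your example, we change the subMap call to use these bound tuples: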

map.subMap(Fun.t3(null, 2, null), Fun.t3(Fun.HI, 4, Fun.HI));

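Here we will iterate over every tuple in the map: (0,0,0), (0,0,1), (0,0,2) ... (9,9,9). The ordering is one-dimensional and the first element is more significant than the second one. Since null sorts lower than anything and Fun.HI sorts higher than anything, the two bounds already differ in their first element, so the 2 and the 4 in the second position are never consulted. That is why your subMap call returns all 200 entries.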
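To see both ranges side by side without MapDB, here is a minimal sketch using `java.util.TreeMap`. Assumptions, not MapDB API: keys are `int[]` triples with digits 0..9, compared lexicographically by a hand-written comparator; `Integer.MIN_VALUE` stands in for MapDB's `null` and `Integer.MAX_VALUE` for `Fun.HI`. `subMap(from, to)` includes `from` and excludes `to`.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SubMapRanges {
    // Stand-ins for MapDB's bounds: LO plays the role of null, HI the role of Fun.HI.
    static final int LO = Integer.MIN_VALUE;
    static final int HI = Integer.MAX_VALUE;

    // Lexicographic order: the first differing position decides.
    static final Comparator<int[]> LEX = (p, q) -> {
        for (int i = 0; i < p.length; i++) {
            int c = Integer.compare(p[i], q[i]);
            if (c != 0) return c;
        }
        return 0;
    };

    // All 1000 triples (a, b, c) with digits 0..9.
    static NavigableMap<int[], Integer> allTriples() {
        NavigableMap<int[], Integer> map = new TreeMap<>(LEX);
        for (int a = 0; a <= 9; a++)
            for (int b = 0; b <= 9; b++)
                for (int c = 0; c <= 9; c++)
                    map.put(new int[] {a, b, c}, 0);
        return map;
    }

    // Size of subMap(from, to): 'from' inclusive, 'to' exclusive.
    static int rangeSize(int[] from, int[] to) {
        return allTriples().subMap(from, to).size();
    }

    public static void main(String[] args) {
        // Bounds on the first element: all tuples starting with 2, 3 or 4.
        System.out.println(rangeSize(new int[] {2, LO, LO}, new int[] {4, HI, HI})); // prints 300
        // First elements of the bounds already differ, so every tuple matches.
        System.out.println(rangeSize(new int[] {LO, 2, LO}, new int[] {HI, 4, HI})); // prints 1000
    }
}
```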

What we really want in our cases is: for each value of the first element, pull out a subMap where the second element varies continuously. This involves jumping around inside the tree every time the first element changes -- it's not one long continuous iteration. The best way I've found to express this is to just wrap the subMap calls in a for loop, varying the high-order element "manually":

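// Assumes set is a NavigableSet<Tuple3<Integer, Integer, Long>> (e.g. a MapDB tree set).
for (int x = minX; x <= maxX; x++) {
    // One contiguous range per first element: all tuples (x, y, z) with minY <= y <= maxY.
    // Raw Tuple3 bounds are used because null and Fun.HI do not match the element types.
    NavigableSet<Tuple3<Integer, Integer, Long>> xSubset = set.subSet(
        new Tuple3(x, minY, null  ), true, // inclusive lower bound, null sorts lower than anything
        new Tuple3(x, maxY, Fun.HI), true  // inclusive upper bound, HI sorts higher than anything
    );
    for (Tuple3<Integer, Integer, Long> item : xSubset) {
        int itemX = item.a; // always equal to x
        int itemY = item.b;
        long itemZ = item.c;
        // ...
    }
}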
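The per-first-element loop can be tried without MapDB. This is a sketch under stated assumptions, not MapDB code: a `java.util.TreeSet` of `int[]` triples (digits 0..9) with a hand-written lexicographic comparator, `Integer.MIN_VALUE` in place of `null` and `Integer.MAX_VALUE` in place of `Fun.HI`. The hypothetical helper `query` issues one inclusive `subSet` range per first element and collects the results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PerPrefixRanges {
    static final int LO = Integer.MIN_VALUE; // stands in for MapDB's null
    static final int HI = Integer.MAX_VALUE; // stands in for Fun.HI

    // Lexicographic order: the first differing position decides.
    static final Comparator<int[]> LEX = (p, q) -> {
        for (int i = 0; i < p.length; i++) {
            int c = Integer.compare(p[i], q[i]);
            if (c != 0) return c;
        }
        return 0;
    };

    // All 1000 triples (a, b, c) with digits 0..9.
    static NavigableSet<int[]> allTriples() {
        NavigableSet<int[]> set = new TreeSet<>(LEX);
        for (int a = 0; a <= 9; a++)
            for (int b = 0; b <= 9; b++)
                for (int c = 0; c <= 9; c++)
                    set.add(new int[] {a, b, c});
        return set;
    }

    // One contiguous range scan per first element x:
    // tuples (x, y, z) with minX <= x <= maxX and minY <= y <= maxY.
    static List<int[]> query(NavigableSet<int[]> set, int minX, int maxX, int minY, int maxY) {
        List<int[]> hits = new ArrayList<>();
        for (int x = minX; x <= maxX; x++) {
            hits.addAll(set.subSet(new int[] {x, minY, LO}, true, new int[] {x, maxY, HI}, true));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<int[]> hits = query(allTriples(), 0, 9, 2, 3);
        // 10 first elements * 2 second elements * 10 third elements
        System.out.println(hits.size()); // prints 200
    }
}
```

Each iteration is a fresh range lookup, which matches the point that the tree has to be entered anew for every value of the first element.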

As far as I know this reflects the natural complexity of the operation: you need to drill down into the tree anew to begin each iteration over the range of second elements.

abyrd