Heterogeneous lookup means that we can index into a hash map holding keys of type std::string using another compatible type that makes sense, such as absl::string_view. For example, the following code works (I'm using the Abseil library rather than C++20 in my code for some compatibility reasons):

std::string word = "bird";
absl::flat_hash_map<std::string, int> word_map;
word_map[word] = 1;
std::cout << word_map[absl::string_view(word)] << std::endl;

It makes sense that this works, since all we need to address a hash table is the ability to compute the hash function and the ability to compare for equality. Reading the hash table this way should be straightforward, and writing to it also makes sense, since the hash table can create a new std::string holding the contents of the string view.

A std::vector<T> also has a lightweight analogue of a string view, the absl::Span<T> type. However, the corresponding lookup does not work:

std::vector<int> nums = {1, 2, 3, 4};
absl::flat_hash_map<std::vector<int>, int> int_map;
int_map[nums] = 1;
std::cout << int_map[absl::Span<int>(nums)] << std::endl;

The compiler complains on the last line that there is no match for operator[].

Question: How can I implement this heterogeneous lookup so that it works for vectors and spans in the same way as for strings and string views?

I can see that absl::Hash<std::vector<int>> and absl::Hash<absl::Span<int>> produce the same results, so there should not be too many obstructions to making this work.

Joppy

1 Answer

You can enable Abseil's heterogeneous lookup for vectors by defining your own hash and equality functor types. Per the documentation, both must declare an is_transparent member type; that tag is what tells the container to accept lookup keys of other types.

struct VectorHash {
    using is_transparent = void;

    size_t operator()(absl::Span<int> v) const {
        return absl::Hash<absl::Span<const int>>{}(v);
    }
    size_t operator()(const std::vector<int>& v) const {
        return absl::Hash<absl::Span<const int>>{}(absl::Span<const int>{ v.data(), v.size() });
    }
};

struct VectorEq {
    using is_transparent = void;

    bool operator()(const std::vector<int>& a, absl::Span<int> b) const {
        return std::equal(a.begin(), a.end(), b.begin(), b.end());
    }
    bool operator()(absl::Span<int> b, const std::vector<int>& a) const {
        return std::equal(a.begin(), a.end(), b.begin(), b.end());
    }
    bool operator()(const std::vector<int>& a, const std::vector<int>& b) const {
        return std::equal(a.begin(), a.end(), b.begin(), b.end());
    }
    bool operator()(absl::Span<int> b, absl::Span<int> a) const {
        return std::equal(a.begin(), a.end(), b.begin(), b.end());
    }
};

using int_map_t = absl::flat_hash_map<std::vector<int>, int, VectorHash, VectorEq>;

This will make lookup using at or find work, but [] will still fail. Why? Because the [] operator is an upsert: it creates the key if it doesn't exist. absl::string_view has an explicit conversion operator to std::string, so creating a new std::string key from one works. absl::Span<int> has no conversion operator to std::vector<int>, so the operation fails.
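The difference between the two cases can be checked at compile time. The sketch below is standard-library only and rests on a few assumptions: PlainView and ConvertibleView are hypothetical stand-ins for absl::Span<int> (without and with a conversion operator), make_key is an invented helper that models only the "create the key" half of an upsert, and std::string_view stands in for absl::string_view. The standard type gets there through an explicit std::string constructor rather than a conversion operator, but the effect on direct initialization is the same.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for absl::Span<int>: a pointer and a length,
// with no way to turn itself into a std::vector<int>.
struct PlainView {
    const int* data;
    std::size_t size;
};

// The same view plus an explicit conversion operator, which is the piece
// that absl::Span<int> lacks.
struct ConvertibleView {
    const int* data;
    std::size_t size;
    explicit operator std::vector<int>() const {
        return std::vector<int>(data, data + size);
    }
};

// A std::string can be built from a string view, but only explicitly.
static_assert(std::is_constructible<std::string, std::string_view>::value, "");
static_assert(!std::is_convertible<std::string_view, std::string>::value, "");

// A bare view cannot become a vector; one with a conversion operator can.
static_assert(!std::is_constructible<std::vector<int>, PlainView>::value, "");
static_assert(std::is_constructible<std::vector<int>, ConvertibleView>::value, "");

// Models the insert half of an upsert: the stored key has to be
// direct-initialized from whatever lookup key was passed in.
template <class Key, class Lookup>
Key make_key(const Lookup& lookup) {
    static_assert(std::is_constructible<Key, const Lookup&>::value,
                  "the stored key type cannot be built from this lookup type");
    Key key(lookup);
    return key;
}
```

Calling make_key<std::vector<int>> with a ConvertibleView compiles, while the same call with a PlainView trips the static_assert, which is roughly the situation [] is in when handed an absl::Span<int>. The VectorHashMemo helper further down relies on the same explicit conversion operator.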

If it's not an option to use at instead of [], you can still extend the type:

struct int_map_t : absl::flat_hash_map<std::vector<int>, int, VectorHash, VectorEq> {
    using absl::flat_hash_map<std::vector<int>, int, VectorHash, VectorEq>::flat_hash_map;
    using absl::flat_hash_map<std::vector<int>, int, VectorHash, VectorEq>::operator [];
    int& operator [](absl::Span<int> v) {
        return operator [](std::vector<int> { v.begin(), v.end() });
    }
};

Demo: https://godbolt.org/z/dW4av7


In the comments, you asked if it was possible to implement an operator [] override that doesn't copy the vector if the map entry exists, while still only doing one hash. This is a bit hacky and still might do extra comparisons, but I think you can accomplish this with a helper type that stores both a key and an already-computed hash:

struct VectorHashMemo {
    size_t hash;
    absl::Span<int> key;

    explicit operator std::vector<int>() const {
        return { key.begin(), key.end() };
    }
};

struct VectorHash {
    /* ...existing overloads... */
    size_t operator()(VectorHashMemo v) const {
        return v.hash;
    }
};

struct VectorEq {
    /* ...existing overloads... */

    bool operator()(const std::vector<int>& a, VectorHashMemo b) const {
        return operator()(a, b.key);
    }
    bool operator()(VectorHashMemo a, const std::vector<int>& b) const {
        return operator()(a.key, b);
    }
    bool operator()(VectorHashMemo b, VectorHashMemo a) const {
        return operator()(a.key, b.key);
    }
};

Then you can explicitly compute the hash only once, while accessing the map twice:

struct int_map_t : absl::flat_hash_map<std::vector<int>, int, VectorHash, VectorEq> {
    using absl::flat_hash_map<std::vector<int>, int, VectorHash, VectorEq>::flat_hash_map;
    using absl::flat_hash_map<std::vector<int>, int, VectorHash, VectorEq>::operator [];
    int& operator [](absl::Span<int> v) {
        VectorHashMemo hash = { absl::Hash<absl::Span<int>>{}(v), v };
        auto it = find(hash);
        if (it != end()) {
            return it->second;
        } else {
            // calls the explicit conversion operator
            return operator [](hash);
        }
    }
};

Demo: https://godbolt.org/z/fecevE

parktomatomi
  • Thanks, where did you find the documentation for this? I found a mention of something similar in [this tip](https://abseil.io/tips/144), but no "official" documentation as far as I could find. What I was ultimately after was a mechanism so that `[]` does one hash-lookup and zero copies for an existing key, or one hash-lookup and one copy for a non-existing key - do you think that there is any way to do that? And by the way, do you know why string_view is convertible to string, but span is not convertible to vector? – Joppy Nov 04 '20 at 10:18
  • I was looking at the same document you are. Not even sure there is any official documentation, which is not super-encouraging from the library maintainers. I have no idea why the conversion only exists for `string_view`, but if you asked me to speculate it's because `string_view` is tightly coupled with `string`, while `Span` is a generic "fat pointer" that can be used with many container types. I updated the answer to take a crack at your request about limiting hash computations, hope it works for you. – parktomatomi Nov 04 '20 at 19:32
  • It follows the standard library conventions. Nothing much else for it. – Deduplicator Nov 04 '20 at 20:18
  • @Deduplicator heterogeneous lookup is an example of behavior that ought to be described in official, formal documentation rather than in "Tip Of The Week #144". – parktomatomi Nov 05 '20 at 09:31
  • @parktomatomi Not arguing with you there. The absolute least there should be is something like "heterogeneous lookup works like in the standard library". BTW: If you extract a function to get a uniform view of the parts of a container that matter to equality and hash, you can more easily support other containers, and especially evade the combinatorial explosion in equality. – Deduplicator Nov 05 '20 at 11:09
  • @Deduplicator Oh, I didn't realize `is_transparent` came from the standard, learned something. Thanks for the tips! – parktomatomi Nov 05 '20 at 11:32
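
Deduplicator's suggestion of a single function that yields a uniform view might look roughly like the sketch below. It is standard-library only so that it stands alone, and several names are assumptions rather than anything from Abseil or the answer: IntView is a hypothetical stand-in for absl::Span<const int>, as_view is an invented helper, and the multiply-and-add hash is a placeholder for absl::Hash, not what Abseil computes. The point is only the shape: each functor has one templated operator() that normalizes its arguments through as_view, so supporting another container means adding one as_view overload instead of a new row of equality overloads.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for absl::Span<const int>.
struct IntView {
    const int* data;
    std::size_t size;
    const int* begin() const { return data; }
    const int* end() const { return data + size; }
};

// The single normalization point: every supported key type gets one overload.
inline IntView as_view(const std::vector<int>& v) {
    return IntView{v.data(), v.size()};
}
inline IntView as_view(IntView v) { return v; }

struct ViewHash {
    using is_transparent = void;

    template <class K>
    std::size_t operator()(const K& key) const {
        // Placeholder hash for illustration only; it just has to depend
        // solely on the viewed elements so all key types agree.
        std::size_t h = 17;
        for (int x : as_view(key)) {
            h = h * 31 + std::hash<int>{}(x);
        }
        return h;
    }
};

struct ViewEq {
    using is_transparent = void;

    // One template covers vector/vector, vector/view, view/vector, view/view.
    template <class A, class B>
    bool operator()(const A& a, const B& b) const {
        IntView x = as_view(a);
        IntView y = as_view(b);
        return std::equal(x.begin(), x.end(), y.begin(), y.end());
    }
};
```

With Abseil, the same shape would presumably let functors like these take the place of VectorHash and VectorEq as the third and fourth template arguments of absl::flat_hash_map, with absl::Span<const int> in the role of IntView and absl::Hash doing the hashing.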