
In my scenario, two floating point numbers (say doubles) are considered equal if their absolute difference is within a certain range. This means it's easy to use doubles as keys for a std::map: I just need to define a custom comparison function. But I'm not sure what's a good approach for std::unordered_map, which needs a hash function in addition to a comparison function.

I don't think there is any method to produce the same hash value for two close doubles in general, as that would allow us to keep incrementing one of them in small steps, eventually obtaining two very different doubles with the same hash value.
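
The argument above can be written out as a short chain. Here $h$ stands for any hash function on doubles that is required to agree whenever two keys are within $\varepsilon$; the symbols are introduced only for this derivation:

```latex
\begin{aligned}
&\text{Assume } |x - y| < \varepsilon \implies h(x) = h(y) \text{ for all doubles } x, y.\\
&\text{Let } a < b \text{ be doubles such that consecutive doubles in } [a, b] \text{ differ by less than } \varepsilon.\\
&\text{List those doubles as } a = x_0 < x_1 < \dots < x_n = b, \text{ so } x_{i+1} - x_i < \varepsilon.\\
&\text{Then } h(a) = h(x_0) = h(x_1) = \dots = h(x_n) = h(b).
\end{aligned}
```

So $h$ would have to be constant on every range where the spacing of doubles is below $\varepsilon$, which for typical tolerances is nearly the whole range of interest.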

Maybe there is a way to "normalize" floating point numbers to some "standard" values? For example, one very bad way to do it (that doesn't even always work) would be to use the hash of the nearest integer.
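
A minimal sketch of such a normalization, assuming a fixed absolute tolerance `eps` and finite keys (the names `grid_cell` and `grid_hash` are hypothetical): map each double to the index of the width-`eps` grid cell it falls in, and hash that index instead of the raw value. The "nearest integer" idea is roughly the case `eps = 1`, with rounding instead of flooring. It shows the catch described above: values in the same cell always hash alike, but two values closer than `eps` can still sit on opposite sides of a cell boundary.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>

// Snap x to a grid of width eps and return the index of its cell.
// Assumes x is finite and x / eps fits in int64_t.
std::int64_t grid_cell(double x, double eps) {
    assert(eps > 0 && std::isfinite(x));
    return static_cast<std::int64_t>(std::floor(x / eps));
}

// Hash the cell index rather than the double itself, so every value in the
// same cell gets the same hash. Nearby values in different cells do not.
std::size_t grid_hash(double x, double eps) {
    return std::hash<std::int64_t>{}(grid_cell(x, eps));
}
```

For example, with `eps = 0.1`, both `0.31` and `0.39` land in cell 3, while `0.399` and `0.401` (only 0.002 apart) land in cells 3 and 4 and therefore usually hash differently.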

If std::unordered_map is just not the right choice, I'm curious what other choices I have for O(1) associative containers with floating point numbers as keys. (Note that the keys have to be floating point numbers, not custom high-precision decimal numbers, for performance reasons.)
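
One commonly used average-O(1) option is to keep `std::unordered_map` but key it on the grid cell `floor(x / eps)`, then have each lookup probe every cell a matching key could fall into and compare actual distances. The sketch below assumes a fixed absolute tolerance `eps` and finite keys; the class `NearMap` and its members are hypothetical, not a standard or library type. Since rounding is monotonic, any key within `eps` of `x` lies in a cell between `cell(x - eps)` and `cell(x + eps)`, so a lookup touches only about three buckets regardless of map size.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical container: entries are bucketed by grid cell floor(x / eps).
// Assumes finite keys whose x / eps fits in int64_t.
template <typename V>
class NearMap {
public:
    explicit NearMap(double eps) : eps_(eps) { assert(eps > 0); }

    // Stores (x, v) in x's cell. Does not merge with nearby existing keys.
    void insert(double x, V v) { cells_[cell(x)].emplace_back(x, std::move(v)); }

    // Returns the value whose key is closest to x among keys with |key - x| < eps.
    std::optional<V> find_near(double x) const {
        const V* best = nullptr;
        double best_dist = eps_;
        // Every key in (x - eps, x + eps) falls in one of these cells.
        const std::int64_t lo = cell(x - eps_);
        const std::int64_t hi = cell(x + eps_);
        for (std::int64_t c = lo; c <= hi; ++c) {
            auto it = cells_.find(c);
            if (it == cells_.end()) continue;
            for (const auto& [key, value] : it->second) {
                double d = std::fabs(key - x);
                if (d < best_dist) { best_dist = d; best = &value; }
            }
        }
        if (best) return *best;
        return std::nullopt;
    }

private:
    std::int64_t cell(double x) const {
        return static_cast<std::int64_t>(std::floor(x / eps_));
    }

    double eps_;
    std::unordered_map<std::int64_t, std::vector<std::pair<double, V>>> cells_;
};
```

To get the "return `m[x]` for any `y` within `eps`" behavior from the comments, call `find_near` before `insert` and reuse the existing entry when one is found. Which entry absorbs a borderline value then depends on insertion order.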

Zizheng Tai
  • *This means it's easy to use doubles as keys for a std::map* -- You still have to rely on some sort of floating point calculation, which inherently is not exact. You just shifted the problem from one place to another. – PaulMcKenzie Nov 01 '20 at 05:51
  • @PaulMcKenzie I'm not sure I understand; it's OK to be inexact in my case. For example, when `m[x]` exists in the map `m`, I expect `m.at(y)` to return the value of `m[x]`, as long as `abs(x - y) < epsilon`. – Zizheng Tai Nov 01 '20 at 05:54
  • It shouldn't be possible, right? Because of the way decimal numbers are stored in binary, many of them can't be represented exactly. I think the best way might be to just store a string of the digits. –  Nov 01 '20 at 05:55
  • `abs(x - y)` -- Run that code with a different compiler, different compiler options, etc. and compare the maps created. What if `x` and `y` are calculated slightly differently, depending on those conditions, given the same input? – PaulMcKenzie Nov 01 '20 at 05:58
  • @PaulMcKenzie It's OK if the result changes. My high-level goal is to just figure out if there is something "close enough" (which means ideally `< epsilon`, but a little marginal error is fine) to a given number. My problem is inherently inexact. – Zizheng Tai Nov 01 '20 at 06:01
  • Then I don't see an issue using floating point if it isn't a high priority to get the maps to be consistent across runs and using differing compiler and compiler options. – PaulMcKenzie Nov 01 '20 at 06:02
  • @PaulMcKenzie The problem is `unordered_map` doesn't consider two floating point numbers that are close enough as equal though, because they have different hash values. – Zizheng Tai Nov 01 '20 at 06:03
  • Have you considered using [std::nextafter](https://en.cppreference.com/w/cpp/numeric/math/nextafter) if there is a "tie" when inserting a new item? – PaulMcKenzie Nov 01 '20 at 06:04
  • Probing for nearby values generated with nextafter may be ok if your conceptual "epsilon" value is on that scale, but otherwise - if it would take thousands of nextafter calls to cover the range desired, then you'd be doing similar numbers of hash table lookups and it'd be slower than using a `std::map` and `lower_bound` or `upper_bound` to check for nearby values. Another approach is to do some rounding before you probe in the `unordered_map`, but it'll occasionally be largely arbitrary/random whether a double calculation yields a value that'll round up or down. May or may not matter. – Tony Delroy Nov 01 '20 at 06:28
  • @TonyDelroy How do `lower_bound` and `upper_bound` come into play here? – Zizheng Tai Nov 01 '20 at 06:46
  • Using doubles as keys for a `std::map` requires that you define a less than operator not an equality operator. Defining a less than operator that uses an epsilon does not meet strict weak ordering requirements. – john Nov 01 '20 at 06:55
  • Have you considered representing floats as fixed-point values, assuming they're all of the same order of magnitude? – Kostas Nov 01 '20 at 06:57
  • The only approach I see is to divide the floating point range into 'buckets' by truncating some digits of the significand. This means that two numbers could be very close but fall into different buckets. Not ideal, but I don't see any other way. – john Nov 01 '20 at 07:00
  • The inherent problem is that if you want a and b such that b = a + ε to have the same hash, then you would also want c = b + ε to have the same hash. Repeat the same argument and you end up with the entire number range having the same hash. – john Nov 01 '20 at 07:04
  • @ZizhengTai: they let you find the entries in the map around the key value you're interested in, and in a map you can iterate so if there's not an exact match you can find whether there's a lower or higher value that's within epsilon (and if both, which is closer). Still, exactly how things group into buckets may be a bit arbitrary and order-of-insertion dependent. The bottom line is that occasionally two calculations that logically/mathematically produce the exact same value may end up as two separate keys, however you do this. – Tony Delroy Nov 01 '20 at 07:22
  • https://stackoverflow.com/questions/58758071/implementing-a-hash-table-like-data-structure-with-floating-point-keys-where-val – parktomatomi Nov 01 '20 at 08:19
  • @PaulMcKenzie: The inexactness of (many) floating-point operations isn’t really relevant here: integers “equal within 2” would have exactly the same issues, and it’s good to avoid feeding the misconception that floating-point arithmetic is somehow random or unknowable. – Davis Herring Nov 01 '20 at 19:38
  • @DavisHerring: I agree people harp on about floating point errors and inexactness too much, when there are times it matters and there are times it doesn't. Here it may or may not matter depending on the consequences of putting two nearly-identical values - that logically/mathematically should have been identical if there were no rounding errors - into different buckets. That won't happen for integers if you e.g. round odd numbers down. – Tony Delroy Nov 01 '20 at 21:22
  • @TonyDelroy: You can bin anything, but the edge cases always exist—rounding odd numbers down still puts **adjacent** values in different buckets. – Davis Herring Nov 01 '20 at 22:10
  • @DavisHerring: and what's wrong with that? It's not a comparable problem to that encountered where only the rounding errors in floating point calculations alter bucket selection.... – Tony Delroy Nov 01 '20 at 23:26
  • @TonyDelroy: I think it’s equivalent in that it absolutely precludes a pure single-bucket strategy for any non-trivial symmetric non-transitive “equivalence” relation. Maybe the behavior of the integers is a bit more predictable (or predictable with less implementation information), but I don’t see that as the point of this question. – Davis Herring Nov 02 '20 at 00:25
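
The `std::map` route Tony Delroy describes can be sketched as follows, assuming a fixed absolute tolerance `eps` and non-NaN keys (`find_near` is a hypothetical helper name). Following john's point, it keeps the default exact `std::less<double>` ordering, which is a valid strict weak ordering for non-NaN doubles, and applies the tolerance only at lookup time: `lower_bound` finds the first key `>= x`, its predecessor is the last key `< x`, and only those two candidates can be the closest key. Each lookup costs O(log n) rather than O(1).

```cpp
#include <cassert>
#include <cmath>
#include <iterator>
#include <map>

// Returns a pointer to the value whose key is closest to x, or nullptr if
// no key lies strictly within eps of x. The map itself uses exact ordering.
template <typename V>
const V* find_near(const std::map<double, V>& m, double x, double eps) {
    assert(eps > 0);
    const V* best = nullptr;
    double best_dist = eps;  // only accept keys strictly closer than eps
    auto it = m.lower_bound(x);  // first key >= x
    if (it != m.end() && std::fabs(it->first - x) < best_dist) {
        best_dist = std::fabs(it->first - x);
        best = &it->second;
    }
    if (it != m.begin()) {
        auto prev = std::prev(it);  // last key < x
        if (std::fabs(prev->first - x) < best_dist) {
            best = &prev->second;
        }
    }
    return best;
}
```

Unlike an epsilon-based comparator, this never makes `std::map` treat distinct keys as equivalent, so the container's invariants hold; the fuzziness lives entirely in the lookup.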
