How to subdivide integer hash into ranges

Question

I have an unsigned 64-bit number representing a mantissa, or fraction: it covers the range [0..1), where 0 maps to 0.0 and 0xffffff.. maps to a number "just before 1.0".

Now I want to split this range into equal buckets and answer this: given a random key, which bucket does it fall into?

It is easier to understand from the following code:

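import "math"

// BucketIndex shows the intent only: float64 has 53 bits of precision, so this is not exact for 64-bit keys.
func BucketIndex(key, buckets uint64) uint64 {
    return uint64(float64(key) / (math.Pow(2, 64) / float64(buckets)))
}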

My attempt to hack around this was to split the 2^64 in two: reduce the key to its upper 32 bits and do the math in 64 bits:

// ~=key / ((1 << 64) / buckets)
return ((key >> 32) * buckets) >> 32

But the ranges are no longer equal. For example, the one-third boundary (buckets == 3) ends up at 0x5555555600000000 instead of 0x5555555555555556. So I'm asking: do you know a better method of finding (1 << 64) / buckets?

Use a bucket size of `max / buckets`, rounded up, and the bucket index will be `key / bucketSize`. Isn't that sufficient? – icza, Feb 06 '23 at 14:30
@icza That's my question: how can you find max, which is outside the uint64 range? – xakepp35, Feb 06 '23 at 14:58
Think of it as `key / (max / buckets)`. If you do `key * buckets / max` instead, you get 0 immediately, because dividing by 2^64 is like shifting a uint64 right by 64 positions, which clears all of its bits. – xakepp35, Feb 06 '23 at 15:01

icza · Accepted Answer · 2023-02-06T21:44:17.040

If buckets is a compile-time constant, you can use a constant expression to calculate the bucket size: untyped constants have arbitrary precision. Otherwise you can use big.Int to calculate it at runtime and store the result (so you don't have to repeat the big.Int calculation every time).

Using a constant expression, at compile-time

To achieve an integer division rounding up, add divisor - 1 to the dividend:

const (
    max        = math.MaxUint64 + 1
    buckets    = 3
    bucketSize = uint64((max + buckets - 1) / buckets)
)

Using `big.Int`, at runtime

The same logic works with big.Int too. An alternative is to use Int.DivMod() (instead of adding buckets - 1) and, if the modulus is greater than zero, increment the result by 1.

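// calcBucketSize returns ceil(max / buckets) without modifying its arguments.
func calcBucketSize(max, buckets *big.Int) uint64 {
    n := new(big.Int).Add(max, buckets)
    n.Sub(n, big.NewInt(1))
    return n.Div(n, buckets).Uint64()
}

// max must be 2^64 (math.MaxUint64 + 1), as in the constant version; passing
// math.MaxUint64 yields 0x5555555555555555 and sends the largest key to bucket 3.
var bucketSize = calcBucketSize(new(big.Int).Lsh(big.NewInt(1), 64), big.NewInt(3))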

edited Feb 06 '23 at 21:44

answered Feb 06 '23 at 15:22

Interesting approach, but I need this at runtime, and I'm worried that `big.NewInt` will require allocations and expensive conversions. And I need it to be fast! What do you think about working with two 64-bit ints and doing `mult`, as here? https://github.com/davidminor/uint128/blob/master/uint128.go#L72 – xakepp35 Feb 06 '23 at 21:06
@xakepp35 Yes, that will likely be faster than `big.Int`. But note that if `buckets` has a limited number of values, you can pre-calculate and cache the bucket sizes, so you don't have to calculate them every time. Pre-calculating and caching will also outperform calculations with two 64-bit integers. – icza Feb 06 '23 at 21:10
If value of `buckets` is small, you could store the results in a slice too, and use `buckets` as the slice index! – icza Feb 06 '23 at 21:15
Yeah, LUT vs. calculation depends on a lot of factors. I'll check how it goes. – xakepp35 Feb 06 '23 at 21:16
Also note that storing and indexing a slice would also work if `buckets` is not small but could easily be transformed into a small number. E.g. let's say possible values of buckets are `100`, `200`, `300`. You could use a slice with 3 (or 4) elements to store the calculated bucket sizes, and index it with `buckets / 100`. – icza Feb 06 '23 at 21:24
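
The two ideas from this comment thread (two-word 64-bit arithmetic and a cached lookup slice) can be combined. The sketch below is one possible way, not the approach of the linked uint128 library: it uses `bits.Div64` from the standard `math/bits` package to divide 2^64 by `buckets` without `big.Int`, and stores the results in a slice indexed by `buckets`. The names `bucketSize`, `sizeTable`, `newSizeTable` and `index` are hypothetical, and `buckets` is assumed to be at least 2.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucketSize returns ceil(2^64 / buckets) for buckets >= 2.
// bits.Div64 divides the 128-bit value hi:lo (here 1:0, i.e. 2^64) by buckets.
// It panics when buckets <= 1, because the quotient would not fit in 64 bits.
func bucketSize(buckets uint64) uint64 {
	q, r := bits.Div64(1, 0, buckets)
	if r > 0 {
		q++ // round up when there is a remainder
	}
	return q
}

// sizeTable caches bucket sizes, indexed directly by the bucket count.
// Entries 0 and 1 are unused.
type sizeTable []uint64

// newSizeTable precomputes sizes for bucket counts 2..maxBuckets.
func newSizeTable(maxBuckets int) sizeTable {
	t := make(sizeTable, maxBuckets+1)
	for b := 2; b <= maxBuckets; b++ {
		t[b] = bucketSize(uint64(b))
	}
	return t
}

// index returns the bucket for key; buckets must be in 2..len(t)-1.
func (t sizeTable) index(key, buckets uint64) uint64 {
	return key / t[buckets]
}

func main() {
	t := newSizeTable(16)
	// Keys just below and at the one-third boundary from the question.
	fmt.Println(t.index(0x5555555555555555, 3), t.index(0x5555555555555556, 3))
}
```

Whether the table actually beats recomputing the size on every call depends on the workload, so it is worth benchmarking both.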