For a tree layout that takes benefit of cache line prefetching (reading _next_ cacheline
is cheap), I need to solve the address calculation in a really fast way. I was able to boil down the problem to:
newIndex = nowIndex + 1 + (localChildIndex*X)
x would be for example: X = 45 + 44 + 43 + 42 +40.
Note: 4 is the branching factor. In reality it will be 16, so a power of 2. This should be useful to use bitwise stuff?
It would be very bad if it would need a loop to calculate X (performancewise
) and stuff like division/multiplication. This appeals to be an interesting problem which I wasn’t able to come up with some nice way of computing it.
Since its part of a tree traversal, 2 modes would be possible: absolute calculation, independent of prior calculations AND incremental calculation which starts with a high X being kept in a variable and then some minimal stuff done to it with every deeper level of the tree.
I hope I was able to make clear what the math should do. Not sure if there is a way to do this fast & without loop - but maybe somebody can come up with a really smart solution. I would like to thank everybody for their help - StackOverflow have been a great teacher to me in the past and I hope to be able to give back more in the future, as my knowledge increases.