In my kdtree project I just replaced a depth counter from being Int-based to an explicit Key a based on the type a in KDTree v a. This is the diff.
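For readers without the diff at hand, a change of this kind could look roughly like the sketch below. It is not the actual kdtree code: P3 is a hypothetical stand-in for V3 Double, and the names KDCompareInt, dimDistanceInt, firstKey, nextKey, KX, KY and KZ are made up for illustration. Only Key, KDCompare and dimDistance appear in the post itself.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

-- Hypothetical stand-in for V3 Double, to keep the sketch dependency-free.
data P3 = P3 Double Double Double

-- Before: the splitting axis is derived from an Int depth counter.
class KDCompareInt a where
  dimDistanceInt :: Int -> a -> a -> Double

instance KDCompareInt P3 where
  dimDistanceInt depth (P3 ax ay az) (P3 bx by bz) =
    case depth `mod` 3 of
      0 -> ax - bx
      1 -> ay - by
      _ -> az - bz

-- After: each point type declares its own key type (written
-- `data Key a :: *` in the post) and how to step to the next key.
class KDCompare a where
  data Key a :: Type
  firstKey    :: Key a
  nextKey     :: Key a -> Key a
  dimDistance :: Key a -> a -> a -> Double

instance KDCompare P3 where
  data Key P3 = KX | KY | KZ deriving (Eq, Show)
  firstKey = KX
  nextKey KX = KY
  nextKey KY = KZ
  nextKey KZ = KX
  dimDistance k (P3 ax ay az) (P3 bx by bz) =
    case k of
      KX -> ax - bx
      KY -> ay - by
      KZ -> az - bz
```

Both versions compute the same per-axis difference; the only intended difference is that the axis is named by a type-specific key instead of an Int.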
Now, while I believe this should be a type-level change only, my benchmarks show a sharp drop in performance (roughly nine times slower):
Before:
benchmarking nr/kdtree_nr
mean: 60.19084 us, lb 59.87414 us, ub 60.57270 us, ci 0.950
std dev: 1.777527 us, lb 1.494657 us, ub 2.120168 us, ci 0.950
After:
benchmarking nr/kdtree_nr
mean: 556.9518 us, lb 554.0586 us, ub 560.6128 us, ci 0.950
std dev: 16.70620 us, lb 13.58185 us, ub 20.63450 us, ci 0.950
Before I dive into Core ... does anyone have any idea what's going on here?
Edit 1
As proposed by Thomas (and userxyz), I replaced data Key a :: * with type Key a :: * and changed the implementation accordingly. This hasn't had any significant impact on the result:
benchmarking nr/kdtree_nr
mean: 538.2789 us, lb 537.5128 us, ub 539.4408 us, ci 0.950
std dev: 4.745118 us, lb 3.454081 us, ub 6.969091 us, ci 0.950
Edit 2
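Just had a quick look at the Core output. Apparently the change prevents functions that depend on the class from being specialized: before, the call site receives a specialized instance dictionary, afterwards it receives a generic $dKDCompare dictionary. Is that right?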
Before:
lvl20 :: KDTree Vector (V3 Double) -> [V3 Double]
lvl20 =
  \ (w4 :: KDTree Vector (V3 Double)) ->
    $wpointsAround $fKDCompareV3_$s$fKDCompareV3 lvl2 lvl4 nrRadius q w4
After:
lvl18 :: KDTree Vector (V3 Double) -> [V3 Double]
lvl18 =
  \ (w4 :: KDTree Vector (V3 Double)) ->
    $wpointsAround $dKDCompare lvl1 lvl3 nrRadius q w4
Small Update to Edit 2: Going crazy with INLINE pragmas doesn't change a thing here.
Edit 3
Quickly implemented what userxyz suggested: http://lpaste.net/104457 I've been there before and can't make it work:
src/Data/KDTree.hs:48:49:
Could not deduce (k ~ KeyV3)
from the context (Real a, Floating a)
bound by the instance declaration at src/Data/KDTree.hs:45:10-49
or from (Key k)
bound by the type signature for
dimDistance :: Key k => k -> V3 a -> V3 a -> Double
at src/Data/KDTree.hs:47:3-13
‘k’ is a rigid type variable bound by
the type signature for
dimDistance :: Key k => k -> V3 a -> V3 a -> Double
at src/Data/KDTree.hs:47:3
Relevant bindings include
k :: k (bound at src/Data/KDTree.hs:47:15)
dimDistance :: k -> V3 a -> V3 a -> Double
(bound at src/Data/KDTree.hs:47:3)
In the pattern: V3X
In a case alternative: V3X -> ax - bx
In the second argument of ‘($)’, namely
‘case k of {
V3X -> ax - bx
V3Y -> ay - by
V3Z -> az - bz }’
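As far as I understand the error, a method typed Key k => k -> ... lets the caller pick k, so the instance is not allowed to assume k is KeyV3 and pattern match on V3X. One possible way around it, sketched below under assumptions, is to tie the key type to the point type with a functional dependency. P3, KeyP3, firstKey and nextKey are hypothetical names used to keep the example self-contained; the real code uses V3 a and KeyV3, and this is not what the lpaste contains.

```haskell
{-# LANGUAGE MultiParamTypeClasses  #-}
{-# LANGUAGE FunctionalDependencies #-}

-- Hypothetical stand-ins for V3 Double and KeyV3.
data P3    = P3 Double Double Double
data KeyP3 = P3X | P3Y | P3Z deriving (Eq, Show, Enum)

-- Keys know where to start and how to cycle through the axes.
class Key k where
  firstKey :: k
  nextKey  :: k -> k

instance Key KeyP3 where
  firstKey    = P3X
  nextKey P3Z = P3X
  nextKey k   = succ k

-- Rejected form, matching the error message:
--   class KDCompare a where
--     dimDistance :: Key k => k -> a -> a -> Double
-- Here k is chosen by the caller, so an instance cannot match on P3X.

-- Sketch of an alternative: the point type determines its key type.
class Key k => KDCompare a k | a -> k where
  dimDistance :: k -> a -> a -> Double

instance KDCompare P3 KeyP3 where
  dimDistance k (P3 ax ay az) (P3 bx by bz) =
    case k of
      P3X -> ax - bx
      P3Y -> ay - by
      P3Z -> az - bz
```

With the dependency a -> k in place, the instance for P3 fixes k to KeyP3, so the case expression type checks. An associated type or data family would express the same relationship.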
Edit 4
Hmm ... I think I just "solved" the problem by throwing SPECIALIZE pragmas at the functions. In effect this causes everything to be inlined and removes the explicit dictionary passing.
I am not too happy with that solution, as it means I have to put a big "please specialize your calls to achieve decent performance" warning in the docs.
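For reference, this is roughly what the pragmas look like on a class-polymorphic function. The sketch is not the kdtree code: P3, firstKey, nextKey and sqDistance are hypothetical names, with sqDistance standing in for something like pointsAround. SPECIALIZE asks GHC for a dictionary-free copy at the given type. INLINABLE keeps the unfolding in the interface file so that GHC can create such copies at call sites in other modules when optimising; whether that alone would bring back the original benchmark numbers is something I have not measured.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

class KDCompare a where
  data Key a :: Type
  firstKey    :: Key a
  nextKey     :: Key a -> Key a
  dimDistance :: Key a -> a -> a -> Double

-- Hypothetical stand-in for V3 Double.
data P3 = P3 Double Double Double

instance KDCompare P3 where
  data Key P3 = KX | KY | KZ
  firstKey = KX
  nextKey KX = KY
  nextKey KY = KZ
  nextKey KZ = KX
  dimDistance KX (P3 ax _ _) (P3 bx _ _) = ax - bx
  dimDistance KY (P3 _ ay _) (P3 _ by _) = ay - by
  dimDistance KZ (P3 _ _ az) (P3 _ _ bz) = az - bz

-- Hypothetical worker: sums squared per-axis differences over the
-- first `dims` keys. Without specialization every dimDistance and
-- nextKey call goes through the KDCompare dictionary.
sqDistance :: KDCompare a => Int -> a -> a -> Double
sqDistance dims p q = go dims firstKey 0
  where
    go 0 _ acc = acc
    go i k acc =
      let d = dimDistance k p q
      in  go (i - 1) (nextKey k) (acc + d * d)

-- Expose the unfolding so other modules can specialize at their types.
{-# INLINABLE sqDistance #-}
-- Ask for a dictionary-free copy at P3 in this module.
{-# SPECIALIZE sqDistance :: Int -> P3 -> P3 -> Double #-}
```

The SPECIALIZE line only helps for types known inside the library, which is why user-defined point types would still need either their own pragma or the INLINABLE route.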