I cannot figure out what's wrong. The computation is suspiciously fast: 1 million items and 10 million items both take about 0.0005 seconds on my machine. It is so fast that it looks like nothing is being computed, yet the resulting data is actually correct.
It is mind-boggling, because if I do a similar computation in a sequential loop, without even storing the result in an array, it is not just slower by a factor of the number of cores, but something like 1000 times slower than ArrayFire.
So, maybe I wasn't using the timer correctly?
Do you think it doesn't actually compute the data right away? Maybe it just sets up some kind of placeholder, and only when I call myArray.host() does it start doing the actual computation?
Their website says there is some kind of JIT that bundles the computations:
ArrayFire uses Just In Time compilation to combine many light weight functions into a single kernel launch. This along with our easy-to-use API allows users to not only quickly prototype their algorithms, but also get the best out of the underlying hardware.
I start and stop my timer right before and after a few ArrayFire computations, and it is just insanely fast. Maybe I am testing it wrong? What's the proper way to measure ArrayFire performance?
Never mind, I found out what to do. Based on the examples, I should be using af::timeit(function) instead of using af::timer by hand. With af::timeit the measured times come out much slower, but they scale more reasonably when I increase the size 10x. ArrayFire doesn't actually compute right away, which is why timing with af::timer myself didn't work.
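For anyone who hits the same thing, here is a rough sketch of why the manual timer looked constant. It is a toy model in plain C++, not ArrayFire code and not how ArrayFire is implemented: LazyArray, apply, pending, Timings and time_build_and_eval are all names I made up for illustration. Queuing the operations costs about the same for any size, and the real work only happens when the data is requested.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a lazily evaluated array (hypothetical, NOT ArrayFire).
// It only mimics the "record now, compute later" idea.
class LazyArray {
public:
    LazyArray(std::size_t n, float fill) : data_(n, fill) {}

    // Queue an element-wise operation. No element is touched here,
    // so this costs the same for 1 million or 10 million items.
    LazyArray& apply(std::function<float(float)> op) {
        pending_.push_back(std::move(op));
        return *this;
    }

    // Force evaluation: run every queued operation in a single pass
    // over the data, then clear the queue.
    const std::vector<float>& host() {
        if (!pending_.empty()) {
            for (float& x : data_)
                for (const auto& op : pending_) x = op(x);
            pending_.clear();
        }
        return data_;
    }

    std::size_t pending() const { return pending_.size(); }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<float> data_;
    std::vector<std::function<float(float)>> pending_;
};

struct Timings {
    double build_s;  // time spent only queuing operations
    double eval_s;   // time spent once the result is actually requested
    float first;     // first element of the evaluated result
};

// Times the two phases separately. Stopping the clock after the first
// phase is the mistake described above: nothing has been computed yet.
Timings time_build_and_eval(LazyArray& a) {
    assert(a.size() > 0);
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    a.apply([](float x) { return x * 2.0f; })
     .apply([](float x) { return x + 1.0f; });
    auto t1 = clock::now();

    float first = a.host().front();  // evaluation happens here
    auto t2 = clock::now();

    return {std::chrono::duration<double>(t1 - t0).count(),
            std::chrono::duration<double>(t2 - t1).count(),
            first};
}
```

Calling time_build_and_eval on arrays of 1 million and 10 million items from your own main() should show build_s staying roughly flat while eval_s grows with the size. As far as I understand, the equivalent with a hand-rolled af::timer in ArrayFire is to force the work before stopping the clock, for example by calling eval() on the result and then af::sync(), which is presumably why af::timeit gives numbers that scale.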
Thank you.