
I wanted to compare the average runtime of two functions in MySQL:

Square distance: pow(x1 - x2, 2) + pow(y1 - y2, 2) + pow(z1 - z2, 2)

vs

Dot product: x1 * x2 + y1 * y2 + z1 * z2

Now, whichever function I choose is going to run around 50,000,000,000 times in a single query! So even the tiniest difference in their runtimes matters.

So I tried profiling. Here's what I got:

mysql> show profiles;
+----------+------------+-----------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                 |
+----------+------------+-----------------------------------------------------------------------+
|        4 | 0.00014400 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
|        5 | 0.00012800 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
|        6 | 0.00017000 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
|        7 | 0.00024800 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
|        8 | 0.00014400 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
|        9 | 0.00014000 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
|       10 | 0.00014900 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
|       11 | 0.00015000 | select rand()*rand()+rand()*rand()+rand()*rand()                      |
|       12 | 0.00012000 | select rand()*rand()+rand()*rand()+rand()*rand()                      |
|       13 | 0.00015200 | select rand()*rand()+rand()*rand()+rand()*rand()                      |
|       14 | 0.00022500 | select rand()*rand()+rand()*rand()+rand()*rand()                      |
|       15 | 0.00012700 | select rand()*rand()+rand()*rand()+rand()*rand()                      |
|       16 | 0.00013200 | select rand()*rand()+rand()*rand()+rand()*rand()                      |
|       17 | 0.00013400 | select rand()*rand()+rand()*rand()+rand()*rand()                      |
|       18 | 0.00013800 | select rand()*rand()+rand()*rand()+rand()*rand()                      |
+----------+------------+-----------------------------------------------------------------------+
15 rows in set, 1 warning (0.00 sec)
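(For reference, these numbers came from session profiling, enabled roughly like this before running the queries above:

mysql> set profiling = 1;
mysql> select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2);
mysql> select rand()*rand()+rand()*rand()+rand()*rand();
mysql> show profiles;
)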

This is not very helpful at all: the runtimes fluctuate so much that I have no clue which one is faster, or by how much.

I need to run each of these functions like 10,000 times to get a nice and consistent average runtime. How do I accomplish this in MySQL?

(Note that rand() is called 6 times in both functions, so its runtime doesn't affect the comparison.)
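For illustration, MySQL's built-in BENCHMARK() function evaluates an expression a fixed number of times, which is roughly the kind of repetition I'm after, although I don't know how reliable its timings are for comparing two expressions this cheap:

mysql> select benchmark(10000000, pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2));
mysql> select benchmark(10000000, rand()*rand()+rand()*rand()+rand()*rand());

BENCHMARK() itself always returns 0; the elapsed time printed by the client is the measurement.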

Edit:

Sure, I can create a temp table (slightly inconvenient), fill it with random values (which again is not straightforward, see How do I populate a mysql table with many random numbers), and then proceed to compare my functions.

I wanted to know if a better way exists in MySQL.
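For reference, a minimal sketch of that temp-table approach (the table name, column layout and row counts are just placeholders):

create table bench_points (
  x1 double, y1 double, z1 double,
  x2 double, y2 double, z2 double
);

-- seed one random row, then keep doubling the table by selecting it into itself
insert into bench_points select rand(), rand(), rand(), rand(), rand(), rand();
insert into bench_points select rand(), rand(), rand(), rand(), rand(), rand() from bench_points;
-- ...repeat the previous insert ~20 times for about a million rows...

-- then time each expression over the same rows
select sum(pow(x1-x2,2) + pow(y1-y2,2) + pow(z1-z2,2)) from bench_points;
select sum(x1*x2 + y1*y2 + z1*z2) from bench_points;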

  • Maybe you can create a table A with only one column named c, insert 50,000,000 records, and then run select rand()*rand()+rand()*rand()+rand()*rand() as c0, A.c from A; that select would evaluate the expression 50,000,000 times, which may help you. – coo Nov 08 '14 at 07:52
  • It's not inconvenient. Just write a short program that inserts in a loop; you don't need to insert by hand. – coo Nov 08 '14 at 09:40
  • You are not profiling your distance function, you are profiling rand()! And does it really make sense to compare expressions that compute different things? –  Nov 08 '14 at 10:27
  • @YvesDaoust It does make sense to compare them: for two points on the Earth (a sphere), minimising the square distance is equivalent to maximising the dot product; two points with 0 distance between them have the maximum dot product R*R (the identity is spelled out just after these comments). So I have a choice in how I write the query... – Optimus Nov 08 '14 at 10:44
  • If you are working on a 2D manifold, isn't it possible to work with 2 coordinates per point instead of 3? –  Nov 08 '14 at 13:07
  • I have latitudes and longitudes of the points, but calculating distances from them requires a lot of trigonometric functions, which would be very slow. Comparing the Euclidean distance between them doesn't work near the poles or where the longitude is around 180 or -180. I could not think of any other way to reduce the number of coordinates while still avoiding these issues. Do tell me if you can think of one... – Optimus Nov 09 '14 at 10:14
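
To spell out the identity behind the squared-distance / dot-product equivalence mentioned in the comments: for two points p and q on a sphere of radius R (so |p| = |q| = R),

|p - q|^2 = |p|^2 + |q|^2 - 2 (p · q) = 2R^2 - 2 (p · q),

so minimising the squared distance and maximising the dot product select the same pair of points.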

1 Answer


In the best case, the pow function detects that the exponent is the integer 2 and performs the exponentiation as a single multiply. There is no way it can beat a plain multiply.
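
In other words, if pow()'s overhead is the concern, the squared distance can be written with plain multiplications so both candidates have the same shape (a sketch only; points and the x1 ... z2 columns are hypothetical names, not part of the question's schema):

select (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) + (z1-z2)*(z1-z2) as sq_dist,
       x1*x2 + y1*y2 + z1*z2                               as dot
from points;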

  • If your application is that compute-intensive, you'd better look for an algorithm with better asymptotic complexity. –  Nov 08 '14 at 10:34
  • I tried coo's suggestion and yes, the dot product is about 1.4 times faster on average... but it is inconvenient to create a whole table and fill it with random numbers every time you want to time a different function... It seems there's no other way... – Optimus Nov 08 '14 at 10:53
  • Try to switch to an algorithm with lower asymptotic complexity. –  Nov 08 '14 at 13:10