I am performing point addition on the secp256k1 curve using the OpenSSL library. To improve its execution speed, I am looking for bottlenecks. Consider the code below:
```c
#include <openssl/ec.h>
#include <openssl/obj_mac.h>
#include <openssl/bn.h>

int main()
{
    EC_GROUP* group = EC_GROUP_new_by_curve_name(NID_secp256k1);
    EC_POINT* point1 = EC_POINT_new(group);
    EC_POINT* point2 = EC_POINT_new(group);
    BIGNUM* x1 = BN_new();
    BIGNUM* y1 = BN_new();
    BIGNUM* x2 = BN_new();
    BIGNUM* y2 = BN_new();

    // point1: the starting point
    BN_hex2bn(&x1, "30f267a3b61884ad2dcfd1ede1f2c02447fb7ee497086f5b4614f1d1eacfabe7");
    BN_hex2bn(&y1, "9c1ed1d84d6074a4b742c148437147adb2544b583302d1fdace5d911e8247980");
    EC_POINT_set_affine_coordinates_GFp(group, point1, x1, y1, NULL);

    // point2: the secp256k1 generator G
    BN_hex2bn(&x2, "79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798");
    BN_hex2bn(&y2, "483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8");
    EC_POINT_set_affine_coordinates_GFp(group, point2, x2, y2, NULL);

    // Repeatedly compute point1 = point1 + point2
    for (int x = 0; x < 0xffffff; x++)
    {
        EC_POINT_add(group, point1, point1, point2, NULL);
    }

    // Print point1 here

    EC_POINT_free(point1);
    EC_POINT_free(point2);
    EC_GROUP_free(group);
    BN_free(x1);
    BN_free(y1);
    BN_free(x2);
    BN_free(y2);
    return 0;
}
```
The run time for this code is 13.5 s. The purpose of this exercise, however, is to work with the point coordinates on every iteration, so I added a call to `EC_POINT_get_affine_coordinates_GFp(group, point1, x1, y1, NULL);` inside the loop:
```c
for (int x = 0; x < 0xffffff; x++)
{
    EC_POINT_add(group, point1, point1, point2, NULL);
    EC_POINT_get_affine_coordinates_GFp(group, point1, x1, y1, NULL);
}
```
Now the run time is over 200 s. I don't understand how retrieving the point coordinates (which have already been calculated) can be the bottleneck, rather than the point addition itself, which is complex enough (it involves modular inversions, additions, squarings, etc.). What am I missing here?
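For reference, this is the textbook affine formula for adding two distinct points $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ with $x_1 \neq x_2$ on $y^2 = x^3 + 7$ over $\mathbb{F}_p$ (the secp256k1 equation). The $(x_2 - x_1)^{-1}$ factor is the modular inversion mentioned above. This is the schoolbook form, not necessarily the representation OpenSSL uses internally.

$$
\begin{aligned}
\lambda &= (y_2 - y_1)\,(x_2 - x_1)^{-1} \bmod p \\
x_3 &= \lambda^2 - x_1 - x_2 \bmod p \\
y_3 &= \lambda\,(x_1 - x_3) - y_1 \bmod p
\end{aligned}
$$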
To anyone thinking that it's just caching: I am printing point1 and also changing the values constantly to prevent that.
The 13.5 s figure is realistic: my own point addition code using the GMP library takes around 32 s for 0xffffff iterations.