I am trying to go through and understand some of VLFeat code to see how they generate the SIFT feature points. One thing that has me baffled early on is how they compute the number of octaves in their SIFT computation.
So according to the documentation, if one provides a negative value for the initial number of octaves, it will compute the maximum which is given by log2(min(width, height)). The code for the corresponding bit is:
if (noctaves < 0) {
noctaves = VL_MAX (floor (log2 (VL_MIN(width, height))) - o_min - 3, 1) ;
}
This code is in the function is in the vl_sift_new function. Here o_min is supposed to be the index of the first octave (I guess one does not need to start with the full resolution image). I am assuming this can be set to 0 in most use cases.
So, still I do not understand why they subtract 3 from this value. This seems very confusing. I am sure there is a good reason but I have not been able to figure it out.