I am working on a novel object recognition algorithm based on a Bag of Features (BoF) implementation for my MSc project, and it performs fairly well on the Caltech101 dataset. I want to compare my results with other techniques such as SIFT, SURF, PHOW and so on. The problem is that none of the papers that report their accuracy state how many categories they evaluated on.
For example, the comments in the VLFeat PHOW demo script say it should give about 64% accuracy. But when I run it without any changes, I get 92%. When I change it to test on all 102 categories, the accuracy drops to the 30% range.
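For context, a reported accuracy depends both on how many categories are used and on how the score is averaged. The Python sketch below is a hedged illustration, not code from VLFeat or any of the papers. It computes mean per-class accuracy, which averages recall over categories so that large classes do not dominate, and the chance level 1/K, which shows how much easier the task becomes with fewer categories. The function names are hypothetical, and it assumes labels and predictions are plain lists of class IDs.

```python
from collections import Counter


def mean_per_class_accuracy(y_true, y_pred):
    """Average per-class recall (the diagonal of a row-normalised confusion matrix)."""
    totals = Counter(y_true)  # number of test images per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over num_classes categories."""
    return 1.0 / num_classes


if __name__ == "__main__":
    # Hypothetical unbalanced toy data: 4 images of class 0, 2 images of class 1.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 0, 0, 1]
    print(round(mean_per_class_accuracy(y_true, y_pred), 4))  # 0.75 (overall accuracy would be 5/6)
    print(chance_accuracy(5), round(chance_accuracy(102), 4))  # 0.2 0.0098
```

On the toy data, plain accuracy (5/6) and mean per-class accuracy (0.75) already differ, and going from 5 to 102 categories lowers the chance level from 20% to under 1%. This is one reason that numbers reported without the category count and averaging scheme are hard to compare.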
I am fairly new to this, so apologies if I'm missing something obvious.