
I'm trying to train a dog face detector with dlib's HOG pyramid detector. I used the Columbia Dogs dataset: ftp://ftp.umiacs.umd.edu/pub/kanazawa/CU_Dogs.zip

At first I would get a recall of 0%, but by increasing the C value I managed to raise it to 62% on the training set and 53% on the testing set. After a certain point (1000+) increasing C stopped helping and only slowed down training.
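For context, the C value is set through dlib's training options. Here is a minimal sketch of the kind of train/evaluate loop I mean, assuming dlib's Python API (the XML paths and values are placeholders, not my exact script):

    # Minimal sketch (not my exact script): train one HOG detector and check
    # precision/recall. training.xml / testing.xml are imglab-style datasets.
    import dlib

    options = dlib.simple_object_detector_training_options()
    options.C = 1000                           # the SVM C parameter mentioned above
    options.add_left_right_image_flips = True
    options.num_threads = 4
    options.be_verbose = True

    dlib.train_simple_object_detector("training.xml", "dog_face_detector.svm", options)

    print("train:", dlib.test_simple_object_detector("training.xml", "dog_face_detector.svm"))
    print("test: ", dlib.test_simple_object_detector("testing.xml", "dog_face_detector.svm"))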

Precision is really high, though: when it does find a dog's face, it's always correct. I haven't seen any false positives.

Could you give any advice on how I could improve recall to a decent level?

Thanks in advance

UPDATE: Following Davis King's advice, I got the accuracy to 100% on the training set and 80% on the testing set just by training a different detector per breed. I imagine it could be even higher if I also clustered the images by the direction the dogs are looking.
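Roughly, the per-breed setup just runs every breed-specific detector on an image and keeps any hit. A minimal sketch, assuming dlib's Python API and placeholder file names:

    # Sketch of the per-breed setup: run every breed-specific detector and keep
    # any hit. The detectors/*.svm file names are placeholders.
    import glob
    import dlib

    detectors = [dlib.simple_object_detector(p) for p in glob.glob("detectors/*.svm")]

    img = dlib.load_rgb_image("some_dog.jpg")  # load_rgb_image needs a recent dlib

    faces = []
    for det in detectors:
        faces.extend(det(img))                 # each call returns a list of dlib.rectangle

    print("found", len(faces), "face(s)")

Overlapping hits from different detectors aren't merged here; if that becomes a problem, dlib's fhog_object_detector.run_multiple may be worth a look.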

grisevg

1 Answer


You probably need to train different detectors for different head poses and dogs that look very different. I would try running dlib's imglab command line tool with the --cluster option. That will cluster the images into coherent poses and you can train detectors for each pose.
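For example, something like `imglab --cluster 5 training.xml` splits the dataset into pose clusters and writes one XML file per cluster (check the exact output file names your imglab build produces). A rough sketch of then training one detector per cluster with the Python API, with those file names assumed:

    # Rough sketch: train one detector per imglab cluster. The cluster_*.xml
    # names are an assumption -- use whatever files imglab actually writes.
    import glob
    import dlib

    options = dlib.simple_object_detector_training_options()
    options.C = 5
    options.add_left_right_image_flips = True
    options.num_threads = 4

    for xml in sorted(glob.glob("cluster_*.xml")):
        svm = xml.replace(".xml", ".svm")
        dlib.train_simple_object_detector(xml, svm, options)
        print(svm, dlib.test_simple_object_detector(xml, svm))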

Davis King
  • Correct me if I'm wrong, as I'm new at this, but is HOG perhaps not a good tool for detecting dog faces, since they don't have a distinct shape because of their fur? – grisevg Sep 08 '16 at 20:10
  • I doubt the fur is a problem. Pose variability and overall face structure variation between breeds certainly are, though. – Davis King Sep 08 '16 at 20:57
  • Looks like you're right: the "Detection of user-registered dog faces" paper uses DPM. Could you give some additional advice on setting up bounding boxes? Is it better to put the box around the whole head, or a much smaller box just around the nose and eyes (http://i.imgur.com/ABehiwd.png)? Also, since dog faces are generally long, would a more rectangular sliding window such as 60x80 work better than the default 80x80? – grisevg Sep 09 '16 at 10:43
  • Also, the dataset actually has OBBs (rotated boxes), but dlib only works with AABBs. Is it better to rotate the picture (but then keep only a single box for pictures with multiple dogs), or just to convert each OBB into an AABB? – grisevg Sep 09 '16 at 10:45
  • I would train multiple detectors based on the output of --cluster. Beyond that you should experiment to see what works best. – Davis King Sep 09 '16 at 12:29