Which architecture does dlib's CNN face detector use?

Question

I searched extensively but could not find an answer. Is it an implementation of a published paper on CNN face detection?

Are there any details on the theory behind dlib's convolutional face detector?

[`This face detector is made using the now classic Histogram of Oriented Gradients (HOG) feature combined with a linear classifier, an image pyramid, and sliding window detection scheme.`](http://dlib.net/face_detection_ex.cpp.html) — BugKiller, Aug 13 '18 at 01:47
Aren't HOG features and CNN features different? I found the description you quoted in http://dlib.net/face_detector.py.html, but I am asking about the CNN detector specifically: http://dlib.net/cnn_face_detector.py.html. Am I missing something? — , Aug 13 '18 at 01:54
My bad. I think it is the [CNN version of MMOD](http://blog.dlib.net/2016/10/easily-create-high-quality-object.html). See http://dlib.net/cnn_face_detector.py.html and https://sourceforge.net/p/dclib/discussion/442518/thread/27ec33e7/ — BugKiller, Aug 13 '18 at 02:29

score 2 · Accepted Answer · answered Aug 13 '18 at 03:37

It uses a custom architecture. You can check it in the source code:

    ...

    template <template <int,template<typename>class,int,typename> class block, int N, template<typename>class BN, typename SUBNET>
    using residual = add_prev1<block<N,BN,1,tag1<SUBNET>>>;

    template <template <int,template<typename>class,int,typename> class block, int N, template<typename>class BN, typename SUBNET>
    using residual_down = add_prev2<avg_pool<2,2,2,2,skip1<tag2<block<N,BN,2,tag1<SUBNET>>>>>>;

    template <int N, template <typename> class BN, int stride, typename SUBNET> 
    using block  = BN<con<N,3,3,1,1,relu<BN<con<N,3,3,stride,stride,SUBNET>>>>>;

    template <int N, typename SUBNET> using ares      = relu<residual<block,N,affine,SUBNET>>;
    template <int N, typename SUBNET> using ares_down = relu<residual_down<block,N,affine,SUBNET>>;

    template <typename SUBNET> using alevel0 = ares_down<256,SUBNET>;
    template <typename SUBNET> using alevel1 = ares<256,ares<256,ares_down<256,SUBNET>>>;
    template <typename SUBNET> using alevel2 = ares<128,ares<128,ares_down<128,SUBNET>>>;
    template <typename SUBNET> using alevel3 = ares<64,ares<64,ares<64,ares_down<64,SUBNET>>>>;
    template <typename SUBNET> using alevel4 = ares<32,ares<32,ares<32,SUBNET>>>;

    using anet_type = loss_metric<fc_no_bias<128,avg_pool_everything<
                                alevel0<
                                alevel1<
                                alevel2<
                                alevel3<
                                alevel4<
                                max_pool<3,3,2,2,relu<affine<con<32,7,7,2,2,
                                input_rgb_image_sized<150>
                                >>>>>>>>>>>>;
    anet_type net;

    ...
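To get a feel for how deep the network quoted above is, the nested aliases can be unrolled at compile time. The following is a minimal sketch that does not use dlib: every layer template is replaced by a hypothetical stand-in whose only job is to count convolutions, and the alias definitions from the excerpt are then reused unchanged. The stand-in structs, the `convs` member, and `anet_conv_layers` are illustrative names, not part of dlib.

```cpp
// Hypothetical stand-ins for dlib's layer templates (NOT the real dlib types).
// Every layer exposes `convs`: the number of convolutions at or below it.
template <int SIZE> struct input_rgb_image_sized { static constexpr int convs = 0; };

// Generic pass-through layer: contributes no convolution of its own.
template <typename SUBNET> struct layer { static constexpr int convs = SUBNET::convs; };

template <typename SUBNET> struct relu                : layer<SUBNET> {};
template <typename SUBNET> struct affine              : layer<SUBNET> {};
template <typename SUBNET> struct tag1                : layer<SUBNET> {};
template <typename SUBNET> struct tag2                : layer<SUBNET> {};
template <typename SUBNET> struct skip1               : layer<SUBNET> {};
template <typename SUBNET> struct add_prev1           : layer<SUBNET> {};
template <typename SUBNET> struct add_prev2           : layer<SUBNET> {};
template <typename SUBNET> struct avg_pool_everything : layer<SUBNET> {};
template <typename SUBNET> struct loss_metric         : layer<SUBNET> {};
template <int N, typename SUBNET> struct fc_no_bias   : layer<SUBNET> {};
template <int NR, int NC, int SY, int SX, typename SUBNET> struct avg_pool : layer<SUBNET> {};
template <int NR, int NC, int SY, int SX, typename SUBNET> struct max_pool : layer<SUBNET> {};

// A convolution adds one to the count of its subnetwork.
template <int N, int NR, int NC, int SY, int SX, typename SUBNET>
struct con { static constexpr int convs = 1 + SUBNET::convs; };

// Alias definitions copied from the excerpt above.
template <template <int,template<typename>class,int,typename> class block, int N, template<typename>class BN, typename SUBNET>
using residual = add_prev1<block<N,BN,1,tag1<SUBNET>>>;

template <template <int,template<typename>class,int,typename> class block, int N, template<typename>class BN, typename SUBNET>
using residual_down = add_prev2<avg_pool<2,2,2,2,skip1<tag2<block<N,BN,2,tag1<SUBNET>>>>>>;

template <int N, template <typename> class BN, int stride, typename SUBNET>
using block = BN<con<N,3,3,1,1,relu<BN<con<N,3,3,stride,stride,SUBNET>>>>>;

template <int N, typename SUBNET> using ares      = relu<residual<block,N,affine,SUBNET>>;
template <int N, typename SUBNET> using ares_down = relu<residual_down<block,N,affine,SUBNET>>;

template <typename SUBNET> using alevel0 = ares_down<256,SUBNET>;
template <typename SUBNET> using alevel1 = ares<256,ares<256,ares_down<256,SUBNET>>>;
template <typename SUBNET> using alevel2 = ares<128,ares<128,ares_down<128,SUBNET>>>;
template <typename SUBNET> using alevel3 = ares<64,ares<64,ares<64,ares_down<64,SUBNET>>>>;
template <typename SUBNET> using alevel4 = ares<32,ares<32,ares<32,SUBNET>>>;

using anet_type = loss_metric<fc_no_bias<128,avg_pool_everything<
                            alevel0<alevel1<alevel2<alevel3<alevel4<
                            max_pool<3,3,2,2,relu<affine<con<32,7,7,2,2,
                            input_rgb_image_sized<150>
                            >>>>>>>>>>>>;

// Total number of convolutions in the nested type, computed at compile time.
constexpr int anet_conv_layers = anet_type::convs;
```

With these stand-ins, `anet_conv_layers` evaluates to 29: one 7x7 stem convolution plus two 3x3 convolutions in each of the 14 residual blocks (`ares` and `ares_down`).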

That's face recognition. The face detector is located in the same folder, in cnn_face_detector.cpp. — apatsekin, Dec 14 '18 at 23:30
This network is for face recognition. It's different from face detection. — Yashas, Mar 02 '19 at 18:01

score 0 · Answer 2 · answered Apr 24 '23 at 12:48

See the "Max-Margin Object Detection" paper: https://arxiv.org/pdf/1502.00046.pdf

Answered by Code.
