
I need some help with LBP-based face detection, which is why I am writing this.

I have the following questions related to face detection implemented on OpenCV:

  1. In lbpCascade_frontal_face.xml (this is from OpenCV): what are internalNodes, leafValues, tree, features, etc.? I know they are used in the algorithm, but I do not understand what each of them means. For example, why do we take one particular feature and not another for a particular stage? How do we decide which feature/node to choose?
  2. What are the feature values in LBP_frontal_face_classifier.xml? I know each is a vector of 4 integers, but how should I use these features? I thought stage 0 accessed the first feature, but the access does not follow that pattern. What is the access pattern for these features?

  3. All the papers in the literature give only a high-level overview. Their descriptions mainly cover the LBP calculation from neighborhood pixels. But how are these LBP values used against the elements in the classifier?

  4. How does the integral image help in calculating the LBP value of a pixel? I know how it is used for Haar features; I need to understand LBP.

I have read some papers and articles, but none clearly describes how LBP-based face detection works, or the algorithm in detail. If someone wants to develop a face detection program on their own, what steps should they follow? No document describes that.

Please help me with these if you can. I would be grateful.


1 Answer


I refer you to my own answer from the past, which lightly touches on the topic but didn't explain the XML cascade format.

Let's look at a fake example of a cascade, modified for clarity, with only a single stage and three features.

<!-- stage 0 -->
<_>
  <maxWeakCount>3</maxWeakCount>
  <stageThreshold>-0.75</stageThreshold>
  <weakClassifiers>
    <!-- tree 0 -->
    <_>
      <internalNodes>
        0 -1 3 -67130709 -21569 -1426120013 -1275125205 -21585
        -16385 587145899 -24005</internalNodes>
      <leafValues>
        -0.65 0.88</leafValues></_>
    <!-- tree 1 -->
    <_>
      <internalNodes>
        0 -1 0 -163512766 -769593758 -10027009 -262145 -514457854
        -193593353 -524289 -1</internalNodes>
      <leafValues>
        -0.77 0.72</leafValues></_>
    <!-- tree 2 -->
    <_>
      <internalNodes>
        0 -1 2 -363936790 -893203669 -1337948010 -136907894
        1088782736 -134217726 -741544961 -1590337</internalNodes>
      <leafValues>
        -0.71 0.68</leafValues></_></weakClassifiers></_>

Somewhat later....

<features>
  <_>
    <rect>
      0 0 3 5</rect></_>
  <_>
    <rect>
      0 0 4 2</rect></_>
  <_>
    <rect>
      0 0 6 3</rect></_>
  <_>
    <rect>
      0 1 4 3</rect></_>
  <_>
    <rect>
      0 1 3 3</rect></_>

...

Let us look first at the tags of a stage:

  • The maxWeakCount for a stage is the number of weak classifiers in the stage, what is called in the comments a <!-- tree --> and what I call an LBP feature.
    • In this example, the number of LBP features in stage 0 is 3.
  • The stageThreshold is the minimum value that the weights of the features must sum to for the stage to pass.
    • In this example the stage threshold is -0.75.

Turning to the tags describing an LBP feature:

  • The internalNodes are an array of 11 integers. The first two are meaningless for LBP cascades. The third is the index into the <features> table of <rect>s at the end of the XML file (a <rect> describes the geometry of the feature). The remaining eight are 32-bit values which together constitute the 256-bit LUT I spoke of in my earlier answer. This LUT is computed by the training process, which I don't fully understand myself.
    • In this example, the first feature of the stage references rectangle 3, which is described by the four integers 0 1 4 3.
  • The leafValues are the two weights (pass/fail) associated with a feature. Depending on the bit selected from the internalNodes during feature evaluation, one of those two weights is added to a total. This total is compared to the stage's <stageThreshold>. Then, bool stagePassed = (sum >= stageThreshold - EPS);, where EPS is 1e-5, determines whether the stage has passed or failed. The weights are also determined by the training process.
    • In this example the first feature's fail weight is -0.65 and the pass weight is 0.88.
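
The stage evaluation just described can be sketched in code. This is only an illustration of the logic, not OpenCV's actual implementation; the type and function names are made up, and the per-feature LUT bit is assumed to have been extracted already:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One <!-- tree --> from the XML: a weak classifier with two leafValues.
struct WeakClassifier {
    double failWeight;  // first leaf value, used when the LUT bit is 1
    double passWeight;  // second leaf value, used when the LUT bit is 0
};

// 'lutBits' holds, for each feature of the stage, the single bit already
// extracted from that feature's 256-bit LUT (1 = inconsistent with a face).
bool evaluateStage(const std::vector<WeakClassifier>& stage,
                   const std::vector<int>& lutBits,
                   double stageThreshold) {
    const double EPS = 1e-5;  // same epsilon as in the comparison quoted above
    double sum = 0.0;
    for (std::size_t i = 0; i < stage.size(); ++i)
        sum += lutBits[i] ? stage[i].failWeight : stage[i].passWeight;
    return sum >= stageThreshold - EPS;
}
```

With the example stage above, all three features failing gives -0.65 - 0.77 - 0.71 = -2.13, which is below the threshold of -0.75, so the stage fails; all three passing gives 0.88 + 0.72 + 0.68 = 2.28 and the stage passes.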

Lastly, the <features> tag. It consists of an array of <rect> tags, each containing 4 integers describing the geometry of a feature. Given a processing window (24x24 in your case), the first two integers give the feature's x and y integer pixel offset within the processing window, and the next two give the width and height of one subrectangle out of the 9 that are needed for the LBP feature to be evaluated.

In essence then, a tag <rect> ft.x ft.y ft.width ft.height </rect>, situated within a processing window of size pW.width x pW.height that checks whether a face is present at (pW.x, pW.y), corresponds to...

https://i.stack.imgur.com/NL0XX.png

To evaluate the LBP then, it suffices to read the integral image at points p[0..15] and use p[BR] + p[TL] - p[TR] - p[BL] to compute the integral of each of the nine subrectangles. The central subrectangle, R4, is compared to each of the eight others, clockwise starting from R0, to produce an 8-bit LBP (the bits are packed [msb 01258763 lsb]).
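
A sketch of that computation, assuming a 32-bit integral image stored row-major with the usual extra zero row and column (so ii[y*w + x] is the sum over the rectangle above and left of (x, y)); the function names and parameter layout here are my own, not OpenCV's, and the `>=` comparison follows the relation given in the comments below:

```cpp
#include <cstdint>

// Sum over the block at (x, y) of size bw x bh, read from an integral
// image 'ii' of row stride 'w' (with the extra zero row/column).
static int64_t blockSum(const int32_t* ii, int w,
                        int x, int y, int bw, int bh) {
    return (int64_t)ii[(y + bh) * w + (x + bw)] + ii[y * w + x]
         - ii[y * w + (x + bw)] - ii[(y + bh) * w + x];
}

// Compute the 8-bit LBP of a feature whose <rect> is (fx, fy, fw, fh),
// inside a processing window whose top-left corner is (px, py). R0..R8
// are the nine fw x fh subrectangles in row-major order; R4 is central.
uint8_t computeLBP(const int32_t* ii, int w,
                   int px, int py, int fx, int fy, int fw, int fh) {
    int64_t r[9];
    for (int i = 0; i < 9; ++i)
        r[i] = blockSum(ii, w, px + fx + (i % 3) * fw,
                        py + fy + (i / 3) * fh, fw, fh);
    // Pack clockwise from R0: [msb 0 1 2 5 8 7 6 3 lsb], bit set if Rx >= R4.
    return (uint8_t)(((r[0] >= r[4]) << 7) | ((r[1] >= r[4]) << 6)
                   | ((r[2] >= r[4]) << 5) | ((r[5] >= r[4]) << 4)
                   | ((r[8] >= r[4]) << 3) | ((r[7] >= r[4]) << 2)
                   | ((r[6] >= r[4]) << 1) | ((r[3] >= r[4]) << 0));
}
```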

This 8-bit LBP is then used as an index into the feature's (2^8 = 256)-bit LUT (the <internalNodes>), selecting a single bit. If this bit is 1, the feature is inconsistent with a face; if 0, it is consistent with a face. The corresponding weight (one of the two <leafValues>) is then returned and added to the weights of all the other features to produce an overall stage sum. This is then compared to <stageThreshold> to determine whether the stage passed or failed.
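
That lookup can be sketched as follows. This is only an illustration, mirroring the bit-extraction formula given in the comments below, and it assumes the eight <internalNodes> LUT words have been bitcast to unsigned 32-bit integers:

```cpp
#include <cstdint>

// Select the LBP'th bit of the 256-bit LUT stored as eight 32-bit words,
// then return the corresponding leaf value. Bit 1 = inconsistent with a
// face (fail weight); bit 0 = consistent with a face (pass weight).
double evaluateFeature(const uint32_t lut[8], uint8_t lbp,
                       double failWeight, double passWeight) {
    // The high 3 bits of the LBP pick one of the eight words;
    // the low 5 bits pick one of the word's 32 bits.
    int bit = (lut[lbp >> 5] >> (lbp & 31)) & 1;
    return bit ? failWeight : passWeight;
}
```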

If there's something else I didn't explain well enough I can clarify.

  • Thanks a LOT!!! Thanks for your very elaborate answer; that describes a lot of things. I have a few more questions. First, how is the single bit calculated from the LUT? Second, as our xml file is generated based on a 24x24 image, we can only detect faces that have size 24x24, right? That is why we scale an image. Let's say we have a 40x40 image that consists of one big face and a small face of 24x24 pixels at a corner. – Luniam Mar 24 '14 at 04:39
  • In that case, the LBP will be calculated block by block, and with block-by-block LBP calculation the small face of 24x24 will never be detected. Then how can it detect that small face? I believe you could definitely clarify these. Please help. – Luniam Mar 24 '14 at 04:43
  • @user550762 The bit is extracted as follows. `uint32_t LUT[8]; int bit = (LUT[LBP >> 5] >> (LBP & 31))&1;` where LUT is the relevant table of eight integers. – Iwillnotexist Idonotexist Mar 24 '14 at 05:29
  • @user550762 The LBP cascade detects within a given scaled image only faces of 24x24, give or take 10% in size. It also reports the _top left corner coordinate_ of the face, _not its center_! Let's make a practical example: If you had an image of 64x64 with one 40x40 face at the bottom left corner and one 24x24 face at the top right corner, the LBP cascade _would_ detect one face at `(row=0, col=40)`, and would _miss_ the face at `(row=24, col=0)`. – Iwillnotexist Idonotexist Mar 24 '14 at 05:33
  • @user550762 Ah yes, and while I'm at it, the LBP bits are computed by the relation `int(Rx) >= int(R4)`. So the LBP is something around the lines of `LBP = ((int(R0) >= int(R4)) << 7) | ((int(R1) >= int(R4)) << 6) | ((int(R2) >= int(R4)) << 5) | ((int(R5) >= int(R4)) << 4) | ((int(R8) >= int(R4)) << 3) | ((int(R7) >= int(R4)) << 2) | ((int(R6) >= int(R4)) << 1) | ((int(R3) >= int(R4)) << 0);` – Iwillnotexist Idonotexist Mar 24 '14 at 05:45
  • Thank you very much. One more thing: if we have a big face and we can detect the face using block-by-block LBP calculation, why do we need to scale the image? That is extra overhead. – Luniam Mar 24 '14 at 15:27
  • @user550762 Because it's not in general possible to evaluate block integrals for any particular scale using an integral table computed for a single one. Think about it: If you had the integral for scale x1, how exactly would you use it to compute the block integrals of scale x1.21? On the other hand it is usually much faster to compute pyramids of images, esp. since things like GPU can do the scaling for you for nearly free. – Iwillnotexist Idonotexist Mar 24 '14 at 15:53
  • That's ok. My question was, why do we need a pyramid at all? Because I thought that for a large image they do not work on pixels but rather on 24x24 blocks. But I found that for larger images they also work in sub-windows of 24x24 pixels. In your LUT indexing you used some formula. How do you define this formula? Why are you using 5 and not some other value? Moreover, what do the values of the LUT mean? Why are we taking a particular value and then a particular bit? How are they related to the LBP code? – Luniam Mar 24 '14 at 21:47
  • Because it is impractical to scale the feature _size_ and image integral reads, one instead resizes images such that faces appear to be of size 24x24 at the interesting scales. For instance, if in a VGA (640x480) image, one is interested in detecting faces of size 72x72, one must downsample the VGA frame by a factor of (72/24) = 3x in order for the putative faces to be detectable by the trained cascade. The downsampled image size needs to be size ~213x160 (640/3 x 480/3) in order for faces of size 72x72 to appear to be of size 24x24 (72/3 x 72/3). – Iwillnotexist Idonotexist Mar 25 '14 at 01:50
  • The formula is a classic bit-index lookup. It splits the 8-bit LBP into a high 3-bit part and a low 5-bit part. The high 3-bit part can range from `0` to `2^3-1 == 7` and is used to select one of the eight 32-bit integers in the LUT we're interested in. The low 5-bit part can range from `0` to `2^5-1 == 31` and is used to select one bit out of the thirty-two in the above integer. Because individual bits are not addressable, the bit is extracted by shifting the integer right by the value of the 5-bit selector, leaving the chosen bit in bit-position 0, whence it is read by an `& 1`. – Iwillnotexist Idonotexist Mar 25 '14 at 01:59
  • @user550762 As for what the bits in the LUT mean. If by my bit-extract procedure you retrieve from the LUT the _LBP'th_ bit and find it to be `1`, then the feature is inconsistent with a face and is valued at its _negative_ weight. If you find the extracted bit to be `0`, then the feature is consistent with a face and is valued at its _positive_ weight. IOW, `LUT{LBP} == 1 ? fail : pass`. – Iwillnotexist Idonotexist Mar 25 '14 at 02:07
  • @Iwillnotexist Idonotexist thanks again. So far what you did is great. I appreciate your help. About the last question of the LUT bit meaning: I know what the algorithm is doing, taking a particular bit and deciding which weight to take. But my question is, why are we considering these bits for deciding the weight? What do these 8*32 = 256 bits actually hold? I guess they are some combination of LBPs, not sure. – Luniam Mar 26 '14 at 07:15
  • @user550762 Those bits are calculated by the training process, and indicate whether that particular feature is consistent with a face (`=0`) or not (`=1`). For instance, I would imagine that the LUT of a feature centered over an eye would have a `0` in bit 255. This is because when the LBP is evaluated, all surrounding regions of skin are lighter than the very dark eye pupil, which means that the LBP is `11111111` binary = `255` decimal, and because this local distribution of brightness is consistent with (a part of) a face, the bit at that offset in the LUT is 0. – Iwillnotexist Idonotexist Mar 26 '14 at 12:08
  • @Iwillnotexist Idonotexist Thanks a lot. I really appreciate your help. I cannot thank you enough for the effort you put into answering my questions. It is because of guys like you that others can understand things. Lots of people will be helped by your answers. Thanks again. – Luniam Mar 26 '14 at 17:38
  • @IwillnotexistIdonotexist Could you, please, elaborate on this line: `uint32_t LUT[8]; int bit = (LUT[LBP >> 5] >> (LBP & 31))&1;`? I saw that some numbers in the file can be negative. So we have to just blindly cast them to `uint8`? Thank you. – warmspringwinds Jun 03 '15 at 06:13
  • 1
    @warmspringwinds Your thrust is correct: The negative values just mean the MSB is set. I may have gotten away with using `int32_t`'s, but I decided to directly bitcast the array to `uint32_t`'s because I'm sure that shifts of 31 right on `uint32_t`'s are well-defined and will predictably give the correct result on all architectures. – Iwillnotexist Idonotexist Jun 03 '15 at 11:06
  • @IwillnotexistIdonotexist I just wrote the whole script and used the `opencv` lbp trained file for frontal faces. I tried to test it on the images of faces from the `vec` file of `OpenCV` on which I thought this file was trained, but none of the faces go past the 3rd stage. Do you know which faces I can test my implementation on? Because I am not sure if it is a problem with the scale or with my implementation. – warmspringwinds Jun 03 '15 at 15:21
  • @IwillnotexistIdonotexist am I right that, to be detected, a face has to pass all the cascade stages? – warmspringwinds Jun 03 '15 at 15:42
  • 1
    @warmspringwinds Yes, all stages must pass, and faces must be 24x24 in scale to be detectable. – Iwillnotexist Idonotexist Jun 03 '15 at 19:22
  • @IwillnotexistIdonotexist Thank you for your help! Sorry for bombing you with questions, but I have been struggling the whole day with it. Are you sure about what you say regarding the `lut`: that `0` is consistent with a face and `1` is not? Or maybe you can link a working implementation to which I can compare my code? – warmspringwinds Jun 04 '15 at 12:45
  • @IwillnotexistIdonotexist Because in the source code of Opencv I see this thing `idx = (subset[c>>5] & (1 << (c & 31))) ? node.left : node.right;`. Or it's identical? – warmspringwinds Jun 04 '15 at 13:48
  • 1
    @warmspringwinds My descriptions are taken from my understanding of the OpenCV code. And you are indeed [at the right place](http://code.opencv.org/projects/opencv/repository/revisions/master/entry/modules/objdetect/src/cascadedetect.hpp#L545): Notice that `node.left` corresponds to the left (first) leaf value; It's the one that is _negative_ and also the one that is selected if the bit in the LUT was _set_. The conditional expression would then evaluate non-zero (true), and pick `node.left`. The right (second) leaf value is positive, and is selected when the bit in the LUT is _unset_. – Iwillnotexist Idonotexist Jun 04 '15 at 14:04
  • @IwillnotexistIdonotexist . Thank you. I did everything in `Python` but it doesn't work. Right now I am trying test it on images on different scales. I just took a face in a big bounding box and decrease the size of this box on each step and then resize to `24x24`. On some scales it goes to 17th stage but none of the images pass. Is it the right way to test it? – warmspringwinds Jun 04 '15 at 16:23
  • 1
    @warmspringwinds I've found the LBP face detector to be quite sensitive to face size and presentation. The head or face must be approximately 24x24 pixels in size (chin to top of head or hairline), should be tilted or rotated by no more than 10-15º out-of-plane. If you're getting too-early exits, try chopping off the last 3-5 stages (just not running them), and see if the remaining "survivors" are where you expect them. If they are, then it may be that the cascade described in XML is simply too selective for your needs. – Iwillnotexist Idonotexist Jun 04 '15 at 16:29
  • @IwillnotexistIdonotexist Thank you so much for this! It misses a lot of faces though. Even the faces that Opencv itself detected with lbp. – warmspringwinds Jun 04 '15 at 17:43
  • @IwillnotexistIdonotexist I have just written the sliding window and scaling. And I guess the problem is that when I am at the scale of the face in the image, the image itself is blurry at this scale and doesn't have enough contrast. Can that be the reason? – warmspringwinds Jun 04 '15 at 20:28
  • @warmspringwinds Maybe. What is your sliding step? For small scales the step by which you move should be 1 (every pixel). The strategy you use to resize the image is also a factor; If you linearly resize direct from a large image to a tiny one, you can amplify noise hugely. Instead you should `pyrDown` until you're less than 2x larger than the target size, and then linearly resize to the target size. – Iwillnotexist Idonotexist Jun 04 '15 at 21:31
  • @IwillnotexistIdonotexist My step is 1px. I also tried the thing that you recommended with gaussian pyramid but still the same. I guess it is the problem with my stage evaluation code. Can you have a look at it? It would be great if you could point out where the possible problems can be. The code is really short. If I make it work I think we can update your answer to help other people who might be stuck. https://gist.github.com/warmspringwinds/516772c7ddedf5b579b9 – warmspringwinds Jun 05 '15 at 07:46
  • @IwillnotexistIdonotexist I found a mistake :) Sorry for all this bothering. What do you think on updating your answer later? – warmspringwinds Jun 05 '15 at 08:23
  • 1
    @warmspringwinds If you make a tasteful edit I'll be glad to approve and get you the +2 rep! – Iwillnotexist Idonotexist Jun 05 '15 at 15:15
  • @IwillnotexistIdonotexist I have updated the answer. I hope you will find it useful. Now I have another problem: when I do the detection on multiple scales I get a good amount of false positives. I saw that `OpenCV` has a parameter `minNeighbors` in their detector. I guess they only leave the rectangles that have enough neighbours. Can you, please, elaborate on this a little bit if you are aware of it? – warmspringwinds Jun 28 '15 at 10:12
  • Just want to say thank you for this answer, it was incredibly helpful when trying to understand the lbp cascade file. I wanted to mention though, i came across a lot of people saying the first two values in a node don't mean anything in lbp, but they actually do. they are node.left and node.right, and are indexes into the leafValues. if there is only one node, then they are 0 and -1, but if you have more nodes, they would be the left and right indexes into the leafNodes. correct me if i'm wrong please, because that's how i'm reading the opencv code – iedoc Feb 24 '16 at 16:08
  • 1
    @iedoc Oh, you're welcome! About `node.left` and `node.right`: Yes, that's true. However, they only have meaning when you have tree-structured stages. If you have "stumps" (height-1 trees, in which failing a stage fails the cascade), as is the case in the default cascades provided by OpenCV like `lbpcascade_frontalface.xml`, then they really are meaningless. Tree-structured stages are more sophisticated by allowing a second chance for the stage to redeem itself. – Iwillnotexist Idonotexist Feb 24 '16 at 16:25
  • ah i see, so i guess opencv's lbp implementation doesn't use more than one node in each weak classifier so it makes sense why i keep seeing people say that the first two values don't really matter in the lbp implementation. i think i was wrong when i said that the first two values were indexes into the leafnodes, are they actually an index into the nodes then (if lbp were to use more nodes for branching)? – iedoc Feb 24 '16 at 19:52
  • @iedoc To be honest I'm not too sure myself. I've never seen them used in practice; I do suppose `-1` means "no child" and `i>=0` means "have child `i` down this branch". – Iwillnotexist Idonotexist Feb 25 '16 at 01:28
  • Thank you all for explaining this stuff. Somewhat related: what does detectMultiScale basically do? My understanding is that it computes a number of scales (let's say 11) depending on the scaleFactor and min/max size() parameters. On each of these scales it slides an e.g. 24x24 detector window pixel-wise over the src image. Is that correct? So for a 424x424 image we need to test (424-24)x(424-24) sub-images, and this on all 11 scales. Is that correct? Are there any tricks to lower the computational demands, like increasing strides or others? – Chris Dec 19 '19 at 08:08
  • I have also asked a separate question on this topic here. Since this is slightly different, please answer there: https://stackoverflow.com/questions/59404386 – Chris Dec 19 '19 at 08:17