
I am working on people detection using two feature descriptors, HOG and LBP. So far I have combined the two features using simple concatenation, but this sometimes causes problems because the resulting vectors are large. Here is my code.

% extract features from negative and positive images
[HOGpos, HOGneg] = features(pathPos, pathNeg);

% load and label each training example
HOG_featV = HOGfeature(HOGpos, HOGneg);

% get the labels of the HOG training data
HOGlabel = cell2mat(HOG_featV(2,:));

% get the feature vector values from HOG
HOGfeatureVector = HOG_featV(3,:)';
C = cell2mat(HOGfeatureVector); % each row of C corresponds to a training example

% extract features from LBP
[LBPpos, LBPneg] = LBPfeatures(pathPos, pathNeg);

% load and label each training example
LBP_featV = loadingV(LBPpos, LBPneg);

% get the labels of the LBP training data
LBPlabel = cell2mat(LBP_featV(2,:));

% get the feature vector values from LBP
LBPfeatureVector = LBP_featV(3,:);
M = cell2mat(LBPfeatureVector)'; % each row of M corresponds to a training example

% concatenate the HOG and LBP features
featureVector = [C M];

I want to know: is there any method to combine two feature vectors that is more reliable and faster? If yes, please give me some suggestions or a link I can refer to. Thank you.

Indrasyach
  • Hi Indrasyach, what do you need the `label = ...` line for, and could you give us an example of what `HOGfeatureVector` and `LBPfeatureVector` look like? Why are they cells, and why don't you concatenate them with `[A,B]` or `cat(2,A,B)` as cells? – matheburg May 26 '14 at 08:43
  • What is the problem? Lack of memory? Slow processing time? You can use the command pack() to discard old variables and free some memory. What is the length of your vectors? – DanielHsH May 26 '14 at 10:23
  • Hi @matheburg, I just edited my code above. HOGfeatureVector is a column cell vector of size `<1845x1 cell>`; each row is a `<1x324 double>` containing the HOG feature vector for one image. C is the matrix combining all the HOG features. @matheburg – Indrasyach May 27 '14 at 02:12
  • Meanwhile, LBPfeatureVector is a row cell vector of size `<1x1845 cell>`; each column is a `<59x1 double>` containing the LBP feature vector. M is the transposed matrix combining all the LBP feature vectors. Afterwards, I concatenate both features with simple concatenation, `featureVector = [C M]`. But it seems troublesome for the classifier, and I sometimes get bad detection results. Any idea? I am using SVM Light for the classifier. @matheburg – Indrasyach May 27 '14 at 02:15
  • Hi @DanielHsH, the HOG feature matrix C has size `<1845x324 double>` and the LBP feature matrix M has size `<1845x59 double>`. After concatenating them, featureVector has size `<1845x383 double>`. But it gives me poor detection accuracy. It seems like simple concatenation causes trouble for a big vector. Any idea how to combine the two feature vectors from HOG and LBP? Thx – Indrasyach May 27 '14 at 02:21

1 Answer


I understood from your comments the following: you are selecting 1845 key points in the image. For each point you calculate a feature vector of length 383 (LBP+HOG combined). The total vector that represents the image is of length ~100,000.

If in fact you have only 1845 images and each image is represented by only 383 features, then you are doomed to fail and your SVM will have a very high error rate, mainly because the feature vector is too short and the amount of training images is too small. So I assume that is not the case.

You have a few problems in your approach.

  1. It seems you don't understand the difference between a detector and a classifier, and you are trying to apply a classifier to solve a detection problem. There is a core difference. A classifier is designed to distinguish between 2 (or more) classes, for example whether a fruit is an 'apple' or an 'orange'. You train it by giving it many examples of both classes. But if you give a banana to the classifier, it will return a random result! It might say 'apple' with 100% confidence or 'orange' with 100% confidence. A classifier cannot deal with examples (like the banana) that are not within the training classes (apple, orange). A detector, on the other hand, doesn't have 2 classes; it has 'apple' and 'everything else'. There are no examples outside the training classes because 'apple' and 'everything else' cover the entire world. You cannot train a classifier to recognize 'everything else' because there is an infinite variety of objects that are not 'apple'; 'everything else' is not a class a classifier can be trained to recognize. For that you need a detector. The training process of a detector is a bit different, and so are the types of examples you supply. To train a detector you have to supply a huge amount of 'non-apples': images of cars, human beings, chairs, airplanes. Typically you need many millions of different examples of 'non-apples' and only a few examples of apples (say a few thousand).
  2. SVM is not a good solution for a detection problem. The core idea of the SVM is that you can select a few representative examples (support vectors) and describe the separation between 2 classes using those representatives. But let me ask you: what are the representative images of everything that is 'not an apple'? It is impossible to answer that question. The number of objects that are not 'apples' is infinite, so you cannot select good support vectors for them. It is easy to select representatives of 'apple' (an image of a red apple, a green apple, a rotten apple, etc.) but not of the 'everything else' class. Don't get me wrong: people use SVM for detection of easy objects (like corporate logos), but when your task is difficult, like people detection (where you have huge intra-class variation), SVM is inferior. Even if you manage to train an SVM for detection you will have 2 problems: slow run time and a high error rate on new examples (due to intra-class variation). I would suggest using boosting methods to train a detector.
  3. I assume you get a high error rate because the number of images you use for training is very low. Your feature vector is of length ~100,000, so if you use a few million images for training, it will take a few days for the SVM to be trained. Remember: for two-class classification problems a few thousand images might be enough, but for detection problems you need millions of images.
  4. Your number of features is too high (assuming it is ~100,000). I am betting that you have fewer image samples than the ~100,000 length of your feature vector. For detection problems the number of features should be relatively low (300-3000), but each feature must be good! For example, you start with a list of 60,000 different features and during the training process reduce the number to, say, 1,500 good features and use only them for detection. If you start with a lower number of features (like 383 = LBP histogram + HOG histogram), then most of the features are useless, and if you do PCA you might be surprised that effectively you have only ~40 features that help the detection and the rest are useless (see the sketch after this list). On the contrary, if you finish your training still requiring 100,000 features, you will get crazy over-fitting, which will cause a high error rate of the SVM.
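
To illustrate the PCA point in item 4, here is a minimal MATLAB sketch, assuming `featureVector` is the `<1845x383 double>` matrix from the question; the 0.95 variance threshold is an arbitrary illustrative choice, not a recommendation from this answer:

% center the data and compute the principal components (Statistics Toolbox)
[coeff, score, latent] = pca(featureVector);

% fraction of the total variance explained by each component
explained = latent / sum(latent);

% keep the smallest number of components that explain 95% of the variance
k = find(cumsum(explained) >= 0.95, 1);
reducedFeatures = score(:, 1:k); % <1845 x k double>; k is often much smaller than 383

You would then train the SVM on `reducedFeatures` instead of the full concatenated vector. At detection time, each sliding-window feature vector would have to be projected with the same `coeff(:, 1:k)` after subtracting the training mean.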

Practical advice:

  1. You can keep working with LBP/HOG features, but first make sure you have a huge amount of 'non-people' examples (1-10 million); otherwise you will have a high error rate. Also try to get a few thousand images of people.

  2. Train a single cascade detector. It is easier to understand, much faster, and more accurate than SVM. Give the training process all possible HOG and LBP features (without histograms) and let it select the good ones. There are free implementations of cascade detector training inside OpenCV which you can use (a MATLAB sketch follows below). My personal advice: do not use SVM for detection; it is the wrong tool. Once you have detected a person in the image you can use SVM to classify whether it is an adult or a kid, but do not use SVM for detection.
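
To make the cascade suggestion concrete, here is a minimal MATLAB sketch using `trainCascadeObjectDetector` from the Computer Vision Toolbox, as an alternative to the OpenCV tools mentioned above. The file names, label data, and parameter values are assumptions for illustration, not tested settings:

% positiveInstances must be an array of structs with fields 'imageFilename'
% and 'objectBoundingBoxes' (e.g. produced with the Training Image Labeler
% app); 'peopleLabels.mat' is a hypothetical label file
labels = load('peopleLabels.mat');
negativeFolder = 'nonPeopleImages'; % folder of images containing no people

% train a boosted cascade on HOG features
trainCascadeObjectDetector('peopleDetector.xml', labels.positiveInstances, ...
    negativeFolder, 'FeatureType', 'HOG', ...
    'NumCascadeStages', 15, 'FalseAlarmRate', 0.1);

% run the trained detector (it slides a window over the image internally)
detector = vision.CascadeObjectDetector('peopleDetector.xml');
img = imread('testScene.jpg'); % hypothetical test image
bboxes = step(detector, img);
annotated = insertObjectAnnotation(img, 'rectangle', bboxes, 'person');
imshow(annotated);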

DanielHsH
  • Hi Daniel, thanks for your explanation. I have 1243 positive sample images and 602 negative sample images. Using HOG I get 324 features per image, and using LBP I get 59 features per image. So, combining all positive and negative features to be trained: from HOG I get 1845 image samples with 324 features each, and from LBP 1845 image samples with 59 features each. @DanielHsH – Indrasyach Jun 04 '14 at 03:48
  • Then I combine the HOG features `<1845x324>` with the LBP features `<1845x59>` by simple concatenation. That gives me 383 features per training image sample. @DanielHsH – Indrasyach Jun 04 '14 at 03:51
  • I used SVM to train on the 1845 training image samples (positive and negative). For the detection part, I use a sliding-window approach to detect which features are similar to the training model. Afterwards, I use SVM again to classify the features from the sliding-window detection. @DanielHsH – Indrasyach Jun 04 '14 at 03:54
  • My question is about the way to combine the features obtained from HOG and LBP. Is there any way to combine those features besides simple concatenation? Thx @DanielHsH – Indrasyach Jun 04 '14 at 03:56
  • For SVM, concatenation is the best way to combine them. By concatenation you put them in different dimensions and let the SVM do its job. – DanielHsH Jun 04 '14 at 10:31
  • Yes, I used simple concatenation, but it took a long time to train on a bunch of training images. By the way, do you have any idea how to use a confidence map to combine two features? @DanielHsH Thank you – Indrasyach Jun 10 '14 at 09:10