
I have a piece of MATLAB code which takes a 91x91 patch of pixels from an image and applies HOG to extract its feature vector. I would like to rewrite the function in Python. I've been struggling for a while to get the same HOG return values in Python as in MATLAB, but have failed to do so. I would really appreciate any help.

The MATLAB code uses the VLFeat library (http://www.vlfeat.org/overview/hog.html), and I am using scikit-image in Python (http://scikit-image.org/docs/dev/api/skimage.feature.html?highlight=peak_local_max#skimage.feature.hog).

In MATLAB, the input 'im2single(patch)' is a 91*91 array, and the returned HOG data is 4*4*16 single. HOG is applied with a cell size of 23 and 4 orientations:

     hog = vl_hog(im2single(patch),23, 'variant', 'dalaltriggs', 'numOrientations',4) ;

The returned data is 4*4*16 single, which can be displayed as:

     val(:,:,1) =

     0         0         0         0
     0         0         0         0
     0    0.2000    0.2000    0.0083
     0    0.2000    0.2000    0.0317

     ....

     val(:,:,16) =

     0         0         0         0
     0         0         0         0
     0         0    0.0526    0.0142
     0         0    0.2000    0.2000

Then the result is manually flattened into a 256*1 feature vector. To sum up, a 256*1 feature vector is extracted from a 91*91 patch of pixels. Now I want to get the same result in Python.
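For reference, flattening the 4*4*16 VLFeat output into 256 values looks like this in NumPy (a sketch with a placeholder array standing in for the vl_hog result; note that MATLAB's val(:) flattens column-major, so order='F' is needed to match its ordering):

```python
import numpy as np

# Placeholder standing in for the 4x4x16 single array returned by vl_hog.
val = np.arange(4 * 4 * 16, dtype=np.float32).reshape(4, 4, 16)

# MATLAB's val(:) flattens column-major; order='F' reproduces that ordering.
feature_vector = val.flatten(order='F')
print(feature_vector.shape)  # (256,)
```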

In my Python code, I tried to apply HOG with the same cell size and number of orientations. The block size is set to (1,1):

    tc = hog(repatch, orientations=4, pixels_per_cell=(23,23), cells_per_block= (1,1), visualise=False, normalise=False)

I padded the patch to 92*92 so that the patch size is an integer multiple of the cell size; the padded array is called 'repatch'. However, the output 'tc' is a 64*1 array (the gradient histograms are flattened into the feature vector):
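A minimal sketch of that padding step (assuming edge replication; any padding that extends 91*91 to 92*92 would do):

```python
import numpy as np

patch = np.zeros((91, 91), dtype=np.float32)  # placeholder for the image patch

# Pad one row and one column so 92 is an integer multiple of the 23-pixel cell.
repatch = np.pad(patch, ((0, 1), (0, 1)), mode='edge')
print(repatch.shape)  # (92, 92)
```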

   tc.shape 

   (64,)

Then I looked into the skimage source code:

    orientation_histogram = np.zeros((n_cellsy, n_cellsx, orientations))
    orientation_histogram.shape 
    (4, 4, 4)

Here n_cellsx is the number of cells in x and n_cellsy is the number of cells in y. It seems the output of HOG is closely tied to the dimensions of orientation_histogram.

The actual dimensions of the returned HOG value are determined by:

    normalised_blocks = np.zeros((n_blocksy, n_blocksx,by, bx, orientations))

where n_blocksx and n_blocksy are calculated by:

    n_blocksx = (n_cellsx - bx) + 1
    n_blocksy = (n_cellsy - by) + 1

n_cellsx is the number of cells in x, which is 4 here, as is n_cellsy; by and bx come from cells_per_block, which is (1,1); orientations is 4 in this case.

So the size of the returned value (normalised_blocks) is n_blocksy * n_blocksx * by * bx * orientations = 4*4*1*1*4 = 64.

I've tried changing the block size but still cannot get what I expected (with a block size of (2,2) the returned value is a 144*1 array).
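Plugging the numbers into that formula reproduces both observed sizes:

```python
orientations = 4
n_cells = 92 // 23  # 4 cells along each axis of the padded 92x92 patch

sizes = {}
for b in (1, 2):  # cells_per_block of (1,1) and (2,2)
    n_blocks = n_cells - b + 1
    sizes[b] = n_blocks * n_blocks * b * b * orientations

print(sizes)  # {1: 64, 2: 144}
```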

Can anyone please help? How can I get the same HOG output as in MATLAB? Many thanks.

Peine

1 Answer


The VLFeat library does something different from scikit-image. By default (the UoCTTI variant), vl_hog returns, for each cell, 9 (the number of orientations) contrast-insensitive channels, 18 contrast-sensitive channels, and 4 dimensions capturing the overall gradient energy in the four square blocks of 2*2 cells containing the cell. So it outputs 31 dimensions per cell. The scikit-image procedure is different, and I think you already have a good understanding of it.

In my experience, if you want to get the same HOG vector from scikit-image as from MATLAB, you should at the very least set cells_per_block=(2,2) in scikit-image.
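For what it's worth, the 16 channels per cell in the question's 4*4*16 output are consistent with the Dalal-Triggs scheme: each cell's 4-bin histogram is L2-normalized once for each of the four 2*2 blocks containing it and clipped at 0.2 (hence the many 0.2000 entries in the printed values). A rough sketch of that idea for one interior cell, using a made-up histogram array rather than VLFeat's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
hist = rng.random((4, 4, 4))  # hypothetical per-cell histograms: (cells_y, cells_x, orientations)
eps = 1e-4
i, j = 1, 1  # an interior cell, which belongs to four different 2x2 blocks

feat = []
for di in (-1, 0):
    for dj in (-1, 0):
        block = hist[i + di:i + di + 2, j + dj:j + dj + 2]
        norm = np.sqrt((block ** 2).sum()) + eps
        # Normalize this cell's histogram by the block energy and clip at 0.2.
        feat.append(np.minimum(hist[i, j] / norm, 0.2))

feat = np.concatenate(feat)
print(feat.shape)  # (16,) -- 4 orientations x 4 normalizations
```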

zajonc
amirsina torfi