In video processing, is the z-coordinate the frame number?

Question

I'm trying to perform some basic action recognition using the KTH dataset.

I'm using the 3DSIFT feature extractor from UCF (link), which extracts a SIFT descriptor at a given x, y and z coordinate.

For feature detection I am using selective-STIPS (link), which has been shown to be very effective for action recognition. According to the source code provided by the author, it produces the following output:

    % @output : corner_points, P X 4 matrix, where P is the number of interest
    %           point found in the image_stack and each interest point contains
    %           4 values :: [X,Y] coordinate of the interest point, frame
    %           number, scale at which it is detected.

Am I right to assume that the frame number provided here is also the Z-coordinate required by 3DSIFT?

I extracted STIPS from a video clip and got the required output, but I am getting multiple X and Y values on every frame:

[71,24,1]
[54,26,1]
[86,29,1]
...
..
.

Is this the expected output, and is it acceptable input for 3DSIFT?

From what I can gather, you are asking about third-party toolboxes or pieces of code without, at a minimum, linking to them. How is anyone supposed to know how these things work without seeing the code and knowing what version of something you are running? - Aero Engy, Nov 08 '17 at 17:05
@AeroEngy I didn't feel the need to link, as I felt this was a general question about video recognition rather than one specific to any of the toolboxes. But I have linked to the scripts now, if that helps. - StuckInPhDNoMore, Nov 08 '17 at 17:13

score 1 · Accepted Answer · answered Nov 08 '17 at 20:22

Yes. From what I can tell by following through the 3DSIFT code, Z is equivalent to the frame number when dealing with video, so the x, y, frame output from selective-STIPS should work as the x, y, z input to 3DSIFT.
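
To make the mapping concrete, here is a minimal sketch in plain Python (not the MATLAB toolboxes themselves). It assumes each detector row is laid out as [x, y, frame, scale], as the header comment in the question describes. The function names `stips_to_xyz` and `group_by_frame` are hypothetical, and the scale values and the frame-2 row in the sample data are made up for illustration. It does not call 3DSIFT; it only shows the frame number being reused as z.

```python
from collections import defaultdict


def stips_to_xyz(corner_points):
    """Turn rows of [x, y, frame, scale] into (x, y, z) triples with z = frame."""
    return [(x, y, frame) for x, y, frame, _scale in corner_points]


def group_by_frame(xyz_points):
    """Collect the (x, y) locations that share the same z (frame number)."""
    frames = defaultdict(list)
    for x, y, z in xyz_points:
        frames[z].append((x, y))
    return dict(frames)


# Hypothetical detector output. The first three rows reuse the x, y, frame
# values from the question; the scale column and the last row are invented.
corner_points = [
    [71, 24, 1, 2.0],
    [54, 26, 1, 2.0],
    [86, 29, 1, 4.0],
    [60, 31, 2, 2.0],
]

xyz = stips_to_xyz(corner_points)
per_frame = group_by_frame(xyz)
```

Grouping by z also shows why several rows can share one frame number: an interest point detector can fire at more than one location in the same frame, and each of those rows would be described separately.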

Aero Engy
