BOW using Matlab in action recognition

Question

My question is really simple, but it is very confusing to me :(

I am doing action recognition using Matlab. I have 10 different actions, each action has 20 different videos, and each video includes a descriptor with 500 points x 350 features (a 500x350 array).

When building the vocabulary of words, suppose I take the first 10 videos of each action as the training set. Then I will have 100 training videos, i.e. 100 arrays of 500x350.
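A minimal sketch of the split described above, written in Python rather than Matlab and assuming videos are indexed 0 to 19 within each action (`split_videos` is a hypothetical helper). Taking the first 10 videos of each of the 10 actions gives 100 training and 100 test videos, and stacking all training descriptors gives 100 * 500 = 50000 rows of 350 features.

```python
def split_videos(num_actions=10, videos_per_action=20, train_per_action=10):
    """Return (train, test) lists of (action, video) index pairs.

    The first `train_per_action` videos of every action go to training,
    the remaining ones to testing.
    """
    train, test = [], []
    for action in range(num_actions):
        for video in range(videos_per_action):
            if video < train_per_action:
                train.append((action, video))
            else:
                test.append((action, video))
    return train, test


train, test = split_videos()
points_per_video, num_features = 500, 350

print(len(train), len(test))                        # 100 100
# Stacking every training descriptor (e.g. for building the vocabulary)
# gives one row per point:
print(len(train) * points_per_video, num_features)  # 50000 350
```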

My question is how to set up the correspondence between the TRAIN array and the TEST array (the class labels). I mean, which one of these is right:

1. TRAIN is a cell array of 100 cells, each cell a [500x350] double, and TEST is a [100x1] double.
2. TRAIN is a [50000x350] double and TEST is a [50000x1] double.

In other words, should I treat each video as one instance, or each point [1x350] as an instance? I don't know which is the correct implementation.
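To make the two options concrete, here is a small Python sketch (not Matlab, and not from the thread) that builds the label vector for each layout from per-video labels; `labels_per_video` and `labels_per_point` are hypothetical names. In option 1 there is one label per video; in option 2 each video's label is repeated once per descriptor point, so 100 videos of 500 points give 50000 labels, one for each row of the stacked TRAIN matrix.

```python
def labels_per_video(video_actions):
    """Option 1: one label per video (100 videos -> 100 labels)."""
    return list(video_actions)


def labels_per_point(video_actions, points_per_video=500):
    """Option 2: repeat each video's label for every point it contributes.

    The videos must be stacked into TRAIN in the same order as
    `video_actions`, otherwise rows and labels no longer line up.
    """
    labels = []
    for action in video_actions:
        labels.extend([action] * points_per_video)
    return labels


# 10 actions x 10 training videos each, labelled 1..10 as Matlab code often does
video_actions = [a for a in range(1, 11) for _ in range(10)]
print(len(labels_per_video(video_actions)))  # 100
print(len(labels_per_point(video_actions)))  # 50000
```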

Many thanks.

Could you clarify which learning algorithm, and/or which library for it, you intend to use? How best to represent your training data is likely to depend on that. — Neil Slater, Jan 27 '15 at 07:26
I am using an SVM, from either libsvm or vlfeat. Could you please tell me how the way I should represent my data depends on this? — Mo Farouk, Jan 27 '15 at 11:18
The learning functions usually expect the data to be arranged in a particular way before it can be passed as parameters. For instance, for supervised learning, one method might want two matrices: the first with one training example's input vector per row, the second with the corresponding output class per row. How the output class is represented may vary too. I don't know what `libsvm` expects, but now that you have explained you are using it, that will help someone else to answer. — Neil Slater, Jan 27 '15 at 11:35
I am sure it is the same: a first array, TRAIN, with one training example's input vector per row, and a second array, TEST, with the output class per row. My question is: should I represent each point as an example, or each video (500 points) as an example? — Mo Farouk, Jan 27 '15 at 13:56
That depends on how you want to use the model at the end. If you will use the model to classify points, then use individual points as training examples (assuming that is even possible; I am not sure what "points" represent in your data). If you want to classify videos, then use the videos as training examples. — Neil Slater, Jan 30 '15 at 16:31
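If you do train on individual points but still want one label per video, one common option (an assumption here, not something the thread prescribes) is to predict a label for each of the video's 500 points and take a majority vote. A minimal Python sketch, with `video_label_by_vote` as a hypothetical helper:

```python
from collections import Counter


def video_label_by_vote(point_predictions):
    """Return the most frequent predicted label among a video's points.

    Ties go to the label encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    return Counter(point_predictions).most_common(1)[0][0]


# e.g. 500 per-point predictions for one test video
preds = [3] * 300 + [7] * 150 + [1] * 50
print(video_label_by_vote(preds))  # 3
```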

score 0 · Answer 1 · edited Sep 06 '16 at 03:56


For training, the input to the SVM is 10 actions * 20 videos for each * (500*350) features.

For testing, you have 1 action * 1 video * (500*350) features, from which the trained SVM predicts the action.
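The thread does not spell out the bag-of-words step itself. In the usual BoW pipeline (assumed here, not stated in the answer above), the stacked training descriptors are clustered with k-means into k visual words, and each video's descriptors are then assigned to their nearest word and counted, so every video becomes one k-bin histogram regardless of how many points it has. Below is a small pure-Python sketch with toy 2-D descriptors standing in for the 350-D ones; `kmeans`, `bow_histogram` and k=2 are hypothetical choices, and in Matlab you would normally use a library routine such as `kmeans` or VLFeat's `vl_kmeans` instead.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, point):
    """Index of the visual word (centre) closest to `point`."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], point))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the returned centres form the vocabulary."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centers, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster goes empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(video_descriptors, centers):
    """Count nearest-word assignments and L1-normalise: one vector per video."""
    hist = [0.0] * len(centers)
    for d in video_descriptors:
        hist[nearest(centers, d)] += 1
    total = sum(hist)
    return [h / total for h in hist]


# Toy data: two "videos", each with 4 two-dimensional descriptors
video_a = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [5.0, 5.1]]
video_b = [[5.0, 5.0], [5.1, 4.9], [4.9, 5.0], [0.1, 0.1]]
train_points = video_a + video_b  # stacked descriptors, like the 50000x350 matrix
centers = kmeans(train_points, k=2)
X = [bow_histogram(v, centers) for v in (video_a, video_b)]  # one row per video
print([sorted(h) for h in X])  # [[0.25, 0.75], [0.25, 0.75]]
```

With k visual words, the 100 training videos would become a 100 x k matrix paired with a 100 x 1 label vector, i.e. one instance per video; a test video is encoded with the same centres before being passed to the trained SVM.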

edited by Sanoop Surendran · answered Sep 06 '16 at 03:12 by Video Analysis Deep Learning
