My question is really simple, but it is very confusing to me :(
I am doing action recognition in Matlab. I have 10 different actions, each action has 20 different videos, and each video yields a descriptor of 500 points x 350 features (a 500x350 array).
When building the vocabulary of words, suppose I take the first 10 videos of each action as the training set. Then I will have 100 training videos, i.e. 100 arrays of size 500x350.
My question is how to set up the correspondence between the training and test sets. I mean one of these:
- TRAIN is an array of 100 cells, each cell a [500x350] double, and TEST is a [100x1] double.
- TRAIN is a single [50000x350] double array, and TEST is a [50000x1] double.

In other words, should I treat each video as one instance, or each point [1x350] as one instance? I don't know which is the correct implementation.
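To make the two layouts concrete, here is a minimal sketch of what I mean. It assumes the [100x1] and [50000x1] arrays hold one class label per instance and that the videos are ordered action by action. The variable names (`trainCell`, `trainStacked`, `videoLabels`, `pointLabels`) are made up for illustration, and zeros stand in for the real descriptors.

```matlab
% Sizes taken from the description above.
nActions        = 10;    % action classes
nTrainPerAction = 10;    % first 10 videos of each action
nPoints         = 500;   % points per video
nFeat           = 350;   % features per point

nVideos = nActions * nTrainPerAction;   % 100 training videos

% Placeholder descriptors (zeros stand in for the real data).
trainCell = cell(nVideos, 1);
for v = 1:nVideos
    trainCell{v} = zeros(nPoints, nFeat);
end

% Assumed: one action label per video, videos ordered action by action.
videoLabels = kron((1:nActions)', ones(nTrainPerAction, 1));   % 100x1

% Layout 1: each video is one instance.
% trainCell is 100x1 (each cell 500x350), videoLabels is 100x1.

% Layout 2: each point is one instance.
% Stack all videos row-wise and repeat each video's label per point.
trainStacked = vertcat(trainCell{:});                  % 50000x350
pointLabels  = kron(videoLabels, ones(nPoints, 1));    % 50000x1

disp(size(trainCell));
disp(size(trainStacked));
disp(size(pointLabels));
```

In the second layout, rows 1 to 500 of `trainStacked` come from the first video, rows 501 to 1000 from the second, and so on, so `pointLabels` stays aligned with the rows.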
Many thanks.