I have 10-second videos (60 fps) of a heart taken using M-mode ultrasound, and I would like to train a prediction model on them, with each video tagged with three custom labels. I am not interested in classifying each individual frame, as I could do that myself. Rather, I want to classify the whole 10-second video based on each frame and on the changes (e.g. movement) across the series of ~600 frames.
Does either Clarifai Video V2 or Google's Video Intelligence offer training/prediction like this with custom labels?
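
To make the goal concrete, here is a minimal sketch of what I mean by classifying the clip as a whole rather than frame by frame. It is not tied to either service. The function name `clip_features`, the `(T, H, W)` grayscale array layout, and the four summary statistics are my own assumptions for illustration, and a real model would presumably learn its temporal features instead of using hand-picked ones.

```python
import numpy as np


def clip_features(frames: np.ndarray) -> np.ndarray:
    """Summarise one clip as a single feature vector.

    frames: array of shape (T, H, W), one grayscale frame per time step
            (assumed layout; ~600 frames for 10 s at 60 fps).
    """
    frames = frames.astype(np.float32)

    # Frame-to-frame change: absolute difference between consecutive frames.
    diffs = np.abs(np.diff(frames, axis=0))        # shape (T-1, H, W)

    per_frame_mean = frames.mean(axis=(1, 2))      # brightness per frame, shape (T,)
    per_step_motion = diffs.mean(axis=(1, 2))      # motion per step, shape (T-1,)

    # One vector per clip, combining appearance and movement.
    return np.array(
        [
            per_frame_mean.mean(),
            per_frame_mean.std(),
            per_step_motion.mean(),
            per_step_motion.max(),
        ],
        dtype=np.float32,
    )


# Synthetic stand-in for one clip: 600 frames of 64x64 pixels.
rng = np.random.default_rng(0)
clip = rng.random((600, 64, 64))
features = clip_features(clip)
```

Each clip would then have one such vector (or a learned equivalent) and one set of labels, so the prediction is made per video and not per frame.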