
I am running Affectiva SDK 4.0 on a GoPro video recording, using a C++ program on Ubuntu 16.04. The GoPro video was recorded at 60 fps. The problem is that Affectiva only provides results for around half of the frames (i.e. roughly 30 fps). If I look at the timestamps provided by Affectiva, the last timestamp matches the video duration, which means Affectiva somehow skips roughly every second frame.

Before running Affectiva, I ran ffmpeg with the following command to make sure that the video has a constant frame rate of 60 fps:

ffmpeg -i in.MP4 -y -vcodec libx264 -preset medium -r 60 -map_metadata 0:g -strict -2 out.MP4 </dev/null 2>&1

When I inspect the presentation timestamps using ffprobe -show_entries frame=pict_type,pkt_pts_time -of csv -select_streams v in.MP4, I get the following values for the raw video:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/media/GoPro_concat/GoPro_concat.MP4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 01:14:46.75, start: 0.000000, bitrate: 15123 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuvj420p(pc, bt709), 1280x720 [SAR 1:1 DAR 16:9], 14983 kb/s, 59.94 fps, 59.94 tbr, 60k tbn, 119.88 tbc (default)
    Metadata:
      handler_name    :  GoPro AVC
      timecode        : 13:17:26:44
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      handler_name    :  GoPro AAC
    Stream #0:2(eng): Data: none (tmcd / 0x64636D74)
    Metadata:
      handler_name    :  GoPro AVC
      timecode        : 13:17:26:44
Unsupported codec with id 0 for input stream 2
frame,0.000000,I
frame,0.016683,P
frame,0.033367,P
frame,0.050050,P
frame,0.066733,P
frame,0.083417,P
frame,0.100100,P
frame,0.116783,P
frame,0.133467,I
frame,0.150150,P
frame,0.166833,P
frame,0.183517,P
frame,0.200200,P
frame,0.216883,P
frame,0.233567,P
frame,0.250250,P
frame,0.266933,I
frame,0.283617,P
frame,0.300300,P
frame,0.316983,P
frame,0.333667,P
frame,0.350350,P
frame,0.367033,P
frame,0.383717,P
frame,0.400400,I
frame,0.417083,P
frame,0.433767,P
frame,0.450450,P
frame,0.467133,P
frame,0.483817,P
frame,0.500500,P
frame,0.517183,P
frame,0.533867,I
frame,0.550550,P
frame,0.567233,P
frame,0.583917,P
frame,0.600600,P
frame,0.617283,P
frame,0.633967,P
frame,0.650650,P
frame,0.667333,I
frame,0.684017,P
frame,0.700700,P
frame,0.717383,P
frame,0.734067,P
frame,0.750750,P
frame,0.767433,P
frame,0.784117,P
frame,0.800800,I
frame,0.817483,P
frame,0.834167,P
frame,0.850850,P
frame,0.867533,P
frame,0.884217,P
frame,0.900900,P
frame,0.917583,P
frame,0.934267,I
frame,0.950950,P
frame,0.967633,P
frame,0.984317,P
frame,1.001000,P
frame,1.017683,P
frame,1.034367,P
frame,1.051050,P
frame,1.067733,I
...

I have uploaded the full output on OneDrive.
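
For reference, here is a minimal C++ sketch that computes the inter-frame deltas from the ffprobe CSV output shown above (the file name frames.csv is just a placeholder for wherever that output was saved). The deltas come out at roughly 1/59.94 ≈ 0.016683 s:

#include <cstddef>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Reads lines of the form "frame,<pkt_pts_time>,<pict_type>" and prints the
// difference between consecutive presentation timestamps.
int main() {
    std::ifstream in("frames.csv");   // placeholder: the ffprobe CSV output saved to a file
    std::string line;
    std::vector<double> pts;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::string tag, ts;
        if (std::getline(ss, tag, ',') && tag == "frame" && std::getline(ss, ts, ','))
            pts.push_back(std::stod(ts));
    }
    for (std::size_t i = 1; i < pts.size(); ++i)
        std::cout << pts[i] - pts[i - 1] << std::endl;   // ~0.016683 s at 59.94 fps
}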

If I run Affectiva on the raw video (not processed by ffmpeg), I face the same problem of dropped frames. I was using Affectiva with affdex::VideoDetector detector(60);

Is there a problem with the ffmpeg command or with Affectiva?

Edit: I think I have found where the problem could be. It seems that Affectiva does not process the whole video but simply stops after a certain number of frames without any error message. Below I have posted the C++ code I'm using. In the onProcessingFinished() method I print a message to the console when processing is finished, but this message never appears, so Affectiva never reaches the end.

Is there something wrong with my code, or should I encode the videos in a format other than MP4?

#include "VideoDetector.h"
#include "FrameDetector.h"

#include <iostream>
#include <fstream>
#include <mutex>
#include <condition_variable>

std::mutex m;
std::condition_variable conditional_variable;
bool processed = false;

class Listener : public affdex::ImageListener {
public:
    Listener(std::ofstream * fout) {
        this->fout = fout;
    }

    virtual void onImageCapture(affdex::Frame image) {
        //std::cout << "called";
    }

    virtual void onImageResults(std::map<affdex::FaceId, affdex::Face> faces, affdex::Frame image) {
        //std::cout << faces.size() << " faces detected:" << std::endl;

        for (auto& kv : faces) {
            (*this->fout) << image.getTimestamp() << ",";
            (*this->fout) << kv.first << ",";
            (*this->fout) << kv.second.emotions.joy << ",";
            (*this->fout) << kv.second.emotions.fear << ",";
            (*this->fout) << kv.second.emotions.disgust << ",";
            (*this->fout) << kv.second.emotions.sadness << ",";
            (*this->fout) << kv.second.emotions.anger << ",";
            (*this->fout) << kv.second.emotions.surprise << ",";
            (*this->fout) << kv.second.emotions.contempt << ",";
            (*this->fout) << kv.second.emotions.valence << ",";
            (*this->fout) << kv.second.emotions.engagement << ",";
            (*this->fout) << kv.second.measurements.orientation.pitch << ",";
            (*this->fout) << kv.second.measurements.orientation.yaw << ",";
            (*this->fout) << kv.second.measurements.orientation.roll << ",";
            (*this->fout) << kv.second.faceQuality.brightness << std::endl;

            //std::cout << kv.second.emotions.fear << std::endl;
            //std::cout << kv.second.emotions.surprise << std::endl;
            //std::cout << (int) kv.second.emojis.dominantEmoji;
        }
    }

private:
    std::ofstream * fout;
};

class ProcessListener : public affdex::ProcessStatusListener {
public:
    virtual void onProcessingException(affdex::AffdexException ex) {
        std::cerr << "[Error] " << ex.getExceptionMessage() << std::endl;
    }
    virtual void onProcessingFinished() {
        {
            std::lock_guard<std::mutex> lk(m);
            processed = true;
            std::cout << "[Affectiva] Video processing finished." << std::endl;
        }
        conditional_variable.notify_one();
    }
};

int main(int argc, char ** argsv)
{
    affdex::VideoDetector detector(60, 1, affdex::FaceDetectorMode::SMALL_FACES);
    //affdex::VideoDetector detector(60, 1, affdex::FaceDetectorMode::LARGE_FACES);
    std::string classifierPath="/home/wrafael/affdex-sdk/data";
    detector.setClassifierPath(classifierPath);
    detector.setDetectAllEmotions(true);

    // Output CSV file (argsv[1] = input video path, argsv[2] = output CSV path)
    std::ofstream fout(argsv[2]);
    fout << "timestamp" << ",";
    fout << "faceId" << ",";
    fout << "joy" << ",";
    fout << "fear" << ",";
    fout << "disgust" << ",";
    fout << "sadness" << ",";
    fout << "anger" << ",";
    fout << "surprise" << ",";
    fout << "contempt" << ",";
    fout << "valence" << ",";
    fout << "engagement"  << ",";
    fout << "pitch" << ",";
    fout << "yaw" << ",";
    fout << "roll" << ",";
    fout << "brightness" << std::endl;

    Listener l(&fout);
    ProcessListener pl;
    detector.setImageListener(&l);
    detector.setProcessStatusListener(&pl);

    detector.start();
    detector.process(argsv[1]);

    // wait for the worker
    {
        std::unique_lock<std::mutex> lk(m);
        conditional_variable.wait(lk, []{ return processed; });
    }
    fout.flush();
    fout.close();
}
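
As a side note, with the plain wait() above the program blocks forever whenever onProcessingFinished() is never called. Below is a minimal sketch of a timed wait; it reuses the m, conditional_variable and processed globals from the code above, additionally needs <chrono>, and only makes a stall visible rather than fixing it:

#include <chrono>

// Drop-in replacement for the "wait for the worker" block above: wake up every
// 30 seconds and report that onProcessingFinished() has not been called yet.
void waitForProcessing() {
    std::unique_lock<std::mutex> lk(m);
    while (!conditional_variable.wait_for(lk, std::chrono::seconds(30),
                                          []{ return processed; })) {
        std::cout << "[Affectiva] Still waiting for onProcessingFinished()..." << std::endl;
    }
}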

Edit 2: I have now dug further into the problem and looked at just one GoPro file with a duration of 19 min 53 s (GoPro splits the recordings). When I run Affectiva with affdex::VideoDetector detector(60, 1, affdex::FaceDetectorMode::SMALL_FACES); on that raw video, the following file is produced. Affectiva stops after 906 s without any error message and without printing "[Affectiva] Video processing finished".

When I then transform the video using ffmpeg -i raw.MP4 -y -vcodec libx264 -preset medium -r 60 -map_metadata 0:g -strict -2 out.MP4 and run Affectiva with affdex::VideoDetector detector(60, 1, affdex::FaceDetectorMode::SMALL_FACES);, Affectiva runs until the end and prints "[Affectiva] Video processing finished", but the frame rate is only 23 fps. Here is the file.

When I instead run Affectiva with affdex::VideoDetector detector(62, 1, affdex::FaceDetectorMode::SMALL_FACES); on this transformed file, Affectiva stops after 509 s and "[Affectiva] Video processing finished" is not printed. Here is the file.

machinery
  • To debug this situation we need to see the frame timestamps. The Affectiva SDK will discard frames where the inter-frame timestamp difference is less than the amount of time implied by the FPS rate passed to the VideoDetector constructor. So, if you pass 60, the minimum inter-frame timestamp difference is 1/60 or 0.01666 seconds (roughly... keeping in mind floating point precision limitations). – Andy Dennie Jun 17 '19 at 11:49
  • @AndyDennie I have updated my post with the presentation timestamps. Do you need more pts? I can also dump a file if you want. – machinery Jun 17 '19 at 14:33
  • @AndyDennie I think I was able to partially track the problem, please see my edit in the post. Do you know a solution? – machinery Jun 18 '19 at 23:06
  • OK, that's helpful. I'm assuming that the onProcessingException callback is not getting invoked, or you would have mentioned it. Is there a pattern to the timestamps which are output by your onImageResults method, as compared to the source video timestamps (i.e. are any frames being skipped)? At what timestamp does processing stop? – Andy Dennie Jun 19 '19 at 11:44
  • @AndyDennie The onProcessingException is never invoked. I have updated my post with the results of Affectiva in three different cases (please see Edit 2). Also, the timestamp at which it stops varies from run to run. – machinery Jun 19 '19 at 16:05
  • @AndyDennie Could it be a problem with codecs or with some Ubuntu packages? Should I check the version of some packages? It is very strange that sometimes Affectiva runs over the whole video (but only processes half of the frames) and sometimes it just stops. – machinery Jun 19 '19 at 19:11
  • When you process with the constructor parameter of 60, it's skipping half the frames because of timestamp floating point rounding issues. When you specify 62, that works around those issues and it processes all of the frames. So that explains that aspect. However, I don't know why it's stopping at 509s in the latter case. – Andy Dennie Jun 19 '19 at 21:00
  • @AndyDennie Yes, your explanation makes sense. The only big problem is that it suddenly stops without throwing an exception. – machinery Jun 19 '19 at 21:05

1 Answer


If the video frame rate is 60, use a number higher than 60 to process all frames. IIRC, if you just use 61 or 62 you should get the correct number of frames.
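
To put numbers on this suggestion, here is a small sketch of the arithmetic only, assuming the discard rule described by Andy Dennie in the comments under the question (a frame is skipped when its gap to the previous frame is below 1/fps for the value passed to the VideoDetector constructor); it is not the SDK's actual code:

#include <cstdio>

int main() {
    const double sourceGap = 1.0 / 59.94;              // ~0.016683 s between GoPro frames
    const double detectorFps[] = {60.0, 61.0, 62.0};   // candidate VideoDetector constructor values
    for (double fps : detectorFps) {
        const double minGap = 1.0 / fps;               // frames closer together than this are skipped
        std::printf("fps=%2.0f  min gap=%.6f s  margin=%+.6f s\n",
                    fps, minGap, sourceGap - minGap);
    }
}

With 60 the margin is only about 17 µs, so the timestamp rounding issues mentioned in the comments can push individual gaps below the threshold; with 61 or 62 the margin is more than an order of magnitude larger.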

Mr K.
  • I tried 61, 62, 80 and so on, and that does not work. The number of frames Affectiva returns is far too low. The fps is around 60 (dividing the number of frames by the Affectiva timestamps), but the number of frames does not correspond to the video, and the timestamps Affectiva returns only go up to around 130 s (for a 19 min video). In terms of video duration I only get the correct result when I use 60 (but then the fps is only 30). – machinery Jun 17 '19 at 08:50
  • Normally you get that kind of problem when the video has a variable frame rate, but as you made sure the video has a constant one, I'm not sure what's going on. – Mr K. Jun 18 '19 at 06:24
  • I have edited my post. I think I found the problem. Do you know a solution? – machinery Jun 18 '19 at 23:07
  • I just saw the edit. It doesn't throw any exception? Btw, MP4 is just a container format, but I have personally processed hundreds of MP4 videos using Affectiva and I only had problems with the ones encoded with variable frame rates. – Mr K. Jun 19 '19 at 07:44
  • I'm encoding it with libx264 using ffmpeg. Should I perhaps use another encoder? It does not throw any exception; it just terminates. I also wrapped everything in the main method in a try-catch block, but there was no exception. – machinery Jun 19 '19 at 10:53
  • I have now updated my post with the results of Affectiva (please see Edit 2). – machinery Jun 19 '19 at 16:05
  • After your second edit, and seeing that the video length does not match, I guess the problem remains with the encoding. Change from libx264 to something else, and instead of using -r to specify the frame rate, maybe try the fps=60 filter. For more information check: https://trac.ffmpeg.org/wiki/ChangingFrameRate – Mr K. Jun 20 '19 at 05:12
  • I will give it a try. Which encoder would you use instead of libx264? – machinery Jun 20 '19 at 08:50
  • You can first try libx264 with the fps filter set to 60, and if that doesn't work, maybe try the legacy mpeg4 encoder. – Mr K. Jun 20 '19 at 08:56