
I'm writing an app that records screen capture and audio using MediaCodec. I use MediaMuxer to mux the video and audio to create an mp4 file. I successfully managed to write video and audio separately, but when I try muxing them together live, the result is unexpected: either audio is played without video, or video is played right after the audio. My guess is that I'm doing something wrong with the timestamps, but I can't figure out what exactly. I already looked at these examples: https://github.com/OnlyInAmerica/HWEncoderExperiments/tree/audiotest/HWEncoderExperiments/src/main/java/net/openwatch/hwencoderexperiments and the ones on bigflake.com, and was not able to find the answer.

Here are my media format configurations:

    mVideoFormat = createVideoFormat();

    private static MediaFormat createVideoFormat() {
        MediaFormat format = MediaFormat.createVideoFormat(
                Preferences.MIME_TYPE, mScreenWidth, mScreenHeight);
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
        format.setInteger(MediaFormat.KEY_BIT_RATE, Preferences.BIT_RATE);
        format.setInteger(MediaFormat.KEY_FRAME_RATE, Preferences.FRAME_RATE);
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL,
                Preferences.IFRAME_INTERVAL);
        return format;
    }
    mAudioFormat = createAudioFormat();

    private static MediaFormat createAudioFormat() {
        MediaFormat format = new MediaFormat();
        format.setString(MediaFormat.KEY_MIME, "audio/mp4a-latm");
        format.setInteger(MediaFormat.KEY_AAC_PROFILE,
                MediaCodecInfo.CodecProfileLevel.AACObjectLC);
        format.setInteger(MediaFormat.KEY_SAMPLE_RATE, 44100);
        format.setInteger(MediaFormat.KEY_CHANNEL_COUNT, 1);
        format.setInteger(MediaFormat.KEY_BIT_RATE, 64000);
        return format;
    }

Audio and video encoders, muxer:

    mVideoEncoder = MediaCodec.createEncoderByType(Preferences.MIME_TYPE);
    mVideoEncoder.configure(mVideoFormat, null, null,
            MediaCodec.CONFIGURE_FLAG_ENCODE);
    mInputSurface = new InputSurface(mVideoEncoder.createInputSurface(),
            mSavedEglContext);
    mVideoEncoder.start();

    if (recordAudio) {
        audioBufferSize = AudioRecord.getMinBufferSize(44100,
                AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_16BIT);
        mAudioRecorder = new AudioRecord(MediaRecorder.AudioSource.MIC, 44100,
                AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_16BIT,
                audioBufferSize);
        mAudioRecorder.startRecording();

        mAudioEncoder = MediaCodec.createEncoderByType("audio/mp4a-latm");
        mAudioEncoder.configure(mAudioFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        mAudioEncoder.start();
    }

    try {
        String fileId = String.valueOf(System.currentTimeMillis());
        mMuxer = new MediaMuxer(dir.getPath() + "/Video" + fileId + ".mp4",
                MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
    } catch (IOException ioe) {
        throw new RuntimeException("MediaMuxer creation failed", ioe);
    }

    mVideoTrackIndex = -1;
    mAudioTrackIndex = -1;
    mMuxerStarted = false;

I use this to set up video timestamps:

    mInputSurface.setPresentationTime(mSurfaceTexture.getTimestamp());
    drainVideoEncoder(false);

And this to set up audio timestamps:

    lastQueuedPresentationTimeStampUs = getNextQueuedPresentationTimeStampUs();

    if (endOfStream)
        mAudioEncoder.queueInputBuffer(inputBufferIndex, 0, audioBuffer.length,
                lastQueuedPresentationTimeStampUs, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
    else
        mAudioEncoder.queueInputBuffer(inputBufferIndex, 0, audioBuffer.length,
                lastQueuedPresentationTimeStampUs, 0);

    mAudioBufferInfo.presentationTimeUs = getNextDeQueuedPresentationTimeStampUs();
    mMuxer.writeSampleData(mAudioTrackIndex, encodedData, mAudioBufferInfo);
    lastDequeuedPresentationTimeStampUs = mAudioBufferInfo.presentationTimeUs;


    private static long getNextQueuedPresentationTimeStampUs() {
        long nextQueuedPresentationTimeStampUs =
                (lastQueuedPresentationTimeStampUs > lastDequeuedPresentationTimeStampUs)
                ? (lastQueuedPresentationTimeStampUs + 1)
                : (lastDequeuedPresentationTimeStampUs + 1);
        Log.i(TAG, "nextQueuedPresentationTimeStampUs: " + nextQueuedPresentationTimeStampUs);
        return nextQueuedPresentationTimeStampUs;
    }

    private static long getNextDeQueuedPresentationTimeStampUs() {
        Log.i(TAG, "nextDequeuedPresentationTimeStampUs: " + (lastDequeuedPresentationTimeStampUs + 1));
        lastDequeuedPresentationTimeStampUs++;
        return lastDequeuedPresentationTimeStampUs;
    }

I took this approach from this example https://github.com/OnlyInAmerica/HWEncoderExperiments/blob/audiotest/HWEncoderExperiments/src/main/java/net/openwatch/hwencoderexperiments/AudioEncodingTest.java in order to avoid the "timestampUs XXX < lastTimestampUs XXX" error.

Can someone help me figure out the problem, please?

Alexey

2 Answers


It looks like you're using system-provided timestamps for video but a simple counter for audio, unless the video timestamp is somehow being used to seed the audio every frame and that just isn't shown above.

For audio and video to play in sync, you need the same presentation timestamp on the audio and video frames that are expected to be presented at the same time.
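
To make that concrete, here is a rough sketch (not the code from the question; the surrounding objects are the ones it already defines) of stamping both tracks from the same monotonic clock that `System.nanoTime()` reads:

    // Rough sketch, not the original code: both tracks take their presentation
    // times from the same monotonic clock (the one System.nanoTime() reads)
    // instead of stamping the audio with an independent counter.
    // mInputSurface, mSurfaceTexture, mAudioEncoder, inputBufferIndex and
    // audioBuffer are the objects from the question.

    // Video: the SurfaceTexture timestamp is typically already nanoseconds
    // on that clock, so it can stay as it is.
    mInputSurface.setPresentationTime(mSurfaceTexture.getTimestamp());

    // Audio: stamp each queued PCM buffer with "now" on the same clock,
    // converted to microseconds (the unit MediaCodec and MediaMuxer expect).
    long audioPtsUs = System.nanoTime() / 1000L;
    mAudioEncoder.queueInputBuffer(inputBufferIndex, 0, audioBuffer.length,
            audioPtsUs, 0);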

See also this related question.

fadden
  • Thanks a lot fadden, I was able to sync audio and video! You are awesome :) – Alexey Jan 14 '14 at 08:25
  • I have one more small issue though. My recording plays back faster with audio than without audio, and the audio track seems to be distorted. I think this happens because audio is written chunk by chunk with each video frame. The size of each audio chunk to be written is calculated from the fps and the sample rate. Since my fps varies significantly, the audio is written in unequal chunks, hence the distortion. Maybe you have any suggestions on how to deal with it? – Alexey Jan 16 '14 at 06:00
  • Unfortunately I haven't done much with audio. Where are the audio timestamps coming from now? You have a good source for the video timestamps (the SurfaceTexture timestamp). You should be using something similar for audio, or maybe faking it with `System.nanoTime()`. (The media stuff generally uses the system monotonic clock, same as `System.nanoTime()`, although some parts of it use microseconds rather than nanoseconds.) – fadden Jan 16 '14 at 16:06
  • My audio and video timestamps are both coming from System.nanoTime(). I don't think the timestamps are the problem now; the problem lies in the varying audio chunk size. I'll try to find a way to drain the audio encoder at regular intervals (not in the onDrawFrame method, as it's done now). If you have any thoughts on this, you are highly welcome :) (see the sketch after these comments for one idea) – Alexey Jan 17 '14 at 09:42
  • I'm not sure how best to approach this. You should probably write this up with full details as a new stackoverflow question. – fadden Jan 17 '14 at 15:45
  • @Alex Hi, have you solved the issue? I'm running into the same problem as you! Please help! – John Sep 08 '14 at 03:37
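
Not something settled in the comments above, but one common way to keep the audio timestamps stable when the chunk size varies is to derive them from the running count of PCM samples rather than from the moment each chunk happens to be drained. A minimal sketch, assuming 16-bit mono PCM at 44.1 kHz; the names here are made up:

    // Sketch only (hypothetical names, not from the discussion above): anchor the
    // audio track once on the same clock as the video, then advance its
    // presentation time by the number of PCM samples actually read, so the
    // timestamps stay correct no matter how large each drained chunk is.
    private static final int SAMPLE_RATE = 44100;      // matches the audio format above
    private static long audioBaseUs = -1;              // set on the first buffer
    private static long totalSamplesRead = 0;

    private static long nextAudioPtsUs(int bytesRead) {
        if (audioBaseUs < 0) {
            audioBaseUs = System.nanoTime() / 1000L;   // same clock as the video PTS
        }
        long ptsUs = audioBaseUs + (totalSamplesRead * 1000000L) / SAMPLE_RATE;
        totalSamplesRead += bytesRead / 2;             // 2 bytes per 16-bit mono sample
        return ptsUs;
    }

With this, a large or a small chunk only moves the next timestamp forward by exactly the duration of the audio it contains.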

I think the solution might be to just repeatedly read audio samples. You could check if a new video frame is available every N audio samples, and pass it to the muxer with the same timestamp as soon as a new video frame arrives.

    int __buffer_offset = 0;
    final int CHUNK_SIZE = 100; /* read up to 100 samples per iteration */
    while (!__new_video_frame_available
            && __buffer_offset + CHUNK_SIZE <= __recorded_data.length) {
        // read() may return fewer samples than requested (or a negative error
        // code), so only advance the offset by what was actually read
        int __samples_read = this._audio_recorder.read(__recorded_data, __buffer_offset, CHUNK_SIZE);
        if (__samples_read > 0) {
            __buffer_offset += __samples_read;
        }
    }
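
The hand-off described above (queuing the collected samples into the audio encoder with the video frame's timestamp) isn't shown; here is a rough sketch of that step, assuming `__recorded_data` is a byte[] and that the remaining names are hypothetical:

    // Sketch only (hypothetical names): once a video frame arrives, queue
    // everything collected so far with that frame's timestamp.
    long __video_pts_us = __video_frame_timestamp_ns / 1000L;  // same time the video frame gets

    int __input_index = __audio_encoder.dequeueInputBuffer(-1);
    if (__input_index >= 0) {
        ByteBuffer __input_buf = __audio_encoder.getInputBuffers()[__input_index];
        __input_buf.clear();
        __input_buf.put(__recorded_data, 0, __buffer_offset);
        __audio_encoder.queueInputBuffer(__input_index, 0, __buffer_offset,
                __video_pts_us, 0);
    }
    __buffer_offset = 0;  // start collecting samples for the next video frame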

I think that should work.

Kindest regards, Wolfram