The official documentation for presentationTimeUs in queueInputBuffer(int index, int offset, int size, long presentationTimeUs, int flags) gives the following definition:
The presentation timestamp in microseconds for this buffer. This is normally the media time at which this buffer should be presented (rendered).
Why does the decoder need this value if it is up to the app to decide when each decoded frame is presented? I have tried passing arbitrary numbers for presentationTimeUs, and they do not seem to make any difference to the decoding. For example, if I double the original values of presentationTimeUs (as in the sketch below), the video appears to be decoded in exactly the same way and at the same speed as before.
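For reference, here is a minimal sketch of the kind of decode loop I am testing with. It assumes API 21+, a Surface to render to, and input read with MediaExtractor; videoPath, outputSurface, and TIMESTAMP_SCALE are just placeholder names I am using here, with TIMESTAMP_SCALE = 2 being what doubles every timestamp.

```java
import android.media.MediaCodec;
import android.media.MediaExtractor;
import android.media.MediaFormat;
import android.view.Surface;

import java.io.IOException;
import java.nio.ByteBuffer;

public class TimestampTest {

    // Placeholder scale factor: 2 doubles every presentation timestamp.
    private static final long TIMESTAMP_SCALE = 2;

    public static void decode(String videoPath, Surface outputSurface) throws IOException {
        MediaExtractor extractor = new MediaExtractor();
        extractor.setDataSource(videoPath);

        // Select the first video track.
        MediaFormat format = null;
        for (int i = 0; i < extractor.getTrackCount(); i++) {
            MediaFormat f = extractor.getTrackFormat(i);
            String mime = f.getString(MediaFormat.KEY_MIME);
            if (mime != null && mime.startsWith("video/")) {
                extractor.selectTrack(i);
                format = f;
                break;
            }
        }
        if (format == null) throw new IOException("No video track found");

        MediaCodec decoder = MediaCodec.createDecoderByType(format.getString(MediaFormat.KEY_MIME));
        decoder.configure(format, outputSurface, null, 0);
        decoder.start();

        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        boolean inputDone = false;
        boolean outputDone = false;

        while (!outputDone) {
            if (!inputDone) {
                int inIndex = decoder.dequeueInputBuffer(10_000);
                if (inIndex >= 0) {
                    ByteBuffer inBuf = decoder.getInputBuffer(inIndex);
                    int size = extractor.readSampleData(inBuf, 0);
                    if (size < 0) {
                        decoder.queueInputBuffer(inIndex, 0, 0, 0,
                                MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                        inputDone = true;
                    } else {
                        // The timestamp queued here is deliberately scaled,
                        // yet decoding looks exactly the same.
                        long pts = extractor.getSampleTime() * TIMESTAMP_SCALE;
                        decoder.queueInputBuffer(inIndex, 0, size, pts, 0);
                        extractor.advance();
                    }
                }
            }

            int outIndex = decoder.dequeueOutputBuffer(info, 10_000);
            if (outIndex >= 0) {
                // Render as soon as a decoded frame is available.
                decoder.releaseOutputBuffer(outIndex, true);
                if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                    outputDone = true;
                }
            }
        }

        decoder.stop();
        decoder.release();
        extractor.release();
    }
}
```

The only place the timestamp appears here is the queueInputBuffer call; rendering happens as soon as a decoded frame becomes available, with no timestamp-based scheduling on my side.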
Could anyone shed some light on this?